Representation of language

The ISO 639 language codes have been adopted by many standardization initiatives as the preferred way of expressing the language of text or audio. However, because ISO 639 defines two separate code sets (2-letter and 3-letter) and does not cover regional variants of languages, extended recommendations built on this standard have been issued.

RFC 4646

This is a Best Current Practice document from the IETF (Internet Engineering Task Force). RFC 4646 was issued in September 2006 and supersedes the widely implemented RFC 3066 from January 2001. Entitled "Tags for Identifying Languages", this document specifies rules for selecting codes from ISO 639 and for adding subtags for script (based on ISO 15924), for region (based on ISO 3166-1) and for other variants such as period or dialect (based on the IANA registry).

For the purpose of the Cinematograph Works Standard, it can be useful to express whether, for example, a film soundtrack is in British or American English, or whether Serbo-Croatian subtitles are in Latin or Cyrillic script. Therefore, the CWS group is likely to adopt RFC 4646 as the preferred representation for content languages. Since RFC 4646 is essentially a superset of ISO 639, this decision would not require significant changes to data sets that have used the ISO 639 language lists in the past.
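To illustrate how such tags are composed, the following is a minimal sketch in Python that splits a tag into its language, script, region and variant subtags. It is not part of the CWS work and not a full RFC 4646 implementation: the names `LanguageTag` and `parse_language_tag` are hypothetical, the pattern deliberately ignores extended language subtags, extensions, private-use and grandfathered tags, and it checks only the shape of a tag, not whether its subtags are actually registered with IANA.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Simplified shape: language[-script][-region][-variant ...]
# Assumption: extlang, extension, private-use and grandfathered
# forms are out of scope for this sketch.
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"                 # ISO 639 code
    r"(?:-(?P<script>[A-Za-z]{4}))?"               # ISO 15924 script
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"      # ISO 3166-1 or 3-digit area code
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a tag into subtags; raise ValueError if the shape is wrong."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a well-formed (simplified) language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    variants = match.group("variants")
    # Tags are case-insensitive; normalize to the conventional casing.
    return LanguageTag(
        language=match.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=tuple(v.lower() for v in variants.split("-") if v),
    )


if __name__ == "__main__":
    # The cases mentioned above, plus a bare ISO 639 code.
    for example in ("en-GB", "en-US", "sh-Latn", "sh-Cyrl", "de"):
        print(example, parse_language_tag(example))
```

A bare ISO 639 code such as `de` passes through unchanged, with no script, region or variants, which reflects why data sets already keyed on ISO 639 codes would need little adjustment.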

languages.txt · Last modified: 2008/04/21 10:02 (external edit)