Languages and writing systems

This topic describes the basic elements of writing systems, the difference between a language and a writing system, and why and how writing systems affect software development. A writing system is a set of rules for using one or more scripts to write a particular language. Examples of different writing systems include Chinese (logographic/ideographic), Japanese (kana syllabaries combined with kanji), English (Latin), Russian (Cyrillic), Greek (Greek), Arabic (Arabic script, written right to left) and Thai (Thai).

A natural language is a human language in both its spoken and written forms. At the time of writing, the Symbian platform language portfolio supports over 40 base languages.

It is common to confuse a language with a writing system. A writing system is used to represent a language (or several languages) in written form, and a number of languages can be written with more than one writing system or script. For instance, the Japanese language is written using four scripts that together form the Japanese writing system: hiragana, katakana, kanji and romaji. Each script serves a different purpose, and they are mixed within the same text. Serbian, on the other hand, can be written in either the Latin or the Cyrillic script, but the two are not used simultaneously: the writer decides which one to use. In some cases, the choice of a writing system can be a political statement.

In some cases internationalization is simple: for example, making a US application accessible to Australian or British users may require little more than a few spelling corrections. But making a US application usable by Japanese users, or a Korean application usable by German users, requires not only that the software operate in different languages, but also that it use different input techniques, character encodings and presentation conventions.
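Character encodings are one reason why such a port is more than a translation. As a rough illustration using only Python's standard codecs, the sketch below shows how many bytes the same greeting needs in a few encodings, and that a legacy encoding such as Windows-1251 (cp1251) cannot represent Japanese at all. The helper encoded_sizes() and the choice of encodings are assumptions made for this example.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "shift_jis", "cp1251")):
    """Return the number of bytes text needs in each encoding.

    None means the encoding cannot represent the text at all.
    """
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None
    return sizes


# The same greeting in English, Russian and Japanese.
for sample in ("Hello", "Привет", "こんにちは"):
    print(sample, encoded_sizes(sample))
```

For こんにちは (five characters) this reports 15 bytes in UTF-8, 10 bytes in UTF-16 and Shift_JIS, and None for cp1251, so byte counts cannot be used as character counts once text leaves the ASCII range.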

Qt tries to make internationalization as painless as possible for developers. All input widgets and text drawing methods in Qt offer built-in support for all supported languages. The built-in font engine is capable of correctly and attractively rendering text that contains characters from a variety of different writing systems at the same time.

Qt supports most languages in use today, in particular:

  • All East Asian languages (Chinese, Japanese and Korean)

  • All Western languages (using Latin script)

  • Arabic

  • Cyrillic languages (Russian, Ukrainian, etc.)

  • Greek

  • Hebrew

  • Thai and Lao

  • All scripts in Unicode 5.1 that do not require special processing

On Windows, Unix/X11 with FontConfig (client-side font support) and Qt for Embedded Linux, the following writing systems are also supported:

  • Bengali

  • Devanagari

  • Dhivehi (Thaana)

  • Gujarati

  • Gurmukhi

  • Kannada

  • Khmer

  • Malayalam

  • Myanmar

  • Syriac

  • Tamil

  • Telugu

  • Tibetan

  • N'Ko

Many of these writing systems exhibit special features:

  • Special line breaking behavior: Some Asian languages are written without spaces between words. Line breaking can occur either after every character (with exceptions), as in Chinese, Japanese and Korean, or only at logical word boundaries, as in Thai.

  • Bidirectional writing: Arabic and Hebrew are written from right to left, except for numbers and embedded English text, which are written left to right. The exact behavior is defined in the Unicode Bidirectional Algorithm (Unicode Standard Annex #9).

  • Non-spacing or diacritical marks (accents or umlauts in European languages): Some languages, such as Vietnamese, make extensive use of these marks, and some characters can carry more than one mark at the same time to clarify pronunciation.

  • Ligatures: In special contexts, some pairs of characters get replaced by a combined glyph forming a ligature. Common examples are the fl and fi ligatures used in typesetting US and European books.
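Diacritical marks mean that the number of stored code points can differ from the number of characters a reader sees. The minimal Python sketch below, which uses only the standard unicodedata module, writes the Vietnamese word Việt with a base letter plus two combining marks and shows how Unicode normalization (NFC) composes them into one precomposed character. count_user_characters() is a hypothetical helper that only approximates the grapheme cluster rules of Unicode Standard Annex #29; in Qt, the QTextBoundaryFinder class is the place to look for proper grapheme boundaries.

```python
import unicodedata


def count_user_characters(text):
    """Approximate the number of user-perceived characters in text.

    Any combining mark (Unicode general category M*) is counted as part of
    the preceding base character. This is a simplification of UAX #29 and
    ignores cases such as Hangul jamo sequences and emoji sequences.
    """
    count = 0
    for ch in text:
        if count == 0 or not unicodedata.category(ch).startswith("M"):
            count += 1
    return count


# "Viet" with e + COMBINING DOT BELOW + COMBINING CIRCUMFLEX ACCENT
decomposed = "Vie\u0323\u0302t"
precomposed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(precomposed))   # 6 4
print(count_user_characters(decomposed))   # 4
print(precomposed == "Vi\u1ec7t")          # True: U+1EC7 is the precomposed letter
```

Both strings display the same word, so code that truncates, counts or moves the cursor by code point can split a letter from its accents; this is one of the details Qt's text engine handles for you.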

Qt tries to take care of all the special features listed above. You usually don't have to worry about these features so long as you use Qt's input widgets (for example, QLineEdit, QTextEdit and derived classes) and Qt's display widgets (for example, QLabel).

Support for these writing systems is transparent to the programmer and completely encapsulated in Qt's text engine. This means that you don't need to have any knowledge about the writing system used in a particular language, except for the following small points:

  • QPainter::drawText(int x, int y, const QString &str) always draws the string with its left edge at the position specified by the x and y parameters, which usually gives you left-aligned strings. Arabic and Hebrew application strings are usually right aligned, so for these languages use an overload of drawText() that takes a QRect, since it aligns the text in accordance with the language.

  • When you write your own text input controls, use QTextLayout. In some languages (for example, Arabic or languages from the Indian subcontinent), the width and shape of a glyph change depending on the surrounding characters, and QTextLayout takes this into account. Writing input controls usually requires some knowledge of the scripts they will be used with, so the easiest approach is usually to subclass QLineEdit or QTextEdit.
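The following Python sketch illustrates, outside of Qt and only conceptually, the kind of decision involved in aligning a string according to its language: find the text's base direction, then place the text against the matching edge of a rectangle. base_direction() is a simplified form of rules P2 and P3 of the Unicode Bidirectional Algorithm built on Python's unicodedata.bidirectional(), and text_origin_x() is a hypothetical helper; the text width is assumed to be already measured (Qt would use font metrics). The sketch does not reorder or shape glyphs, which is the work QTextLayout does.

```python
import unicodedata


def base_direction(text):
    """Return "rtl" or "ltr" based on the first strong character.

    Simplified UAX #9 rules P2-P3: bidi class L is strong left-to-right,
    R and AL are strong right-to-left, and everything else (digits, spaces,
    punctuation) is skipped. Isolate sequences are not handled here.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return "ltr"  # P3: no strong character found, default to left-to-right


def text_origin_x(rect_x, rect_width, text_width, direction):
    """Left edge at which the text starts so that it sits against the
    leading edge of the box: the left edge for LTR, the right edge for RTL."""
    if direction == "rtl":
        return rect_x + rect_width - text_width
    return rect_x


hebrew = "\u05e9\u05dc\u05d5\u05dd 123"  # "shalom" followed by digits
english = "123 Hello"
print(text_origin_x(10, 200, 50, base_direction(hebrew)))   # 160
print(text_origin_x(10, 200, 50, base_direction(english)))  # 10
```

Because the Hebrew sample starts with a strong right-to-left letter, it is placed against the right edge of the box even though it ends with digits, while the English sample stays left aligned even though it starts with digits.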

Below is a table listing languages and related writing systems. Some languages may use more than one writing system.

The table below also lists some languages that are not included in the Symbian platform language portfolio; these are indicated by italic typeface. They are included for reference only, to indicate which languages would require only localization and variant testing, and no major changes to the generic code.

Note that the following table is not a complete list of the world's writing systems; it covers only the writing systems typically implemented in the IT industry.

Table: Writing systems and languages

Writing system             | Languages using this writing system
Arabic                     | Arabic, Urdu, Farsi (= Persian), Kashmiri, Pashto, Sindhi
Chinese                    | Simplified Chinese, Traditional Chinese (Hong Kong), Traditional Chinese (Taiwan)
Cyrillic                   | Russian, Bulgarian, Ukrainian, Macedonian, Belarusian, Serbian (on the Symbian platform, Serbian is written in the Latin script, which is the other option for writing Serbian)
Devanagari                 | Hindi, Marathi, Sindhi, Nepali
Tamil / Malayalam / Telugu | Tamil, Malayalam, Telugu (Dravidian languages, each written in its own script)
Greek                      | Greek
Hebrew                     | Hebrew, Ladino, Yiddish
Japanese                   | Japanese
Hangul                     | Korean
Latin                      | Most European and African languages, and some Asia-Pacific languages (Vietnamese: Latin Extended)
Thai                       | Thai

Note: Dialects are not listed in the table. For example, Chinese includes Mandarin, Cantonese and other dialects.