Common pitfalls when writing for BiDi (or i18n in general)

This is article 2 in a series of articles about best practices programming for BiDi support. The full series can be viewed here.

As written in the first part of the series, you are not expected to implement any of the procedures described there yourself. In almost any environment you are likely to encounter, someone has already implemented BiDi, including all of those steps. Typically, merely outputting a string is enough to make the rendering engine perform all of them and display the string properly. The main problem is that some programmers choose to bypass the mechanisms provided by the operating environment and roll their own optimizations. More often than not, these optimizations do not take some of the BiDi-related special cases into consideration. Following are a few things to keep in mind when writing for BiDi support.

Small changes in logical text do not necessarily mean small changes in the visual output

This is probably the least obvious pitfall. In an attempt to speed up redrawing of changed text, a program may assume that a small change in the logical string means that only the part of the string from the point of the change to the right needs to be redrawn. This assumption is incorrect for BiDi strings: the text might be in an RTL run, in which case the change actually propagates to the left. Worse, a change might mean that the entire string needs to be repainted. For example, consider the following logical string inside an RTL paragraph:

a typical day at work

This sentence contains only LTR characters, so it is rendered as is. Now suppose we add a single RTL character (shown here as "A") in the middle. The logical string is now

a typical A day at work

and, since the paragraph is RTL, the visual rendering should be:

day at work A a typical

As you can see, the entire string has to be repainted, as no character remained where it was. The only generally safe rule is to assume that any change to any part of the paragraph might result in repainting the entire paragraph.
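To see this happen, the following C sketch reorders a string for display in an RTL paragraph. It uses a deliberately simplified model, not the real Unicode BiDi algorithm: uppercase letters stand in for RTL characters (as "A" does in the example above), lowercase letters for LTR characters, and everything else is treated as neutral. Numbers, explicit embeddings, mirrored characters and the trailing-whitespace rules are all ignored. The names toy_class and toy_reorder_rtl are hypothetical and meant only for experimenting; real code should leave reordering to the platform.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { TOY_MAX = 256 };  /* buffer size: longest accepted string plus one */

/* Toy character classes (NOT Unicode bidi classes):
 * uppercase = RTL (1), lowercase = LTR (0), anything else = neutral (-1). */
static int toy_class(char c)
{
    if (isupper((unsigned char)c)) return 1;
    if (islower((unsigned char)c)) return 0;
    return -1;
}

/* Writes the left-to-right visual order of `logical`, displayed in an RTL
 * paragraph, into `visual` (which must hold at least TOY_MAX chars). */
static void toy_reorder_rtl(const char *logical, char *visual)
{
    size_t n = strlen(logical);
    int level[TOY_MAX];

    assert(n < TOY_MAX);
    for (size_t i = 0; i < n; i++) {
        int cls = toy_class(logical[i]);
        if (cls >= 0) {
            level[i] = cls == 1 ? 1 : 2;  /* RTL -> level 1, LTR -> level 2 */
            continue;
        }
        /* Neutral: LTR only if the nearest strong characters on both sides
         * are LTR; otherwise it takes the paragraph direction. The paragraph
         * edges count as RTL. */
        int before = 1, after = 1;
        for (size_t j = i; j-- > 0;)
            if (toy_class(logical[j]) >= 0) { before = toy_class(logical[j]); break; }
        for (size_t j = i + 1; j < n; j++)
            if (toy_class(logical[j]) >= 0) { after = toy_class(logical[j]); break; }
        level[i] = (before == 0 && after == 0) ? 2 : 1;
    }

    /* Reverse every run at level >= 2, then every run at level >= 1. */
    memcpy(visual, logical, n + 1);
    for (int lvl = 2; lvl >= 1; lvl--) {
        size_t i = 0;
        while (i < n) {
            if (level[i] < lvl) { i++; continue; }
            size_t start = i;
            while (i < n && level[i] >= lvl) i++;
            for (size_t a = start, b = i - 1; a < b; a++, b--) {
                char tmp = visual[a];
                visual[a] = visual[b];
                visual[b] = tmp;
            }
        }
    }
}
```

Feeding it "a typical day at work" returns the string unchanged, while "a typical A day at work" comes back as "day at work A a typical". Inserting a single character in the middle moved nearly every other character on screen.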

Output the entire paragraph as one API call

Some programs, for whatever reason, output the characters of a string one at a time, or in some other way that does not output the entire paragraph in one go. As a result, the underlying API does not have the entire string to reorder, and the final ordering turns out wrong. This may produce strings that are actually unreadable to the end user. Whenever possible, treat the entire paragraph as one string, and print it in one go.

Of course, sometimes this is not possible. The underlying API may not provide the formatting you require. For example, it might not offer a single call to output multi-line text, or text whose font, color or weight changes in the middle. Sometimes there is no option but to make several separate API calls to produce a single paragraph.

There is no easy solution when a single paragraph must be output using several distinct printing calls. The general advice is to follow the algorithm for printing multi-line strings detailed in the first article in the series. Most operating systems do provide APIs for performing the stages described there independently, so a BiDi-aware application can insert its own processing wherever it needs to along the way. In a later article, we will discuss code samples for performing these steps on several common platforms.
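The key is that the embedding levels are resolved once, for the whole paragraph, and only then is the text cut into pieces for separate drawing calls. Below is a minimal C sketch of the reordering stage, assuming the resolved levels are already available as an array (for example, from the platform's BiDi API). It applies rule L2 of the Unicode algorithm: from the highest level down to the lowest odd level, reverse every contiguous sequence of characters at that level or higher. The function name visual_order is hypothetical; ICU, for instance, offers a comparable step as ubidi_reorderVisual.

```c
#include <assert.h>
#include <stddef.h>

/* Computes a visual-to-logical index map for one line of text:
 * index_map[v] receives the logical index of the character displayed at
 * visual position v (counting from the left). `levels` holds the resolved
 * embedding level of each character in logical order, as resolved for the
 * WHOLE paragraph by the platform's BiDi implementation. */
static void visual_order(const unsigned char *levels, size_t n, size_t *index_map)
{
    unsigned char min_level = 255, max_level = 0;

    assert(n == 0 || (levels != NULL && index_map != NULL));
    for (size_t i = 0; i < n; i++) {
        index_map[i] = i;
        if (levels[i] < min_level) min_level = levels[i];
        if (levels[i] > max_level) max_level = levels[i];
    }

    /* From the highest level down to the lowest odd level, reverse every
     * contiguous sequence of characters at that level or higher. */
    for (int lvl = max_level; lvl >= (min_level | 1); lvl--) {
        size_t v = 0;
        while (v < n) {
            if (levels[index_map[v]] < lvl) { v++; continue; }
            size_t start = v;
            while (v < n && levels[index_map[v]] >= lvl) v++;
            for (size_t a = start, b = v - 1; a < b; a++, b--) {
                size_t tmp = index_map[a];
                index_map[a] = index_map[b];
                index_map[b] = tmp;
            }
        }
    }
}
```

Once the map is known, consecutive visual positions with the same level form runs that can each be drawn with a separate call and their own font or color; since the levels were resolved over the whole paragraph, splitting the output no longer breaks the ordering. For the earlier example "a typical A day at work" (level 2 for the LTR words and their inner spaces, level 1 for " A "), the map begins with logical positions 12 through 22, the characters of "day at work".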

Argument order may change

Different languages order the syntactic components of a sentence differently. When your translators translate a sentence, your infrastructure must give them some way to place the arguments in a different order than the one you use for English (or whatever your native language is). One way to do that is to perform the formatting with a printf-like function: POSIX printf can refer to arguments by position, which allows printing them out of order. For example, the following program prints "b c a":

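```c
#include <stdio.h>

int main(void) {
    /* "%n$" selects the n-th argument, so a translated format string can
     * reorder the arguments freely. Positional arguments are a POSIX
     * extension to printf, not part of ISO C. */
    printf("%2$c %3$c %1$c\n", 'a', 'b', 'c');  /* prints "b c a" */
    return 0;
}
```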

As with the advice to output a paragraph as one string, if your program does not use printf to format its arguments, some other mechanism must be put in place to allow the argument order after localization to differ from the original one.
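One possible shape for such a mechanism is a small formatter that looks up each argument by the number written in the translated string. The sketch below is hypothetical: the function name format_positional and the "{1}", "{2}" placeholder syntax are assumptions, not an existing API, and it only handles string arguments with minimal error handling.

```c
#include <assert.h>
#include <string.h>

/* Expands "{1}".."{9}" in `pattern` with args[0]..args[8]. A placeholder
 * that refers to a missing argument is copied literally. Returns the number
 * of characters written (excluding the terminator), or -1 if `out` is too
 * small; `out` is always NUL-terminated. */
static int format_positional(char *out, size_t out_size, const char *pattern,
                             const char *const *args, int nargs)
{
    size_t len = 0;

    assert(out_size > 0);
    for (const char *p = pattern; *p != '\0'; ) {
        const char *piece = p;  /* default: copy one literal character */
        size_t piece_len = 1;
        if (p[0] == '{' && p[1] >= '1' && p[1] <= '9' && p[2] == '}'
            && p[1] - '1' < nargs) {
            piece = args[p[1] - '1'];  /* translator-chosen argument */
            piece_len = strlen(piece);
            p += 3;
        } else {
            p++;
        }
        if (len + piece_len >= out_size) {
            out[len] = '\0';
            return -1;
        }
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
    return (int)len;
}
```

Mirroring the printf example, formatting "{2} {3} {1}" with the arguments "a", "b" and "c" produces "b c a". The translator only edits the pattern; the calling code never changes.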

Days of week

In most locales the first day of the week is Monday, but in some locales it is Sunday or Saturday. Also, while in most locales the weekend is Saturday and Sunday, in some locales it is actually Friday and Saturday, or even only Friday. If your code displays a calendar, offers an "every work day" option, or otherwise relies on this information, you should have some facility for changing these settings based on the locale or on user preference.
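A minimal C sketch of keeping these settings as data instead of hard coding them. The names enum weekday, struct week_info, is_work_day and calendar_column are hypothetical, and the two example settings are illustrative values, not taken from any locale database; where the real values come from (locale data or a user preference) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Day numbers follow struct tm's tm_wday: 0 = Sunday ... 6 = Saturday. */
enum weekday { SUN, MON, TUE, WED, THU, FRI, SAT };

/* Hypothetical per-locale (or per-user) week settings. */
struct week_info {
    enum weekday first_day;  /* day shown in the first calendar column */
    unsigned weekend_mask;   /* bit d set => day d is a weekend day */
};

/* True if `d` is a working day under the given settings. */
static bool is_work_day(const struct week_info *w, enum weekday d)
{
    assert((unsigned)d <= SAT);
    return (w->weekend_mask & (1u << d)) == 0;
}

/* Calendar column (0..6) in which day `d` appears. */
static int calendar_column(const struct week_info *w, enum weekday d)
{
    assert((unsigned)d <= SAT);
    return ((int)d - (int)w->first_day + 7) % 7;
}

/* Example values only: real settings should come from the locale data or
 * from an explicit user preference. */
static const struct week_info week_sat_sun = { MON, (1u << SAT) | (1u << SUN) };
static const struct week_info week_fri_sat = { SUN, (1u << FRI) | (1u << SAT) };
```

With week_fri_sat, for instance, is_work_day reports Friday as a day off and Sunday as a working day, and calendar_column puts Sunday in the first column. A "every work day" option built on is_work_day then follows the locale automatically.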

Direction of fields

When working with BiDi languages, it is important to set the paragraph direction. Very broadly speaking, when rendering BiDi text, the text is divided into runs of characters going in the same direction. The paragraph direction determines the order in which those runs are laid out, as well as the direction given to neutral characters (such as spaces and punctuation) that sit between runs of different directions. Getting the paragraph direction wrong can make the entire sentence almost unreadable. When discussing the paragraph direction, the Unicode BiDi algorithm states that it should be set according to the first strong directional character in the paragraph (rules P2 and P3 of the algorithm). As a result, many think that overriding or hard coding the paragraph direction violates the algorithm. This notion is completely incorrect. The comment immediately after P3 explicitly states:

Whenever a higher-level protocol specifies the paragraph level, rules P2 and P3 do not apply.

Using rules P2 and P3 to set the paragraph direction should be the exception, not the rule! It is rarely right to let the first strong directional character dictate the direction of the entire paragraph. Ideally, the translators should be able to specify for each field whether it is a left to right or right to left field.

It should be noted that the above relates not only to text fields, but also (and, perhaps, even more so) to input fields.

Program interface language

This point is not BiDi specific, but violations of it are more commonly noticed in BiDi-related contexts. Most operating systems have a standard way to set which language programs should use when talking to the user. Especially for BiDi languages, it is quite common for native speakers of one language to prefer the interface in another (most commonly, English). The full explanation of how to honor the user's choice will be covered in a future article, but for now the following should point you in the right direction:

Under Windows, you can simply use FindResource or one of its wrappers (LoadString, LoadMenu, etc.). If you want to support non-default languages, use FindResourceEx, which takes an explicit language identifier, but make the default the language returned by GetUserDefaultUILanguage.

Under Unix, use the value of the first of the environment variables LC_ALL, LC_MESSAGES and LANG that is set. It is extremely uncommon to let program-specific configuration override that, as users are expected to simply change the environment variable if they want anything else. As such, a configuration option for overriding the environment is discouraged on Unix-based platforms, unless that setting affects more than just your program by setting the environment variables mentioned above.
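A minimal C sketch of that lookup. When a program only needs the C library to use the right locale, calling setlocale(LC_MESSAGES, "") already applies this precedence; the sketch spells it out for cases where the program needs the name itself, for example to choose a translation file. The names ui_locale and ui_language are hypothetical, and the "C" fallback is an assumption (POSIX leaves the default locale implementation-defined). As POSIX specifies, a variable set to the empty string is treated as unset.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes POSIX extras such as setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the locale name that should drive the interface language: the
 * first of LC_ALL, LC_MESSAGES and LANG that is set to a non-empty value,
 * or "C" if none is (assumed fallback). */
static const char *ui_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";
}

/* Copies the language part of a locale name of the form
 * language[_territory][.codeset][@modifier] (e.g. "he" from "he_IL.UTF-8")
 * into buf, truncating if necessary. */
static void ui_language(char *buf, size_t size)
{
    const char *locale = ui_locale();
    size_t n = strcspn(locale, "_.@");

    assert(size > 0);
    if (n >= size) n = size - 1;
    memcpy(buf, locale, n);
    buf[n] = '\0';
}
```

For example, with LC_ALL unset, LC_MESSAGES=he_IL.UTF-8 and LANG=en_US.UTF-8, ui_language yields "he": the user asked for Hebrew messages even though the rest of their environment is English.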

Automatic interface mirroring

Some people assume that, since the usual interface is left to right, merely mirroring the position of every interface element will produce an interface suitable for right-to-left languages. The results are, at best, awkward. Translating an interface requires real translation work, and that includes its layout. The translation mechanism should allow translators to redesign the interface as they see fit for the language.

Hopefully, being aware of these pitfalls will help programmers avoid them. The next articles in the series will attempt to elaborate on various aspects of writing i18n-aware, and particularly BiDi-aware, programs.