jech 2 days ago [-]
That was a long time ago.
Traditionally, characters under Unix were encoded in a locale-specific manner: ISO 8859-1 in Western Europe, ISO 8859-2 in Eastern Europe, EUC-JP in Japan, etc. In the 1990s, there was a major push to get XFree86 (the ancestor of X.Org) to switch to locale-independent UTF-8, led mainly by Markus Kuhn and Bruno Haible.
The link is to Markus Kuhn's web page, which appears to describe the UTF-8 software available around 1998 or so.
sheept 2 days ago [-]
UTF-8 is not locale independent. You cannot correctly render multilingual UTF-8 text without also specifying its locale, and some transformations like uppercase/lowercase also depend on the locale.
Joker_vD 2 days ago [-]
> You cannot correctly render multilingual UTF-8 text without also specifying its locale
You can render it pretty well: not perfectly, but well enough to actually read it, as opposed to not being able to render it at all or rendering mojibake à la Кракозябры instead.
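For anyone who has not seen mojibake first-hand, here is a rough sketch of how it arises. The choice of KOI8-R and CP1251 as the two mismatched legacy encodings is my own assumption for illustration, not something from the linked page.

```python
# Illustrative only: one application writes KOI8-R, another reads CP1251.
text = "Кракозябры"

raw = text.encode("koi8_r")      # bytes as a KOI8-R application would store them
garbled = raw.decode("cp1251")   # a CP1251 application misreads the same bytes
print(garbled)                   # still Cyrillic letters, but the wrong ones

# The damage is reversible only if you know both encodings involved.
recovered = garbled.encode("cp1251").decode("koi8_r")

# With UTF-8 on both ends there is no encoding to guess.
roundtrip = text.encode("utf-8").decode("utf-8")
```

The text survives only when both sides agree on the encoding, which is exactly what a single shared encoding gives you.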
numpad0 2 days ago [-]
At least touching Unicode strings in the wrong locale only mildly corrupts the strings. Plenty of Win32 apps would crash if the system locale were set to UTF-8.
throw1234567891 1 days ago [-]
UTF-8 is a character encoding and therefore it cannot serve as a locale. There is no UTF-8 language, punctuation, date and number formats…
numpad0 1 days ago [-]
I mean, UTF-8 string handling is language (of the given bitstream, not necessarily the system) dependent, e.g. Turkish lowercase I, Chinese Hanzi vs Japanese Kanji at same codepoints, etc etc...
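To make the Turkish case concrete, here is a minimal sketch. The helper `lower_for_language` and its `lang` parameter are hypothetical names of mine, and the mapping is a simplification: it covers only the dotted/dotless I pairs and ignores the extra conditions in the full Unicode casing rules.

```python
def lower_for_language(text: str, lang: str) -> str:
    """Lowercase text, applying the Turkish/Azeri I rules when asked to.

    Hypothetical helper for illustration; not a complete casing algorithm.
    """
    if lang in ("tr", "az"):
        # Dotted capital İ (U+0130) lowers to plain i,
        # and plain capital I lowers to dotless ı (U+0131).
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    # Everything else uses the language-neutral default mapping.
    return text.lower()


english = lower_for_language("TITLE", "en")   # default rules
turkish = lower_for_language("TITLE", "tr")   # Turkish rules
print(english, turkish)
```

Same code points in, different correct answers out, depending on the language of the text rather than on its encoding.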
jech 1 days ago [-]
> UTF-8 is not locale independent.
The encoding itself is locale-independent. Some algorithms (rendering, casing, hyphenation etc.) depend on the locale.
This is unlike the older paradigm, where the encoding itself was dependent on the locale, making things like copy-paste between applications running in different locales problematic.
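A small sketch of the copy-paste problem, assuming one application runs in an ISO 8859-1 locale and the other in ISO 8859-2 (the byte value is just an example I picked):

```python
# One byte, as it might travel between two applications via the clipboard.
raw = bytes([0xE6])

western = raw.decode("iso8859_1")   # read in a Western European locale: æ
eastern = raw.decode("iso8859_2")   # read in an Eastern European locale: ć
print(western, eastern)

# In UTF-8 the two characters have distinct byte sequences,
# so the reader's locale no longer changes which character is meant.
western_utf8 = western.encode("utf-8")
eastern_utf8 = eastern.encode("utf-8")
print(western_utf8, eastern_utf8)
```

Under the old scheme the receiving application had no way to tell which of the two characters was intended without knowing the sender's locale.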
sourcegrift 2 days ago [-]
E.g., some CJK characters render differently depending on whether the locale is mainland China, Taiwan, or Japan. One example is 骨 (from my old notes, so there is a tiny chance this example is incorrect).
cyphar 2 days ago [-]
Yeah, 骨 is one but IMHO the best example is 返 -- it renders differently in every CJK locale.
j16sdiz 2 days ago [-]
> created 1998-09-22 – last modified 2022-12-07
ufocia 2 days ago [-]
A font is not a typeface
adrian_b 1 days ago [-]
True, but following the early documentation from around 1990, from companies like Microsoft, most programmers use the term "font" even when they should say "typeface".
Many of those who know the difference between "font" and "typeface" still use "font" when addressing programmers or computer users, for fear that they would not understand the other word.
In TFA, the uses of the word "font" are correct, e.g. in "The 6x13, 8x13, 9x15, 9x18, and 10x20 fonts", because it is used to refer to typefaces scaled to a certain fixed size (e.g. "Tahoma" is a typeface, while "12-point Tahoma" is a font).
The word "typeface" is used once in TFA, also correctly, when saying that whether typefaces may be copyrightable depends on the country.