Writing non-English languages with a QWERTY keyboard

QWERTY and nothing else

My first experience with a keyboard (in 1981) was on a Teletype Model 33. The goal was to learn how to program, not how to touch-type, which I still haven't mastered... But I did develop a preference for mechanical keyboards. When IBM introduced the IBM PC AT (Advanced Technology), they also introduced the IBM Model M keyboard. I have been using these keyboards ever since. My Model Ms have 101 keys in QWERTY layout and no key with a logo on it (the key some people call the Windows key).

Picture of QWERTY layout

Microsoft International keyboard (with dead keys)

My native language is Dutch, but I read and write a few others. Most of these languages have 'foreign' (accented) characters, which obviously are not available on the keyboard. Starting with Windows 3.0 (in 1990), there was a solution: the US International layout. It uses dead keys. A dead key no longer generates a character by itself; it acts as a prefix for the next character. An apostrophe (') followed by the letter e will generate é (e-acute). Great stuff! But what if one needs to type an apostrophe? Just type ', followed by a space. Easy, but very annoying for programmers (like me), who have to type single and double quotes all the time.
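The dead-key behaviour described above can be sketched as a small state machine. This is only an illustration, not the actual Windows implementation: the `DEAD_KEYS` table and the `type_keys` function are hypothetical names, the table lists just a handful of combinations, and the handling of a dead key followed by a letter it cannot combine with (emit both characters) is an assumption.

```python
# Illustrative dead-key table (hypothetical, far from complete).
DEAD_KEYS = {
    "'": {"e": "é", "a": "á"},
    '"': {"e": "ë", "i": "ï"},
}


def type_keys(keys):
    """Translate a sequence of key presses into the text that appears."""
    output = []
    pending = None  # the dead key waiting for its next character, if any
    for key in keys:
        if pending is not None:
            combos = DEAD_KEYS[pending]
            if key in combos:
                output.append(combos[key])    # accent + letter
            elif key == " ":
                output.append(pending)        # dead key + space: the bare character
            else:
                output.append(pending + key)  # assumption: emit both characters
            pending = None
        elif key in DEAD_KEYS:
            pending = key                     # swallow the key for now
        else:
            output.append(key)
    if pending is not None:
        output.append(pending)                # a dead key left hanging at the end
    return "".join(output)


print(type_keys("caf'e"))  # the apostrophe combines with the e
print(type_keys("it' s"))  # apostrophe followed by a space gives a plain apostrophe
```

The second call shows the annoyance for programmers: every literal quote costs an extra space keystroke.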

Picture of International with dead keys layout

The US International layout does have another feature: most keys will generate special characters when pressed while the right Alt key is held down. (The right Alt key is marked AltGr on most localized keyboards.) No sequence of ' and e is needed, just AltGr-e. The number of accented characters (in most languages) is rather small. The letter é is common in French, but its frequency is only 1.5%. I started wondering what would happen if I changed the Microsoft design so that the dead keys were no longer dead (allowing me to program freely) while I would still have é at my fingertips through AltGr-e. Removing the dead keys posed a problem: ë (which I use in Dutch) is not available through an AltGr combination but only through a dead key.

International keyboard (with AltGr dead keys)

I experimented with how awkward it would be if I used AltGr to get to the dead keys as well: for ï, I would have to type Shift-AltGr-", followed by i. Awkward indeed, but ï is not very common (in Dutch). I got rid of the very annoying " followed by a space to get a single double quote AND still had access to all characters! I used this layout for a number of years, modifying the .../X11/symbols/us file after every new installation of Linux.

Picture of International with altgr dead keys layout

Some people around me noticed my layout, which made me think of submitting it to ... (I didn't know). With some help, my proposal was accepted (in 2007). Within one or two releases, I could stop modifying files and just select "International AltGr dead keys" from a menu. Great.

Some people started using the layout, some of them helping others to find it in Linux. Some got accustomed to the layout and wanted it to work on Windows too.

Western European keyboard (with AltGr dead keys)

In 2015, years after altgr-intl became commonly available, I received an e-mail from Enno, who writes German a lot more than I do. His issue was with the letters ä, ë, ö and ü (of which ä, ö and ü are common in German). They're available, but scattered all over the keyboard. A somewhat logical layout would be nicer, but that would require breaking the Microsoft 'standard'.

That made me think. What makes the Microsoft International keyboard, eh... international? What is the rationale behind supporting the letter ð (eth), a letter used in Old English, Middle English, Icelandic and Faroese, all languages with few speakers? The letter ã (a-tilde) is necessary for Portuguese, yet it is only accessible through a (shifted(!) AltGr-) dead key. Or, as Enno put it: © (AltGr-c) is only common in Redmond.

Please note: This is a work in progress; the layout(s) shown below may change without notice. We would like to have an alternative version of altgr-intl integrated into the xkeyboard-config source tree. A first attempt was abandoned (late 2020), and on April 13, 2021 inclusion of a new version (2.0) was requested. Please do not use this layout for projects before it is finalized.

We tried to come up with a layout that would support more languages a little better. Questions: which characters do we really need? And how to distribute them in a way that doesn't make typing too uncomfortable and is easy to remember?

Consultant Stefan, who makes letter frequency tables for many languages, prepared a combined frequency table of 10 Western European languages for us. This gives us a more or less reliable list of the characters we need, even for languages we do not speak or write. We made all accented characters in this table available as a single AltGr keystroke. Diaeresis is most frequent (1.084%), followed by acute (0.596%): these would get the best places. Grave, tilde and circumflex cannot strictly follow logic (but are less frequent). Œ ended up on the '.' key, but æ is on 'w' (æ is used only in Danish and Norwegian but still reaches a frequency of 0.123% across the 10 languages combined).

This layout supports 11 languages: English (of course), Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish in the sense that all accented characters in these languages are available as one AltGr keystroke. Note that this includes the 4 (major) languages of the American continent (combined with Western Europe about a billion speakers).

The layout doesn't 'break' a lot if you're used to altgr-intl: as long as you leave the AltGr key alone, it's still a standard United States QWERTY keyboard - great for 'domestic' typing and for coding. For an occasional accented letter, use the 'classic' AltGr dead keys ( ' ^ " ` ~ ) just like you're used to in altgr-intl.

These are new, but easier to remember 'ground rules' for fast access to accented letters using AltGr:

Download (and print?) a cheat sheet and find out how to install on Linux, Windows or macOS.

'Classic' dead keys acute, circumflex, diaeresis, grave, tilde as well as doubleacute and macron are in the same positions as on altgr-intl. Dead keys abovedot, abovering, caron and ogonek have moved. Cedilla, greek and stroke are new. Belowdot, breve, hook and horn are no longer available (see 'Compose keys', below, for a solution on Linux).

Picture of Western European with altgr dead keys layout (screenshot of 'gkbd-keyboard-display -l us?altgr-weur')

The layout is called altgr-weur. For languages of Eastern Europe, someone (who understands those languages) could make an altgr-eeur.

Compose keys (on Linux)

The altgr-intl layout has (AltGr-) dead keys for European languages, but also (AltGr-) dead keys for transcriptions and phonetic notations. The altgr-weur layout eliminates the dead keys belowdot, breve, horn and hook, but that doesn't mean one cannot correctly type (let's say) a Vietnamese name. The compose key can be enabled to make virtually all Unicode characters available. Start

gnome-tweaks

go to Keyboard and Mouse, and select a Compose Key. (This key is also called the Multi_key. I chose to use the RightCtrl key.) Composing a character works as follows: press the Multi_key (RightCtrl) key (and release), then press apostrophe ('), then press e to get é. Almost any character can be generated this way, for example ⅜ (RightCtrl, '3', '8'), which is probably not (natively) available on any keyboard. AltGr-'a' (for ä) is faster: that's why altgr-weur offers all accented characters (for the 11 languages mentioned) through one AltGr combination.

Present on the (altgr-weur) layout:

| X11 name | Description | Dead | Usage |
|---|---|---|---|
| abovedot | dot above | . | Polish, Turkish, Lithuanian and transcriptions |
| abovering | ring above | o | Scandinavian Å and with 4 other letters in Slavic languages |
| acute | right pointing apostrophe above | ' | Modern Latin and Cyrillic languages |
| caron | inverted circumflex above | < | Finnish (only for transcriptions), Italian (for Slavic names) |
| cedilla | comma below | , | French, Portuguese (& Catalan) |
| circumflex | upward chevron above | ^ | Modern Latin languages |
| diaeresis | two dots above | " | Modern Latin languages and Greek |
| doubleacute | double acute | = | Hungarian |
| grave | left pointing apostrophe above | ` | Western European languages |
| greek | | | Greek alphabet: α, β, δ etc |
| macron | bar above | _ | Balkan, Baltic, Polynesian, transcriptions |
| ogonek | inverted cedilla | ; | Poland, Lithuania, Native American |
| stroke | (various) strokes through | / | Scandinavian Ø and special purposes |
| tilde | approximation sign above | ~ | Spanish, Portuguese (& Vietnamese) |

Not available on this layout (removed from altgr-intl):

| X11 name | Description | Dead | Usage |
|---|---|---|---|
| belowdot | dot below | ! | Transcriptions and phonetic notations |
| breve | rounded caron above | U | Languages around the Black Sea |
| hook | questionmark above | ? | Vietnamese |
| horn | comma right above | + | Vietnamese (usually in combination with other accented vowels) |

The character in the Dead column is the key that 'dies' when you type it after pressing (and releasing) the Multi_key: it will produce no output. Next, you follow with the letter that needs to have that accent. For instance: Multi_key, '_', 'e' gives ē (Latvian e-macron). Similarly, Multi_key, '+', 'o' gives (Vietnamese) ơ. For a list with many examples, see Ubuntu help or the X11 server documentation.
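The idea of 'accent first, then letter' can be imitated in Python with Unicode normalization. This sketch does not read the X11 Compose tables; it swaps in a different technique (appending a combining mark and normalizing to NFC) that happens to give the same characters for the examples above. The `COMBINING` table and the `compose` function are names made up for this illustration, and only a few accents are listed.

```python
import unicodedata

# Unicode combining marks for a few of the X11 accent names (not a complete list).
COMBINING = {
    "acute": "\u0301",
    "grave": "\u0300",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "macron": "\u0304",
    "diaeresis": "\u0308",
    "horn": "\u031b",
}


def compose(accent, letter):
    """Return the precomposed character for letter + accent, if Unicode has one."""
    # NFC normalization merges a base letter and a combining mark into a
    # single code point whenever such a precomposed character exists.
    return unicodedata.normalize("NFC", letter + COMBINING[accent])


print(compose("macron", "e"))  # compare with Multi_key, '_', 'e'
print(compose("horn", "o"))    # compare with Multi_key, '+', 'o'
```

If no precomposed character exists for a combination, the result simply stays a letter followed by a combining mark.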

Unicode input

As a last resort, you could input the hexadecimal code for the character you need. The altgr-weur layout has currency symbols for dollars and cents ($, ¢), the British pound (£), the Japanese yen (¥) and the euro (€). The Israeli shekel has Unicode code point U+20AA. Type Shift-Ctrl-u, followed by 2, 0, a, a and Enter (₪). Copyright is U+00A9: © (example provided to keep Redmond happy).
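To check which character a hexadecimal code will produce before typing it, the same conversion can be done in Python. This is a minimal sketch; `from_hex` is a helper name invented for this example.

```python
import unicodedata


def from_hex(code):
    """Convert a hexadecimal code point such as '20aa' into its character."""
    return chr(int(code, 16))


for code in ("20aa", "00a9", "20ac"):
    char = from_hex(code)
    # Show the code point, the character and its official Unicode name.
    print(f"U+{code.upper()} {char} {unicodedata.name(char)}")
```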
