Entering unicode characters
Input methods for unicode characters in terminals, the shell, and editors
1 | XCompose (Linux) |
1.1 | Getting compose working in all programs |
1.2 | ibus |
1.2.1 | KDE |
2 | WinCompose (Windows) |
3 | Terminals, shells, and editors: |
3.1 | XTerm |
3.2 | URxvt |
3.3 | Unix shell |
3.4 | Screen |
3.5 | Vim |
3.5.1 | vim-perl6 |
3.6 | Emacs |
4 | Some characters useful in Raku |
4.1 | Smart quotes |
4.2 | Guillemets |
4.3 | Set/bag operators |
4.4 | Mathematical symbols |
4.5 | Greek characters |
4.6 | Superscripts and subscripts |
Raku allows the use of unicode characters as variable names. Many operators are defined with unicode symbols (in particular the set/bag operators) as well as some quoting constructs. Hence it is good to know how to enter these symbols into editors, the Raku shell and the command line, especially if the symbols aren't available as actual characters on a keyboard.
General information about entering unicode under various operating systems and environments can be found on the Wikipedia unicode input page.
XCompose (Linux)
Xorg includes digraph support using a Compose key. The default of AltGr + Shift can be remapped to something easier such as Capslock. In GNOME 2 and MATE this can be set up under Preferences → Keyboard → Layouts → Options → Position of Compose Key. So, for example, to input »+« you could type CAPSLOCK > > + CAPSLOCK < <
XCompose allows customizing the digraph sequences using a .XCompose file, and https://github.com/kragen/xcompose/blob/master/dotXCompose is an extremely complete one. In GNOME, XCompose was overridden and replaced with a hardcoded list, but it is possible to restore XCompose by setting GTK_IM_MODULE=xim in your environment. It might be necessary to install a xim bridge as well, such as uim-xim.
Getting compose working in all programs
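You may have issues using the compose key in all programs. In that case you can try ibus (described below), or set the input module explicitly through the following environment variables:
input_module=xim
export GTK_IM_MODULE=$input_module
export XMODIFIERS=@im=$input_module
export QT_IM_MODULE=$input_module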
If you want this to apply to all users, you can put it in a file /etc/profile.d/compose.sh, which is the easiest way, since you won't have to deal with how different GUI environments set up their environment variables.
If you use KDE you can put this file in ~/.config/plasma-workspace/env/compose.sh and that should work. Other desktop environments will be different. Look up how to set environment variables in yours or use the system-wide option above.
ibus
If you have problems entering high codepoint symbols such as 🐧 using the xim input module, you can instead use ibus. You will have to install the ibus package for your distribution. Then you will have to set it to start when your desktop environment loads. The command that needs to be run is:
ibus-daemon --xim --verbose --daemonize --replace
Setting --xim should also allow programs not using ibus to still use the xim input method and be backward compatible.
KDE
If you are using KDE, open the start menu, type in “Autostart” and click Autostart, which should be the first result. In the settings window that opens, click Add program, type in ibus-daemon and click OK. Then go into the Application tab of the window that pops up. In the Command field, enter the full ibus-daemon command as shown above, with the --desktop option added as --desktop=plasma. Click OK. It should now launch automatically when you log in again.
WinCompose (Windows)
WinCompose adds compose key functionality to Windows. It can be installed either via the WinCompose releases page on GitHub, or with the Chocolatey package manager.
Once the program is installed and running, right click the tray icon and select Options → Composing → Behavior → Compose Key to set your desired key.
WinCompose has multiple sources to choose from in Options → Composing → Sequences. It is recommended to enable XCompose and disable Xorg, as there are a handful of operators which Xorg does not provide sequences for, and Xorg also has sequences which conflict with operator sequences present in XCompose. Sequences can be viewed by right clicking the tray icon and selecting Show Sequences. If you wish to add your own sequences, you can do so by either adding/modifying .XCompose in %USERPROFILE%, or editing user-defined sequences in the options menu.
Terminals, shells, and editors:
XTerm
Unicode support is enabled in XTerm primarily by setting its utf8 and utf8Fonts options to 1, along with its locale option to UTF-8, in ~/.Xdefaults. Here is a sample configuration that supports displaying enough of unicode to program in Raku:
XTerm*faceName: xft:Noto Mono:style=Regular
XTerm*faceNameDoublesize: xft:Noto Emoji:style=Regular
XTerm*faceSize: 10
XTerm*locale: UTF-8
XTerm*titleModes: 16
XTerm*utf8: 1
XTerm*utf8Fonts: 1
XTerm*utf8Title: true
URxvt
Similarly to XTerm, unicode support is enabled in URxvt primarily by setting its locale option to en_US.UTF-8 in ~/.Xdefaults. Here is a sample configuration that supports displaying enough of unicode to program in Raku:
URxvt*font: xft:Noto Mono:pixelsize=14:style=Regular,\
xft:Noto Emoji:pixelsize=14:style=Regular
URxvt*letterSpace: -1
URxvt*locale: en_US.UTF-8
URxvt*skipBuiltInGlyphs: true
Unix shell
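At the bash shell, one enters unicode characters by pressing Ctrl-Shift-u, then typing the hexadecimal unicode code point value followed by Enter. (The shortcut is provided by the input method of many GTK-based terminal emulators and by ibus rather than by bash itself, so it may not work in every terminal.) For instance, to enter the character for the element-of operator (∈) use the following key combination (whitespace has been added for clarity):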
Ctrl-Shift-u 2208 Enter
This is also the method one would use to enter unicode characters into the perl6 REPL, if one has started the REPL inside a Unix shell.
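To find out which hexadecimal digits to type after Ctrl-Shift-u, it can help to look the code point up programmatically. The following is a minimal sketch in Python, used here purely for illustration; the helper names codepoint_hex and char_from_hex are hypothetical and not part of any tool mentioned on this page.

```python
def codepoint_hex(char: str) -> str:
    """Return the code point of one character as uppercase hex digits,
    zero-padded to at least four digits (the form used in U+XXXX notation)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return format(ord(char), "04X")


def char_from_hex(hex_digits: str) -> str:
    """Return the character for a hexadecimal code point such as '2208'."""
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    # Digits to type after the composition chord for the element-of operator.
    print(codepoint_hex("∈"))
    # The character that a given sequence of hex digits produces.
    print(char_from_hex("2208"))
```

The same digits can be used with the Vim and Emacs entry methods described further down, since they all take the hexadecimal code point.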
Screen
GNU Screen does sport a digraph command, but with a rather limited digraph table. Thanks to bindkey and exec, an external program can be used to insert characters into the current screen window.
bindkey ^K exec .! digraphs
This will bind control-k to the shell command digraphs. You can use digraphs if you prefer a Raku friendly digraph table over RFC 1345 or change it to your needs.
Vim
In Vim, unicode characters are entered (in insert-mode) by pressing first Ctrl-V (also denoted ^V), then u and then the hexadecimal value of the unicode character to be entered. For example, the Greek letter λ (lambda) is entered via the key combination:
^Vu03BB
You can also use Ctrl-K/^K along with a digraph to type in some characters. So an alternative to the above using digraphs looks like this:
^Kl*
The list of digraphs Vim provides is documented in Vim's built-in help (:help digraph-table); you can add your own with the :digraph command.
Further information about entering special characters in Vim can be found on the Vim Wikia page about entering special characters.
vim-perl6
The vim-perl6 plugin for Vim can be configured to optionally replace ASCII-based ops with their Unicode-based equivalents. This will convert the ASCII-based ops on the fly while typing them.
Emacs
In Emacs, unicode characters are entered by first entering the chord C-x 8 RET, at which point the text Unicode (name or hex): appears in the minibuffer. One then enters the hexadecimal unicode code point followed by the enter key. The unicode character will now appear in the document. Thus, to enter the Greek letter λ (lambda), one uses the following key combination:
C-x 8 RET 3bb RET
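Since the minibuffer prompt accepts either a name or a hex value, it can be handy to look up both for a given character. The sketch below uses Python's standard unicodedata module, purely as an illustration; the function name prompt_inputs is hypothetical. Note that the Unicode standard spells the name of λ as LAMDA, without the B.

```python
import unicodedata


def prompt_inputs(char: str) -> tuple[str, str]:
    """Return the Unicode character name and the lowercase hex code point
    of one character, the two kinds of input the prompt asks for."""
    return unicodedata.name(char), format(ord(char), "x")


if __name__ == "__main__":
    name, hex_digits = prompt_inputs("λ")
    print(name)        # official Unicode name of the character
    print(hex_digits)  # hex code point, as typed in the example above
```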
Further information about unicode and its entry into Emacs can be found on the Unicode Encoding Emacs wiki page.
You can also use RFC 1345 character mnemonics by typing:
C-x RET C-\ rfc1345 RET
Or C-u C-\ rfc1345 RET.
To type special characters, type & followed by a mnemonic. Emacs will show the possible characters in the echo area. For example, Greek letter λ (lambda) can be entered by typing:
&l*
You can use C-\ to toggle the input method.
Another input method you can use to insert special characters is TeX. Select it by typing C-u C-\ TeX RET. You can enter a special character by using a prefix such as \. For example, to enter λ, type:
\lambda
To view characters and sequences provided by an input method, run the describe-input-method command:
C-h I TeX
Some characters useful in Raku
Smart quotes
These characters are used in different languages as quotation marks. In Raku they are used as quoting characters.
Constructs such as these are now possible:
say 「What?!」;
say ”Whoa!“;
say „This works too!”;
say „There are just too many ways“;
say “here: “no problem” at all!”; # You can nest them!
This is very useful in the shell:
perl6 -e 'say ‘hello world’'
since you can just copy and paste some piece of code and not worry about quotes.
Guillemets
These characters are used in French and German as quotation marks. In Raku they are used as interpolation word quotes, hyper operators and as an angle bracket alternative in POD6.
symbol | unicode code point | ascii equivalent |
---|---|---|
« | U+00AB | << |
» | U+00BB | >> |
Thus constructs such as these are now possible:
say (1, 2) »+« (3, 4); # OUTPUT: «(4 6)» - element-wise add
[1, 2, 3] »+=» 42; # add 42 to each element of @array
say «moo»; # OUTPUT: «moo»
my $foo = "foo bar";
say «$foo $foo ber».perl; # OUTPUT: «("foo", "bar", "foo", "bar", "ber")»
Set/bag operators
The set/bag operators all have set-theory-related symbols; their unicode code points and ascii equivalents are listed below. To compose such a character, it is merely necessary to enter the character composition chord (e.g. Ctrl-V u in Vim; Ctrl-Shift-u in Bash) followed by the hexadecimal unicode code point.
operator | unicode code point | ascii equivalent |
---|---|---|
∈ | U+2208 | (elem) |
∉ | U+2209 | !(elem) |
∋ | U+220B | (cont) |
∌ | U+220C | !(cont) |
⊆ | U+2286 | (<=) |
⊈ | U+2288 | !(<=) |
⊂ | U+2282 | (<) |
⊄ | U+2284 | !(<) |
⊇ | U+2287 | (>=) |
⊉ | U+2289 | !(>=) |
⊃ | U+2283 | (>) |
⊅ | U+2285 | !(>) |
∪ | U+222A | (|) |
∩ | U+2229 | (&) |
∖ | U+2216 | (-) |
⊖ | U+2296 | (^) |
⊍ | U+228D | (.) |
⊎ | U+228E | (+) |
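If typing the symbols directly is inconvenient, one option is to write the ascii equivalents and convert them afterwards. The following Python sketch, for illustration only, builds a lookup from the table above and applies it by plain text substitution. The names SET_OPERATORS and to_unicode_operators are hypothetical, and the substitution is deliberately naive: it does not parse Raku, so it would also rewrite matching text inside strings or comments.

```python
# Mapping taken from the table above: ascii equivalent -> unicode operator.
SET_OPERATORS = {
    "(elem)": "∈", "!(elem)": "∉",
    "(cont)": "∋", "!(cont)": "∌",
    "(<=)": "⊆", "!(<=)": "⊈",
    "(<)": "⊂", "!(<)": "⊄",
    "(>=)": "⊇", "!(>=)": "⊉",
    "(>)": "⊃", "!(>)": "⊅",
    "(|)": "∪", "(&)": "∩",
    "(-)": "∖", "(^)": "⊖",
    "(.)": "⊍", "(+)": "⊎",
}


def to_unicode_operators(source: str) -> str:
    """Replace ascii set/bag operators in a piece of text by their
    unicode equivalents."""
    # Handle longer spellings first, so that "!(elem)" is replaced as a
    # whole instead of becoming "!" followed by the symbol for "(elem)".
    for ascii_op in sorted(SET_OPERATORS, key=len, reverse=True):
        source = source.replace(ascii_op, SET_OPERATORS[ascii_op])
    return source


if __name__ == "__main__":
    print(to_unicode_operators("say 1 (elem) (1, 2, 3);"))
```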
Mathematical symbols
Wikipedia contains a full list of mathematical operators and symbols in unicode as well as links to their mathematical meaning.
Greek characters
Greek characters may be used as variable names. For a list of Greek and Coptic characters and their unicode code points see the Greek in Unicode Wikipedia article.
For example, to assign the value 3 to π, enter the following in Vim (whitespace added to the compose sequences for clarity):
my $Ctrl-V u 03C0 = 3; # same as: my $π = 3;
say $Ctrl-V u 03C0; # 3 same as: say $π;
Superscripts and subscripts
A limited set of superscripts and subscripts can be created directly in unicode by using the U+207x, U+208x and (less often) the U+209x ranges. However, to produce a value squared (to the power of 2) or cubed (to the power of 3), one needs to use U+00B2 and U+00B3 since these are defined in the Latin-1 Supplement Unicode block.
Thus, to write the Taylor series expansion around zero of the function exp(x) one would input into e.g. vim the following:
exp(x) = 1 + x + xCtrl-V u 00B2/2! + xCtrl-V u 00B3/3! + ... + xCtrl-V u 207F/n!
# which would appear as
exp(x) = 1 + x + x²/2! + x³/3! + ... + xⁿ/n!
Or to specify the elements in a list from 1 up to k:
ACtrl-V u 2081, ACtrl-V u 2082, ..., ACtrl-V u 2096
# which would appear as
A₁, A₂, ..., Aₖ