method encoding

Table of Contents
1	class IO::CatHandle
1.1	(IO::CatHandle) method encoding
2	class IO::Handle
2.1	(IO::Handle) method encoding
2.1.1	utf16, utf16le and utf16be
2.1.2	Examples

Documentation for method encoding assembled from the following types:

class IO::CatHandle

From IO::CatHandle

(IO::CatHandle) method encoding

Defined as:

multi method encoding(IO::CatHandle:D:)
multi method encoding(IO::CatHandle:D: $new-encoding)

Called without an argument, returns the invocant's current $.encoding. Called with an argument, sets the invocant's $.encoding attribute to the provided value. Valid values are the same as those accepted by IO::Handle.encoding (use value Nil to switch to binary mode). All source handles, including the active one, will use the provided $.encoding value.

(my $f1 = 'foo'.IO).spurt: 'I ♥ Perl';
(my $f2 = 'bar'.IO).spurt: 'meow';
with IO::CatHandle.new: $f1, $f2 {
    # .encoding is 'utf8' by default: 
    .readchars(5).say; # OUTPUT: «I ♥ P␤» 
 
    .encoding: Nil; # switch to binary mode 
    .slurp.say; # OUTPUT: «Buf[uint8]:0x<72 6c 6d 65 6f 77>␤» 
}

class IO::Handle

From IO::Handle

(IO::Handle) method encoding

Defined as:

multi method encoding(IO::Handle:D: --> Str:D)
multi method encoding(IO::Handle:D: $enc --> Str:D)

Returns a Str representing the encoding currently used by the handle, defaulting to "utf8". Nil indicates the filehandle is currently in binary mode. Specifying an optional positional $enc argument switches the encoding used by the handle; specify Nil as encoding to put the handle into binary mode.
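The getter and setter forms described above can be sketched as follows. This is a minimal illustration only; the scratch file name and the sample text are assumptions made for the example.

```raku
# Hypothetical scratch file; the name is only for illustration.
my $path = $*TMPDIR.add("encoding-demo-{$*PID}.txt");

# Write the Latin-1 bytes 63 61 66 E9 0A ("café" plus a newline).
$path.spurt: "caf\x[E9]\n", :enc<iso-8859-1>;

my $fh = $path.open;
my $default = $fh.encoding;      # getter form: the handle's current encoding
say $default;                    # OUTPUT: «utf8␤»

$fh.encoding: 'iso-8859-1';      # setter form: switch before reading
my $line = $fh.get;
say $line;                       # OUTPUT: «café␤»

$fh.close;
$path.unlink;
```

Switching the encoding before the first read matters here, because the byte E9 on its own is not valid UTF-8.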

The accepted values for encoding are case-insensitive. The available encodings vary by implementation and backend. On Rakudo MoarVM the following are supported:

utf8
utf16
utf16le
utf16be
utf8-c8
iso-8859-1
windows-1251
windows-1252
windows-932
ascii

The default encoding is utf8, which undergoes normalization into Unicode NFC (Normalization Form C, canonical composition). In some cases you may want to ensure no normalization is done; for this you can use utf8-c8. Before using utf8-c8, please read Unicode: Filehandles and I/O for more information on utf8-c8 and NFC.
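To illustrate the normalization performed by the default utf8 encoding, the sketch below writes a decomposed sequence as raw bytes and reads it back as text. The file name and the chosen characters are assumptions made for the example; utf8-c8 is deliberately not shown.

```raku
# Hypothetical scratch file; the name is only for illustration.
my $path = $*TMPDIR.add("nfc-demo-{$*PID}.txt");

# "e" followed by U+0301 COMBINING ACUTE ACCENT, as three raw UTF-8 bytes.
$path.spurt: Blob.new(0x65, 0xCC, 0x81);

my $fh = $path.open;             # default utf8 encoding
my $text = $fh.slurp;
$fh.close;

# The two codepoints on disk come back as the single composed U+00E9.
say $text.ords;                              # OUTPUT: «(233)␤»

# Encoding the string again yields two bytes, not the original three.
say $text.encode('utf8').list.fmt('%02x');   # OUTPUT: «c3 a9␤»

$path.unlink;
```

The re-encoded bytes differ from the bytes in the file, which is why the default encoding may not suit data that must round-trip byte for byte.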

As of Rakudo 2018.04, windows-932, a variant of ShiftJIS, is also supported.

Implementations may also choose to provide support for aliases. For example, Rakudo allows the alias latin-1 for the iso-8859-1 encoding, as well as dashed utf versions: utf-8 and utf-16.
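As a small sketch of the case-insensitivity and the latin-1 alias mentioned above, the same Latin-1 file can be read under three spellings of the encoding name. This assumes Rakudo, where the alias is available; the file name and sample text are made up for the example.

```raku
# Hypothetical scratch file; the name is only for illustration.
my $path = $*TMPDIR.add("alias-demo-{$*PID}.txt");
$path.spurt: "na\x[EF]ve\n", :enc<iso-8859-1>;

my @lines;
for <iso-8859-1 ISO-8859-1 latin-1> -> $name {
    my $fh = $path.open;
    $fh.encoding: $name;         # canonical name, upper case, and alias
    @lines.push: $fh.get;
    $fh.close;
}
say @lines;                      # OUTPUT: «[naïve naïve naïve]␤»

$path.unlink;
```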

utf16, utf16le and utf16be

Unlike utf8, utf16 has an endianness: either big endian or little endian. This relates to the ordering of the bytes within each 16-bit code unit. Computer CPUs also have an endianness. Raku's utf16 format specifier uses the endianness of the host system when encoding. When decoding, it looks for a byte order mark and, if one is present, uses it to set the endianness. If there is no byte order mark, it assumes the file uses the same endianness as the host system. A byte order mark is the codepoint U+FEFF, which is ZERO WIDTH NO-BREAK SPACE. For utf16 encoded files, the standard states that if this codepoint appears at the start of a file, it shall be interpreted as a byte order mark, not as a U+FEFF codepoint.

Writing with the utf16 encoding produces different bytes on systems of different endianness. As of Rakudo 2018.10, however, a byte order mark is written at the start of the file, so files created with the utf16 encoding can be read on both big endian and little endian systems.
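A minimal round-trip sketch of the utf16 behavior described above follows. The file name is an assumption made for the example, and the exact bytes printed depend on the host's endianness and the Rakudo version, so no byte output is shown.

```raku
# Hypothetical scratch file; the name is only for illustration.
my $path = $*TMPDIR.add("utf16-demo-{$*PID}.txt");

my $out = $path.open: :w, :enc<utf16>;
$out.print: "hi";
$out.close;

# Raw bytes on disk. On Rakudo 2018.10 and later these are expected to
# start with a byte order mark in the host's byte order.
my $bytes = $path.slurp: :bin;
say $bytes.list.fmt('%02x');

# Reading back with utf16 detects the byte order from the mark, if any.
my $in = $path.open: :enc<utf16>;
my $text = $in.slurp;
$in.close;
say $text;

$path.unlink;
```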

When using the utf16be or utf16le encodings, a byte order mark is not used. The endianness is not affected by the host CPU type: it is big endian for utf16be and little endian for utf16le.

In keeping with the standard, a U+FEFF codepoint (the code unit 0xFEFF) at the start of a utf16be or utf16le file is interpreted as a ZERO WIDTH NO-BREAK SPACE and not as a byte order mark. No byte order mark is written to files that use the utf16be or utf16le encodings.
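The fixed byte order of utf16be and utf16le can be seen by writing a single character with each and inspecting the raw bytes. This is a sketch only; the file names are assumptions made for the example.

```raku
my %bytes;
for <utf16be utf16le> -> $enc {
    # Hypothetical scratch file; the name is only for illustration.
    my $path = $*TMPDIR.add("{$enc}-demo-{$*PID}.txt");

    my $fh = $path.open: :w, :$enc;
    $fh.print: "A";              # U+0041, one 16-bit code unit
    $fh.close;

    # Format the raw bytes as hex; no byte order mark precedes them.
    %bytes{$enc} = $path.slurp(:bin).list.fmt('%02x');
    $path.unlink;
}
say %bytes<utf16be>;             # OUTPUT: «00 41␤»
say %bytes<utf16le>;             # OUTPUT: «41 00␤»
```

The result is the same on any host, since neither encoding consults the CPU's byte order.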

As of Rakudo 2018.09 on MoarVM, utf16, utf16le and utf16be are supported. In 2018.10, writing to a file with utf16 will properly add a byte order mark (BOM).

Examples

with 'foo'.IO {
    .spurt: "First line is text, then:\nBinary";
    my $fh will leave {.close} = .open;
    $fh.get.say;         # OUTPUT: «First line is text, then:␤» 
    $fh.encoding: Nil;
    $fh.slurp.say;       # OUTPUT: «Buf[uint8]:0x<42 69 6e 61 72 79>␤» 
}