Data structures

How Raku deals with data structures and what we can expect from them

Scalar structures

Some classes do not have any internal structure and to access parts of them, specific methods have to be used. Numbers, strings, and some other monolithic classes are included in that class. They use the $ sigil, although complex data structures can also use it.

my $just-a-number = 7;
my $just-a-string = "8";

There is a Scalar class, which is used internally to assign a default value to variables declared with the $ sigil.

my $just-a-number = 333;
say $just-a-number.VAR.^name# OUTPUT: «Scalar␤»

Any complex data structure can be scalarized by using the item contextualizer $:

(123$(45))[3].VAR.^name.say# OUTPUT: «Scalar␤»

However, this means that it will be treated as such in the context they are. You can still access its internal structure.

(123$(45))[3][0].say# OUTPUT: «4␤»

An interesting side effect, or maybe intended feature, is that scalarization conserves identity of complex structures.

for ^2 {
     my @list = (11);
     say @list.WHICH;
} # OUTPUT: «Array|93947995146096␤Array|93947995700032␤»

Every time (1, 1) is assigned, the variable created is going to be different in the sense that === will say it is; as it is shown, different values of the internal pointer representation are printed. However

for ^2 {
  my $list = (11);
  say $list.WHICH
} # OUTPUT: «List|94674814008432␤List|94674814008432␤»

In this case, $list is using the Scalar sigil and thus will be a Scalar. Any scalar with the same value will be exactly the same, as shown when printing the pointers.

Complex data structures

Complex data structures fall in two different broad categories: Positional, or list-like and Associative, or key-value pair like, according to how you access its first-level elements. In general, complex data structures, including objects, will be a combination of both, with object properties assimilated to key-value pairs. While all objects subclass Mu, in general complex objects are instances of subclasses of Any. While it is theoretically possible to mix in Positional or Associative without doing so, most methods applicable to complex data structures are implemented in Any.

Navigating these complex data structures is a challenge, but Raku provides a couple of functions that can be used on them: deepmap and duckmap. While the former will go to every single element, in order, and do whatever the block passed requires,

say [[12, [34]],[[56, [78]]]].deepmap*.elems );
# OUTPUT: «[[1 1 [1 1]] [1 1 [1 1]]]␤»

which returns 1 because it goes to the deeper level and applies elems to them, deepmap can perform more complicated operations:

say [[12, [34]], [[56, [78]]]].duckmap:
   -> $array where .elems == 2 { $array.elems };
# OUTPUT: «[[1 2 2] [5 6 2]]␤»

In this case, it dives into the structure, but returns the element itself if it does not meet the condition in the block (1, 2), returning the number of elements of the array if it does (the two 2s at the end of each subarray).

Since deepmap and duckmap are Any methods, they also apply to Associative arrays:

say %first => [12], second => [3,4] ).deepmap*.elems );
# OUTPUT: «{first => [1 1], second => [1 1]}␤»

Only in this case, they will be applied to every list or array that is a value, leaving the keys alone.

Positional and Associative can be turned into each other.

say %first => [12], second => [3,4] ).list[0];
# OUTPUT: «second => [3 4]␤»

However, in this case, and for Rakudo >= 2018.05, it will return a different value every time it runs. A hash will be turned into a list of the key-value pairs, but it is guaranteed to be disordered. You can also do the operation in the opposite direction, as long as the list has an even number of elements (odd number will result in an error):

say <a b c d>.Hash # OUTPUT: «{a => b, c => d}␤»

But

say <a b c d>.Hash.kv # OUTPUT: «(c d a b)␤»

will obtain a different value every time you run it; kv turns every Pair into a list.

Complex data structures are also generally Iterable. Generating an iterator out of them will allow the program to visit the first level of the structure, one by one:

.say for 'א'..'ס'# OUTPUT: «א␤ב␤ג␤ד␤ה␤ו␤ז␤ח␤ט␤י␤ך␤כ␤ל␤ם␤מ␤ן␤נ␤ס␤»

'א'..'ס' is a Range, a complex data structure, and with for in front it will iterate until the list is exhausted. You can use for on your complex data structures by overriding the iterator method (from role Iterable):

class SortedArray is Array {
  method iterator() {
    self.sort.iterator
  }
};
my @thing := SortedArray.new([3,2,1,4]);
.say for @thing# OUTPUT: «1␤2␤3␤4␤»

for calls directly the iterator method on @thing making it return the elements of the array in order. Much more on iterating on the page devoted to it.

Functional structures

Raku is a functional language and, as such, functions are first-class data structures. Functions follow the Callable role, which is the 4th element in the quartet of fundamental roles. Callable goes with the & sigil, although in most cases it is elided for the sake of simplicity; this sigil elimination is always allowed in the case of Callables.

my &a-func= { (^($^þ)).Seq };
say a-func(3), a-func(7); # OUTPUT: «(0 1 2)(0 1 2 3 4 5 6)␤»

Blocks are the simplest callable structures, since Callables cannot be instantiated. In this case we implement a block that logs events and can retrieve them:

my $logger = -> $event$key = Nil  {
  state %store;
  if ( $event ) {
    %store{ DateTime.newnow ) } = $event;
  } else {
    %store.keys.grep( /$key/ )
  }
}
$logger"Stuff" );
$logger"More stuff" );
say $loggerNil"2018-05-28" ); # OUTPUT: «(Stuff More stuff)␤»

A Block has a Signature, in this case two arguments, the first of which is the event that is going to be logged, and the second is the key to retrieve the events. They will be used in an independent way, but its intention is to showcase the use of a state variable that is kept from every invocation to the next. This state variable is encapsulated within the block, and cannot be accessed from outside except by using the simple API the block provides: calling the block with a second argument. The two first invocations log two events, the third invocation at the bottom of the example use this second type of call to retrieve the stored values. Blocks can be cloned:

my $clogger = $logger.clone;
$clogger"Clone stuff" );
$clogger"More clone stuff" );
say $cloggerNil"2018-05-28" );
# OUTPUT: «(Clone stuff More clone stuff)␤» 

And cloning will reset the state variable; instead of cloning, we can create façades that change the API. For instance, eliminate the need to use Nil as first argument to retrieve the log for a certain date:

my $gets-logs = $logger.assumingNil* );
$logger( %(changing => "Logs") );
say $gets-logs"2018-05-28" );
# OUTPUT: «({changing => Logs} Stuff More stuff)␤» 

assuming wraps around a block call, giving a value (in this case, Nil) to the arguments we need, and passing on the arguments to the other arguments we represent using *. In fact, this corresponds to the natural language statement "We are calling $logger assuming the first argument is Nil". We can slightly change the appearance of these two Blocks to clarify they are actually acting on the same block:

my $Logger = $logger.clone;
my $Logger::logs = $Logger.assuming*Nil );
my $Logger::get = $Logger.assumingNil* );
$Logger::logs( <an array> );
$Logger::logs( %(key => 42) );
say $Logger::get"2018-05-28" );

Although :: is generally used for invocation of class methods, it is actually a valid part of the name of a variable. In this case we use them conventionally to simply indicate $Logger::logs and $Logger::get are actually calling $Logger, which we have capitalized to use a class-like appearance. The point of this tutorial is that using functions as first-class citizens, together with the use of state variables, allows the use of certain interesting design patterns such as this one.

As such first class data structures, callables can be used anywhere another type of data can.

my @regex-check = ( /<alnum>/, /<alpha>/, /<punct>/ );
say @regex-check.map: "33af" ~~ *;
# OUTPUT: «(「3」␤ alnum => 「3」 「a」␤ alpha => 「a」 Nil)␤»

Regexes are actually a type of callable:

say /regex/.doesCallable ); # OUTPUT: «True␤»

And in the example above we are calling regexes stored in an array, and applying them to a string literal.

Callables are composed by using the function composition operator ∘:

my $typer = -> $thing { $thing.^name ~ ' → ' ~ $thing };
my $Logger::withtype = $Logger::logs  $typer;
$Logger::withtypePair.new'left''right' ) );
$Logger::withtype( ¾ );
say $Logger::get"2018-05-28" );
# OUTPUT: «(Pair → left right Rat → 0.75)␤» 

We are composing $typer with the $Logger::logs function defined above, obtaining a function that logs an object preceded by its type, which can be useful for filtering, for instance. $Logger::withtype is, in fact, a complex data structure composed of two functions which are applied in a serial way, but every one of the callables composed can keep state, thus creating complex transformative callables, in a design pattern that is similar to object composition in the object oriented realm. You will have to choose, in every particular case, what is the programming style which is most suitable for your problem.

Defining and constraining data structures

Raku has different ways of defining data structures, but also many ways to constrain them so that you can create the most adequate data structure for every problem domain. but, for example, mixes roles or values into a value or a variable:

my %not-scalar := %(2 => 3but Associative[IntInt];
say %not-scalar.^name# OUTPUT: «Hash+{Associative[Int, Int]}␤» 
say %not-scalar.of;    # OUTPUT: «Associative[Int, Int]␤» 
%not-scalar{3} = 4;
%not-scalar<thing> = 3;
say %not-scalar;       # OUTPUT: «{2 => 3, 3 => 4, thing => 3}␤» 

In this case, but is mixing in the Associative[Int, Int] role; please note that we are using binding so that the type of the variable is the one defined, and not the one imposed by the % sigil; this mixed-in role shows in the name surrounded by curly braces. What does that really mean? That role includes two methods, of and keyof; by mixing the role in, the new of will be called (the old of would return Mu, which is the default value type for Hashes). However, that is all it does. It is not really changing the type of the variable, as you can see since we are using any kind of key and values in the next few statements.

However, we can provide new functionality to a variable using this type of mixin:

role Lastable {
  method last() {
    self.sort.reverse[0]
  }
}
my %hash-plus := %3 => 334 => 44but Lastable;
say %hash-plus.sort[0]; # OUTPUT: «3 => 33␤» 
say %hash-plus.last;    # OUTPUT: «4 => 44␤» 

In Lastable we use the universal self variable to refer to whatever object this particular role is mixed in; in this case it will contain the hash it is mixed in with; it will contain something else (and possibly work some other way) in other case. This role will provide the last method to any variable it's mixed with, providing new, attachable, functionalities to regular variables. Roles can even be added to existing variables using the does keyword.

Subsets can also be used to constrain the possible values a variable might hold; they are Raku attempt at gradual typing; it is not a full attempt, because subsets are not really types in a strict sense, but they allow runtime type checking. It adds type-checking functionality to regular types, so it helps create a richer type system, allowing things like the one shown in this code:

subset OneOver where (1/$_).Int == 1/$_;
my OneOver $one-fraction = ⅓;
say $one-fraction# OUTPUT: «0.333333␤» 

On the other hand, my OneOver $ = ⅔; will cause a type-check error. Subsets can use Whatever, that is, *, to refer to the argument; but this will be instantiated every time you use it to a different argument, so if we use it twice in the definition we would get an error. In this case we are using the topic single variable, $_, to check the instantiation. Subsetting can be done directly, without the need of declaring it, in signatures.

Infinite structures and laziness

It might be assumed that all the data contained in a data structure is actually there. That is not necessarily the case: in many cases, for efficiency reasons or simply because it is not possible, the elements contained in a data structure only jump into existence when they are actually needed. This computation of items as they are needed is called reification or vivification.

# A list containing infinite number of un-reified Fibonacci numbers: 
my @fibonacci = 11* + * … ∞;
 
# We reify 10 of them, looking up the first 10 of them with array index: 
say @fibonacci[^10]; # OUTPUT: «(1 1 2 3 5 8 13 21 34 55)␤» 
 
# We reify 5 more: 10 we already reified on previous line, and we need to 
# reify 5 more to get the 15th element at index 14. Even though we need only 
# the 15th element, the original Seq still has to reify all previous elements: 
say @fibonacci[14]; # OUTPUT: «987␤» 

Above we were reifying a Seq we created with the sequence operator, but other data structures use the concept as well. For example, an un-reified Range is just the two end points. In some languages, calculating the sum of a huge range is a lengthy and memory-consuming process, but Raku calculates it instantly:

say sum 1 .. 9_999_999_999_999# OUTPUT: «49999999999995000000000000␤»

Why? Because the sum can be calculated without vivifying the Range; that is, without figuring out all the elements it contains. This is why this feature exists. You can even make your own things reify-on-demand, using gather and take:

my $seq = gather {
    say "About to make 1st element"take 1;
    say "About to make 2nd element"take 2;
}
say "Let's reify an element!";
say $seq[0];
say "Let's reify more!";
say $seq[1];
say "Both are reified now!";
say $seq[^2];
 
# OUTPUT: 
# Let's reify an element! 
# About to make 1st element 
# 1 
# Let's reify more! 
# About to make 2nd element 
# 2 
# Both are reified now! 
# (1 2)

Following the output above, you can see the print statements inside the gather got executed only when we reified the individual elements while looking up an element. Also note that the elements got reified just once. When we printed the same elements again on the last line of the example, the messages inside gather was no longer printed. This is because the construct used already-reified elements from the Seq's cache.

Note that above we assigned the gather to a Scalar container (the $ sigil), not the Positional one (the @ sigil). The reason is that the @-sigiled variables are mostly eager. What this means is they reify the stuff assigned to them right away most of the time. The only time they don't do it is when the items are known to be is-lazy, like our sequence generated with infinity as the end point. Were we to assign the gather to a @-variable, the say statements inside of it would've been printed right away.

Another way to vivify the list fully is by calling .elems on it. This is the reason why checking whether a list contains any items is best done by using .Bool method (or just using if @array { … }), since you don't need to reify all the elements to find out if there are any of them.

There are times where you do want to fully-reify a list before doing something. For example, the IO::Handle.lines returns a Seq. The following code contains a bug; keeping reification in mind, try to spot it:

my $fh = "/tmp/bar".IO.open;
my $lines = $fh.lines;
close $fh;
say $lines[0];

We open a filehandle, then assign return of .lines to a Scalar variable, so the returned Seq does not get reified right away. We then close the filehandle, and try to print an element from $lines.

The bug in the code is by the time we reify the $lines Seq on the last line, we've already closed the filehandle. When the Seq's iterator tries to generate the item we've requested, it results in the error about attempting to read from a closed handle. So, to fix the bug we can either assign to a @-sigiled variable or call .elems on $lines before closing the handle:

my $fh = "/tmp/bar".IO.open;
my @lines = $fh.lines;
close $fh;
say @lines[0]; # no problem! 

We can also use any function whose side effect is reification, like .elems mentioned above:

my $fh = "/tmp/bar".IO.open;
my $lines = $fh.lines;
say "Read $lines.elems() lines"# reifying before closing handle 
close $fh;
say $lines[0]; # no problem! 

Using eager will also reify the whole sequence:

my $fh = "/tmp/bar".IO.open;
my $lines = eager $fh.lines# Uses eager for reification. 
close $fh;
say $lines[0];

Introspection

Languages that allow introspection like Raku have functionalities attached to the type system that let the developer access container and value metadata. This metadata can be used in a program to carry out different actions depending on their value. As it is obvious from the name, metadata are extracted from a value or container via the metaclass.

my $any-object = "random object";
my $metadata = $any-object.HOW;
say $metadata.^mro;                   # OUTPUT: «((ClassHOW) (Any) (Mu))␤» 
say $metadata.can$metadata"uc" ); # OUTPUT: «(uc uc)␤»

With the first say we show the class hierarchy of the metamodel class, which in this case is Metamodel::ClassHOW. It inherits directly from Any, meaning any method there can be used; it also mixes in several roles which can give you information about the class structure and functions. But one of the methods of that particular class is can, which we can use to look up whether the object can use the uc (uppercase) method, which it obviously can. However, it might not be so obvious in some other cases, when roles are mixed in directly into a variable. For instance, in the case of %hash-plus defined above:

say %hash-plus.^can("last"); # OUTPUT: «(last)␤» 

In this case we are using the syntactic sugar for HOW.method, ^method, to check if your data structure responds to that method; the output, which shows the name of the methods that match, certifies that we can use it.

See also this article on class introspection on how to access class properties and methods, and use it to generate test data for a class; this Advent Calendar article describes the metaobject protocol extensively.