# Hyperchunks

To meet a wide variety of needs for incremental and interactive data ingestion
and retrieval, Slycat has evolved a complex data storage hierarchy. At the top
of the hierarchy are *projects*, which provide administrative and access
controls, grouping together related analytical results. *Models* are owned by
projects, and represent instances of specific analysis types. Models contain
data *artifacts*, whose layout and structure are dictated by the model type.
Each artifact in a model is identified by name, which can be an arbitrary
string. There are three types of artifacts: *parameters* are JSON objects of
arbitrary complexity, intended for storage of small quantities of metadata.
*Files* are opaque binary objects that can store large quantities of data,
along with an explicitly stored MIME type. The final and most widely used type
of artifact is an *arrayset*, which is a one-dimensional array of *darrays*. A
darray is a dense, multi-dimensional multi-attribute array, and an arrayset
stores \(n\) darrays that can be accessed by integer indices in the range
\([0, n)\). In turn, each *attribute* in a darray can be accessed by its
integer index, and the elements in each attribute can be identified using a
*hyperslice*, which includes a *slice* of element indices for each dimension of
the darray.

The bulk of the data in a Slycat model is stored in arraysets, and each time a
client reads or writes data to an arrayset, it must specify all of the
parameters mentioned above. To make this process simpler, while allowing for a
wide variety of data access patterns, we group this information into
*hyperchunks*, and have developed the Hyperchunk Query Language or HQL to
serve as a compact specification for a set of hyperchunks. Using HQL, a client
can read and write data that spans the arrays and attributes in an arrayset,
including computed attributes and arbitrary expressions.

## Basic HQL

To begin, the most basic building block in HQL is a *slice* expression, which
follows the same syntactic rules as slicing in the Python language: At its
most general a slice takes the form “start:stop:skip”, which specifies every
\(skip\)-th element in the half-open range \([start, stop)\). If start
is omitted, it defaults to zero. If stop is omitted, it defaults to the length
of the available range. If skip is omitted, it defaults to one. If start or
stop is negative, it represents an index counted backwards from the end of the
available range. Start, stop, and skip may be omitted or used in any
combination desired:

- “10:20:2” - every other index in the range \([10, 20)\).
- “10:20” - every index in the range \([10, 20)\).
- “10:” - every index from 10 through the end of the available range.
- “:20” - every index in the range \([0, 20)\).
- “…” - every index in the available range.
- “:” - every index in the available range.
- “::” - every index in the available range.
- “::2” - every other index in the available range, starting with zero: \(0, 2, 4, ...\).
- “1::2” - every other index in the available range, starting with one: \(1, 3, 5, ...\).
- “10” - index 10.
- “-1” - last index in the available range.
- “-10:” - last ten indices in the available range.
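
Because HQL slices follow Python's slicing rules, the list above can be checked with a few lines of Python. The following is a minimal sketch, not Slycat's own parser: `parse_slice` and `select` are hypothetical helper names, and the sketch assumes the full-range marker may be written either as three periods or as the single ellipsis character.

```python
def parse_slice(text):
    """Convert an HQL-style slice string into a Python slice, int, or Ellipsis."""
    text = text.strip()
    if text in ("...", "\u2026"):
        return Ellipsis  # every index in the available range
    if ":" not in text:
        return int(text)  # a single index such as "10" or "-1"
    parts = text.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' in slice expression: {text!r}")
    # Empty fields become None, which Python treats as "use the default".
    fields = [int(part) if part.strip() else None for part in parts]
    fields += [None] * (3 - len(fields))
    return slice(*fields)


def select(indices, text):
    """Return the list of indices that a slice expression picks out."""
    parsed = parse_slice(text)
    if parsed is Ellipsis:
        return list(indices)
    if isinstance(parsed, int):
        return [indices[parsed]]
    return list(indices[parsed])


available = range(30)  # an available range of 30 indices
print(select(available, "10:20:2"))  # [10, 12, 14, 16, 18]
print(select(available, "-1"))       # [29]
```

Leaving a field empty maps to `None`, so the defaults described above (zero, the length of the range, and one) come directly from Python's own slice handling.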

Recall that a slice is a range of indices along a single dimension, while
darrays are multi-dimensional. Thus, to retrieve data from a darray with more
than one dimension, we need to specify *hyperslice* expressions. To do this,
HQL uses slice expressions separated by commas. For example:

- “1” - index 1 of a vector.
- “1,2” - row 1, column 2 of a matrix.
- “3,…” - row 3 of a matrix.
- “…,4” - column 4 of a matrix.
- “50:60,7” - rows \([50, 60)\) from column 7 in a matrix.
- “50:60,7:10” - rows \([50, 60)\) from columns \([7, 10)\) in a matrix.
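
To make the comma-separated form concrete, here is a small sketch that parses a two-dimensional hyperslice and applies it to a matrix stored as a list of rows. It is an illustration under simplifying assumptions rather than Slycat's implementation: `parse_hyperslice` and `take` are hypothetical names, the ellipsis is treated as "everything along this one dimension", and only matrices are handled.

```python
def parse_hyperslice(text):
    """Split a hyperslice on commas and parse one slice per dimension."""
    def parse_slice(part):
        part = part.strip()
        if part in ("...", "\u2026"):
            return slice(None)  # ellipsis: everything along this dimension
        if ":" not in part:
            return int(part)  # a single index
        fields = [int(f) if f.strip() else None for f in part.split(":")]
        return slice(*fields)

    return tuple(parse_slice(part) for part in text.split(","))


def take(matrix, hyperslice):
    """Apply a two-dimensional hyperslice to a list-of-rows matrix."""
    rows, cols = hyperslice
    picked = [matrix[rows]] if isinstance(rows, int) else matrix[rows]
    return [[row[cols]] if isinstance(cols, int) else row[cols] for row in picked]


# A 4x5 matrix whose cell at row r, column c holds 10 * r + c.
matrix = [[10 * r + c for c in range(5)] for r in range(4)]

print(take(matrix, parse_hyperslice("1,2")))      # [[12]]
print(take(matrix, parse_hyperslice("3,...")))    # [[30, 31, 32, 33, 34]]
print(take(matrix, parse_hyperslice("1:3,2:4")))  # [[12, 13], [22, 23]]
```

The result is always returned as a list of rows, so a single cell comes back as a one-by-one matrix; that is a choice made for this sketch only.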

Additionally, HQL allows us to combine multiple hyperslice expressions, separated by vertical bars. This means we can specify irregular sets of data that can’t be specified with the normal slice syntax alone:

- “1|3|4” - indices 1, 3, and 4 of a vector.
- “10:20|77” - indices \([10, 20)\) and 77 from a vector.
- “1,2|33,4” - cells 1,2 and 33,4 from a matrix.

With all this in mind, we can begin putting the pieces together into hyperchunks. A typical HQL expression includes three pieces of information, separated with forward slashes:

```
array expression / attribute expression / hyperslice expression
```

Since an arrayset is a one-dimensional set of darrays, an HQL array expression is a set of one or more one-dimensional hyperslice expressions. Similarly, array attributes are accessed by their one-dimensional attribute indices, so basic HQL attribute expressions are also one-dimensional hyperslices. Finally, the subset of each attribute to retrieve is specified using one or more multi-dimensional hyperslices, which must match the dimensionality of the underlying array. Here are some simple examples:

- “1/2/10” - array 1, attribute 2, element 10.
- “1/2/10:20” - array 1, attribute 2, elements \([10, 20)\).
- “1/2/…” - the entire contents of array 1, attribute 2.
- “1/2:4/…” - the entire contents of array 1, attributes 2 and 3
- “…/2/…” - the entire contents of attribute 2 for every array in the arrayset.
- “…/…/…” - everything in the entire arrayset.

The preceding examples assume one-dimensional darrays. Here are some examples of working with matrices:

- “1/2/10:20,30:40” - a ten-by-ten subset of the matrix stored in array 1, attribute 2.
- “1/2/:,3” - column 3 of the matrix stored in array 1, attribute 2.
- “1/2/3,…” - row 3 of the matrix stored in array 1, attribute 2.

And here are examples using multiple hyperslices:

- “1|3|4/…/…” - the entire contents of arrays 1, 3, and 4.
- “1/3|7|8/…” - the entire contents of array 1, attributes 3, 7, and 8.
- “1/2/:,0|:,3|:,10” - columns 0, 3, and 10 from the matrix stored in array 1, attribute 2.

Note that when you use HQL to specify the locations for reading and writing data, the data will contain the Cartesian product of the specified arrays, attributes, and hyperslices, in array-attribute-hyperslice order. For example, retrieving the hyperchunk “0:2/4:6/10:20|30:40” will return, in order:

- Array 0, attribute 4, elements 10:20
- Array 0, attribute 4, elements 30:40
- Array 0, attribute 5, elements 10:20
- Array 0, attribute 5, elements 30:40
- Array 1, attribute 4, elements 10:20
- Array 1, attribute 4, elements 30:40
- Array 1, attribute 5, elements 10:20
- Array 1, attribute 5, elements 30:40
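
The ordering above is what `itertools.product` produces when arrays vary slowest and hyperslices vary fastest. The sketch below reproduces the listing; `to_indices` and `expand` are hypothetical helpers, and the sketch assumes for simplicity that every array has the same number of attributes and that hyperslices can be left as unparsed strings.

```python
from itertools import product


def to_indices(text, count):
    """Expand a one-dimensional expression such as "0:2" or "1|3|4" into indices."""
    available = range(count)
    result = []
    for part in text.split("|"):
        part = part.strip()
        if part in ("...", "\u2026"):
            result.extend(available)
        elif ":" in part:
            fields = [int(f) if f.strip() else None for f in part.split(":")]
            result.extend(available[slice(*fields)])
        else:
            result.append(available[int(part)])
    return result


def expand(hyperchunk, array_count, attribute_count):
    """List (array, attribute, hyperslice) triples in array-attribute-hyperslice order."""
    arrays, attributes, hyperslices = hyperchunk.split("/")
    return list(product(
        to_indices(arrays, array_count),
        to_indices(attributes, attribute_count),
        [h.strip() for h in hyperslices.split("|")],
    ))


for array, attribute, hyperslice in expand("0:2/4:6/10:20|30:40", 3, 8):
    print(f"Array {array}, attribute {attribute}, elements {hyperslice}")
```

A client that reads or writes several blocks in one request can use the same enumeration to match each returned block to its array, attribute, and hyperslice.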

All of the APIs that work with hyperchunks take a set of hyperchunks, rather than a single hyperchunk, as their parameter. You can combine multiple hyperchunks by separating them with semicolons:

- “1/2/…;3/4/…” - the entire contents of array 1 attribute 2 and array 3 attribute 4.

## Advanced HQL

In addition to slices specifying attribute indices, HQL attribute expressions can include computed expressions that generate attribute data “on the fly”. Attribute expressions currently include function execution and a full set of boolean expressions, including set operations:

- “0/1|index(0)/…” - The entire contents of array 0, attribute 1, plus coordinate indices along dimension 0.
- “0/1|rank(a1, "asc")/…” - The entire contents of array 0, attribute 1, plus the rank of each attribute 1 element in ascending order.
- “0/1|a1 > 5/…” - Return the entire contents of array 0, attribute 1, and whether each attribute 1 element is greater than five.
- “0/1|a1 > 5 and a1 < 13/…” - Return the entire contents of array 0, attribute 1, and whether each attribute 1 element is between five and thirteen.
- “0/1|a1 in ["red", "cinnamon"]/…” - Return the entire contents of array 0, attribute 1, and whether each attribute 1 element matches "red" or "cinnamon".

HQL provides a full set of comparison operators: <, >, <=, >=, ==, and !=, along with “in” and “not in” for testing set membership, plus “and” and “or” for combining comparisons. You may use parentheses to control the precedence of complex expressions. Of course, you can specify as many computed attribute expressions as you like, using vertical bars as a separator.
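
As a rough picture of what these computed attributes return, the following Python sketch evaluates the same kinds of expressions element by element over small in-memory lists. This is plain Python rather than HQL, the sample values are invented for illustration, and the variable names are hypothetical; it assumes only what the descriptions above state, namely that each expression yields one value per element.

```python
# Hypothetical sample data standing in for stored attributes.
a1 = [3, 8, 12, 20]
colors = ["red", "blue", "cinnamon"]

# "a1 > 5": one boolean per element.
greater_than_five = [value > 5 for value in a1]

# "a1 > 5 and a1 < 13": both comparisons must hold for an element.
between = [value > 5 and value < 13 for value in a1]

# "a1 in [...]": set membership, shown here on a string attribute.
is_warm = [value in ["red", "cinnamon"] for value in colors]

# "index(0)": the coordinate of each element along dimension 0.
index_0 = list(range(len(a1)))

print(greater_than_five)  # [False, True, True, True]
print(between)            # [False, True, True, False]
print(is_warm)            # [True, False, True]
print(index_0)            # [0, 1, 2, 3]
```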

HQL also allows an optional fourth type of expression, an “order” expression, used to sort the data to be returned. The order expression should return an integer rank for each element in the data to be returned, and it appears between the attribute expression and the hyperslice expression:

- “0/1/order:rank(a1, "asc")/…” - The entire contents of array 0, attribute 1, sorted in ascending order.
- “0/1/order:rank(a2, "desc")/…” - The entire contents of array 0, attribute 1, sorted in descending order of attribute 2.
- “0/1/order:rank(a1, "asc")/0:10” - Array 0, attribute 1, first ten elements in ascending order.

Note that the hyperslice in the final example retrieves the first ten elements of the sorted data, rather than the first ten elements of the attribute.
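
The difference between slicing sorted data and slicing the stored attribute can be seen in a short Python sketch. The `rank` and `ordered` functions here are hypothetical stand-ins written for this illustration, not Slycat's implementation; in particular, ranks are assumed to start at zero and ties are broken by original position, neither of which is specified above.

```python
def rank(values, direction="asc"):
    """Return an integer rank per element; rank 0 comes first in the requested order."""
    positions = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=(direction == "desc"),
    )
    ranks = [0] * len(values)
    for position, index in enumerate(positions):
        ranks[index] = position
    return ranks


def ordered(values, ranks):
    """Rearrange values so that the element with rank r lands at position r."""
    result = [None] * len(values)
    for value, r in zip(values, ranks):
        result[r] = value
    return result


a1 = [7, 2, 9, 4, 1]  # hypothetical attribute data

sorted_a1 = ordered(a1, rank(a1, "asc"))
print(sorted_a1[0:3])  # [1, 2, 4]: first three elements of the sorted data
print(a1[0:3])         # [7, 2, 9]: first three elements as stored
```

Applying the hyperslice after the reordering is what lets a client page through sorted results, for example requesting “0:10”, then “10:20”, and so on.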

## HQL Context

Depending on the context, not all APIs allow every HQL feature. For example, APIs that write data don’t allow computed attribute expressions; some APIs only allow array expressions; others allow only array and attribute expressions. For those situations, you may omit the other parts of the HQL. For example:

- “10:20;35” - arrays \([10, 20)\) plus array 35.
- “3/4;5/7” - array 3 attribute 4, plus array 5 attribute 7.
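
A client that builds or inspects these strings might first split them into their parts. The sketch below shows one way to do that for the basic forms, including the shortened ones just described. `split_hyperchunks` is a hypothetical helper, and it deliberately ignores order expressions and quoted strings that contain semicolons or slashes.

```python
def split_hyperchunks(text):
    """Split a hyperchunks string into one dictionary of raw expressions per hyperchunk."""
    names = ("arrays", "attributes", "hyperslices")
    result = []
    for chunk in text.split(";"):
        parts = [part.strip() for part in chunk.split("/")]
        if len(parts) > len(names):
            raise ValueError(f"unsupported hyperchunk in this sketch: {chunk!r}")
        # zip stops at the shorter sequence, so omitted trailing parts are simply absent.
        result.append(dict(zip(names, parts)))
    return result


print(split_hyperchunks("10:20;35"))
# [{'arrays': '10:20'}, {'arrays': '35'}]
print(split_hyperchunks("3/4;5/7"))
# [{'arrays': '3', 'attributes': '4'}, {'arrays': '5', 'attributes': '7'}]
```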