MapSet

Data structure for holding optimized "sparse" N-dimensional matrices. The number of dimensions is arbitrary and can be varied at runtime.

Unlike classical sparse arrays/matrices, this data structure is optimized not just for arrays which are mostly zero (or any given value), but for any matrices where data is likely to repeat across dimensions.

This is done by storing the data as a tree, with each depth representing one dimension. A sub-tree is, thus, a sub-matrix (slice of the matrix represented by the top-level tree) along the top-level dimension. Each sub-tree is individually immutable, and can therefore be shared within the same tree or even by several instances of this type.

Dimension order need not be the same for all sub-trees. Even the number of dimensions in siblings' sub-trees need not be the same; i.e, the data structure is itself sparse with regards to the dimensions.

As there is no explicit "value" type (associated with every set of coordinates), this data structure can be seen as a representation of a set of points, but it can be simulated by adding a "value" dimension (in the same way how a gray-scale image can be represented as a set of 3D points like a height-map).

An alternative way to look at this data structure is as a set of maps (i.e. associative arrays). Thus, each layer of the tree stores one possible map key, and under it, all the maps that have this key with a specific value. Note that for this use case, DimValue needs to have a value (configured using nullValue) which indicates the absence of a certain key (dimension) in a map.

Members

Functions

addToCache
void addToCache()

Because MapSet operations benefit greatly from memoization, maintaining a set of interned sets benefits performance. After calling clearCache, call this method to re-register extant live instances to the instance cache.

all
const(DimValue)[] all(DimName dim)

Return all unique values occurring for a given dimension. Unless this is the empty set, the return value is always non-empty. If dim doesn't occur, it will be [nullValue].

bringToFront
MapSet bringToFront(DimName dim)

Refactor this matrix into one with the same data, but putting the given dimension in front. This will speed up access to values with the given dimension. If the dimension does not yet occur in the set (or any subset), it is instantiated with a single nullValue value. The set must be non-empty.

cartesianProduct
MapSet cartesianProduct(DimName dim, DimValue[] values)

Return a set which represents the Cartesian product between this set and the given values across the specified dimension.

cartesianProduct
MapSet cartesianProduct(MapSet other)

Return a set which represents the Cartesian product between this and the given set. Duplicate dimensions are first removed from this set. For best performance, call big.cartesianProduct(small)

completeSuperset
MapSet completeSuperset()

Return a superset of this set, consisting of a Cartesian product of every value of every dimension. The total number of unique nodes is thus equal to the number of dimensions.

count
size_t count()

Return the total number of items in this set.

deduplicate
MapSet deduplicate()

Intern and deduplicate this MapSet. Needs to be called only after constructing a MapSet manually (by allocating, populating, and setting root).

get
MapSet get(DimName dim, DimValue value)

Return a subset of this set for all points where the given dimension has this value. Unlike slice, the dimension itself is included in the result (with the given value).

getDims
DimName[] getDims()

Collect the names of all dimensions occurring in this tree.

getDimsAndValues
HashSet!(immutable(DimValue))[DimName] getDimsAndValues()

Return all values for all dimensions occurring in this set. The Cartesian product of these values would thus be a superset of this set. This is equivalent to, but faster than, calling getDims and then all for each dim.

lazyMap
MapSet lazyMap(MapSet delegate(MapSet) fn)

Apply fn over subsets, and return a new MapSet.

merge
MapSet merge(MapSet other)

Combine two matrices together, returning their union. If other is a subset of this, return this unchanged.

normalize
MapSet normalize()

Refactor this matrix into one in which dimensions always occur in the same order, no matter what path is taken.

opEquals
bool opEquals(typeof(this) s)
Undocumented in source. Be warned that the author may not have intended to support it.
optimize
MapSet optimize()

Refactor this matrix into one with the same data, but attempting to lower the total number of nodes.

remove
MapSet remove(DimName dim)

"Unset" a given dimension, removing it from the matrix. The result is the union of all sub-matrices for all values of dim.

remove
MapSet remove(bool delegate(DimName) pred)

Unset dimensions according to a predicate. This is faster than removing dimensions individually, however, unlike the DimName overload, this one does not benefit from global memoization.

reorderUsing
MapSet reorderUsing(MapSet reference)

Refactor this matrix into one with the same data, but in which the dimensions always occur as in reference (which is assumed to be normalized).

set
MapSet set(DimName dim, DimValue value)

Set the given dimension to always have the given value, collapsing (returning the union of) all sub-matrices for previously different values of dim.

slice
MapSet slice(DimName dim, DimValue value)

Return a sub-matrix for all points where the given dimension has this value. The dimension itself is not included in the result. Note: if dim doesn't occur (e.g. because this is the unit set) and value is nullValue, this returns the unit set (not the empty set).

subtract
MapSet subtract(MapSet other)

Return the difference between this and the given set. If other does not intersect with this, return this unchanged.

swapDepth
MapSet swapDepth(size_t depth)

Swap the two adjacent dimensions which are depth dimensions away.

toHash
hash_t toHash()
Undocumented in source. Be warned that the author may not have intended to support it.
toString
void toString(void delegate(const(char)[]) sink)
Undocumented in source. Be warned that the author may not have intended to support it.
uniqueNodes
size_t uniqueNodes()

Count and return the total number of unique nodes in this MapSet.

Static functions

clearCache
void clearCache()

Clear the global operations cache.

Static variables

cache
Cache cache;

Because subtrees can be reused within the tree, a way of memoizing operations across the entire tree (instead of just across children of a single node, or siblings) is crucial for performance.

emptySet
auto emptySet;

The empty set. Represents a set which holds zero values.

unitSet
auto unitSet;

A set containing a single nil-dimensional element. Holds exactly one value (a point with all dimensions at nullValue).

Structs

Node
struct Node
Undocumented in source.
Pair
struct Pair

Logically, each MapSet node has a map of values to a subset. However, it is faster to represent that map as an array of key-value pairs rather than a D associative array, so we do that here.

Variables

root
immutable(Node)* root;

If emptySetRoot, zero values. If null, one value with zero dimensions. Otherwise, pointer to node describing the next dimension.

Parameters

DimName

The type used to indicate the name (or index) of a dimension.

DimValue

The type for indicating a point on a dimension's axis.

nullValue

A value for DimValue indicating that a dimension is unset

Meta