MapSet

Data structure for holding optimized "sparse" N-dimensional matrices. The number of dimensions is arbitrary and can be varied at runtime.

Unlike classical sparse arrays/matrices, this data structure is optimized not just for arrays which are mostly zero (or any given value), but for any matrices where data is likely to repeat across dimensions.

This is done by storing the data as a tree, with each depth representing one dimension. A sub-tree is, thus, a sub-matrix (slice of the matrix represented by the top-level tree) along the top-level dimension. Each sub-tree is individually immutable, and can therefore be shared within the same tree or even by several instances of this type.

Dimension order need not be the same for all sub-trees. Even the number of dimensions in siblings' sub-trees need not be the same; i.e, the data structure is itself sparse with regards to the dimensions.

As there is no explicit "value" type (associated with every set of coordinates), this data structure can be seen as a representation of a set of points, but it can be simulated by adding a "value" dimension (in the same way how a gray-scale image can be represented as a set of 3D points like a height-map).

An alternative way to look at this data structure is as a set of maps (i.e. associative arrays). Thus, each layer of the tree stores one possible map key, and under it, all the maps that have this key with a specific value. Note that for this use case, DimValue needs to have a value (configured using nullValue) which indicates the absence of a certain key (dimension) in a map.

static if(__VERSION__ >= 2083)

struct MapSet (

DimName

DimValue

DimValue nullValue = DimValue.init

) {

struct Pair;

struct Node;

immutable(Node)* root;

static void clearCache();

void addToCache();

MapSet deduplicate();

size_t uniqueNodes();

DimName[] getDims();

HashSet!(immutable(DimValue))[DimName] getDimsAndValues();

MapSet merge(MapSet other);

MapSet subtract(MapSet other);

MapSet lazyMap(MapSet delegate(MapSet) fn);

MapSet remove(DimName dim);

MapSet remove(bool delegate(DimName) pred);

MapSet set(DimName dim, DimValue value);

MapSet slice(DimName dim, DimValue value);

MapSet get(DimName dim, DimValue value);

const(DimValue)[] all(DimName dim);

MapSet cartesianProduct(DimName dim, DimValue[] values);

MapSet cartesianProduct(MapSet other);

MapSet completeSuperset();

MapSet bringToFront(DimName dim);

MapSet optimize();

MapSet normalize();

MapSet swapDepth(size_t depth);

MapSet reorderUsing(MapSet reference);

void toString(void delegate(const(char)[]) sink);

hash_t toHash();

bool opEquals(typeof(this) s);

}

Members

Functions

addToCache void addToCache(): Because MapSet operations benefit greatly from memoization, maintaining a set of interned sets benefits performance. After calling clearCache, call this method to re-register extant live instances to the instance cache.
all const(DimValue)[] all(DimName dim): Return all unique values occurring for a given dimension. Unless this is the empty set, the return value is always non-empty. If dim doesn't occur, it will be [nullValue].
bringToFront MapSet bringToFront(DimName dim): Refactor this matrix into one with the same data, but putting the given dimension in front. This will speed up access to values with the given dimension. If the dimension does not yet occur in the set (or any subset), it is instantiated with a single nullValue value. The set must be non-empty.
cartesianProduct MapSet cartesianProduct(DimName dim, DimValue[] values): Return a set which represents the Cartesian product between this set and the given values across the specified dimension.
cartesianProduct MapSet cartesianProduct(MapSet other): Return a set which represents the Cartesian product between this and the given set. Duplicate dimensions are first removed from this set. For best performance, call big.cartesianProduct(small)
completeSuperset MapSet completeSuperset(): Return a superset of this set, consisting of a Cartesian product of every value of every dimension. The total number of unique nodes is thus equal to the number of dimensions.
count size_t count(): Return the total number of items in this set.
deduplicate MapSet deduplicate(): Intern and deduplicate this MapSet. Needs to be called only after constructing a MapSet manually (by allocating, populating, and setting root).
get MapSet get(DimName dim, DimValue value): Return a subset of this set for all points where the given dimension has this value. Unlike slice, the dimension itself is included in the result (with the given value).
getDims DimName[] getDims(): Collect the names of all dimensions occurring in this tree.
getDimsAndValues HashSet!(immutable(DimValue))[DimName] getDimsAndValues(): Return all values for all dimensions occurring in this set. The Cartesian product of these values would thus be a superset of this set. This is equivalent to, but faster than, calling getDims and then all for each dim.
lazyMap MapSet lazyMap(MapSet delegate(MapSet) fn): Apply fn over subsets, and return a new MapSet.
merge MapSet merge(MapSet other): Combine two matrices together, returning their union. If other is a subset of this, return this unchanged.
normalize MapSet normalize(): Refactor this matrix into one in which dimensions always occur in the same order, no matter what path is taken.
opEquals bool opEquals(typeof(this) s): Undocumented in source. Be warned that the author may not have intended to support it.
optimize MapSet optimize(): Refactor this matrix into one with the same data, but attempting to lower the total number of nodes.
remove MapSet remove(DimName dim): "Unset" a given dimension, removing it from the matrix. The result is the union of all sub-matrices for all values of dim.
remove MapSet remove(bool delegate(DimName) pred): Unset dimensions according to a predicate. This is faster than removing dimensions individually, however, unlike the DimName overload, this one does not benefit from global memoization.
reorderUsing MapSet reorderUsing(MapSet reference): Refactor this matrix into one with the same data, but in which the dimensions always occur as in reference (which is assumed to be normalized).
set MapSet set(DimName dim, DimValue value): Set the given dimension to always have the given value, collapsing (returning the union of) all sub-matrices for previously different values of dim.
slice MapSet slice(DimName dim, DimValue value): Return a sub-matrix for all points where the given dimension has this value. The dimension itself is not included in the result. Note: if dim doesn't occur (e.g. because this is the unit set) and value is nullValue, this returns the unit set (not the empty set).
subtract MapSet subtract(MapSet other): Return the difference between this and the given set. If other does not intersect with this, return this unchanged.
swapDepth MapSet swapDepth(size_t depth): Swap the two adjacent dimensions which are depth dimensions away.
toHash hash_t toHash(): Undocumented in source. Be warned that the author may not have intended to support it.
toString void toString(void delegate(const(char)[]) sink): Undocumented in source. Be warned that the author may not have intended to support it.
uniqueNodes size_t uniqueNodes(): Count and return the total number of unique nodes in this MapSet.

Static functions

clearCache void clearCache(): Clear the global operations cache.

Static variables

cache Cache cache;: Because subtrees can be reused within the tree, a way of memoizing operations across the entire tree (instead of just across children of a single node, or siblings) is crucial for performance.
emptySet auto emptySet;: The empty set. Represents a set which holds zero values.
unitSet auto unitSet;: A set containing a single nil-dimensional element. Holds exactly one value (a point with all dimensions at nullValue).

Structs

Node struct Node: Undocumented in source.
Pair struct Pair: Logically, each MapSet node has a map of values to a subset. However, it is faster to represent that map as an array of key-value pairs rather than a D associative array, so we do that here.

Variables

root immutable(Node)* root;: If emptySetRoot, zero values. If null, one value with zero dimensions. Otherwise, pointer to node describing the next dimension.

Parameters

DimName: The type used to indicate the name (or index) of a dimension.
DimValue: The type for indicating a point on a dimension's axis.
nullValue: A value for DimValue indicating that a dimension is unset