Reputation: 3136
This time I have a matrix --IN A FILE-- called "matrix.csv" and I want to read it in. I can do it in two flavors, dense and sparse.
Dense
matrix.csv
3.0, 0.8, 1.1, 0.0, 2.0
0.8, 3.0, 1.3, 1.0, 0.0
1.1, 1.3, 4.0, 0.5, 1.7
0.0, 1.0, 0.5, 3.0, 1.5
2.0, 0.0, 1.7, 1.5, 3.0
Sparse
matrix.csv
1,1,3.0
1,2,0.8
1,3,1.1
// 1,4 is missing
1,5,2.0
...
5,5,3.0
Assume the file is pretty large. In both cases, I want to read these into a Matrix with the appropriate dimensions. In the dense case I probably don't need to provide meta-data. In the second, I was thinking I should provide the "frame" of the matrix, like
matrix.csv
nrows:5
ncols:5
But I don't know the standard patterns.
== UPDATE ==
It's a bit difficult to find, but the mmreadsp function (from the MatrixMarket package module mentioned below) can change your day from "Crashing the server" to "done in 11 seconds". Thanks to Brad Cray (not his real name) for pointing it out!
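For anyone else hunting for it, the call looks roughly like the sketch below. This is a minimal sketch, assuming mmreadsp mirrors the signature of its dense counterpart mmread (element type plus file name, returning a sparse array) and that the data is stored in Matrix Market format (the "matrix.mtx" file name here is hypothetical):
use MatrixMarket;
// NOTE: the signature here is an assumption based on the dense mmread;
// check the MatrixMarket module source for the exact interface.
var A = mmreadsp(real, "matrix.mtx"); // sparse array read from a Matrix Market file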
Upvotes: 2
Views: 348
Reputation: 1865
Since Chapel matrices are represented as arrays, this question is equivalent to:
"How to read an array from a file in Chapel".
Ideally, a csv module or a specialized IO-formatter (similar to the JSON formatter) would handle csv I/O more elegantly, but this answer reflects the array I/O options available as of the Chapel 1.16 pre-release.
Dense arrays are the easy case, since DefaultRectangular arrays (the default type of a Chapel array) come with a .readWriteThis(f) method. This method allows one to read and write an array with the built-in write() and read() methods, as shown below:
var A: [1..5, 1..5] real;
// Give this array some values
[(i,j) in A.domain] A[i,j] = i + 10*j;
// Write the array out to a file
var writer = open('dense.txt', iomode.cw).writer();
writer.write(A);
writer.close();
// Read it back into an array of the same shape
var B: [1..5, 1..5] real;
var reader = open('dense.txt', iomode.r).reader();
reader.read(B);
reader.close();
assert(A == B);
The resulting dense.txt looks like this:
11.0 21.0 31.0 41.0 51.0
12.0 22.0 32.0 42.0 52.0
13.0 23.0 33.0 43.0 53.0
14.0 24.0 34.0 44.0 54.0
15.0 25.0 35.0 45.0 55.0
However, this assumes you know the array shape in advance. We can remove this constraint by writing the array shape at the top of the file, as shown below:
var A: [1..5, 1..5] real;
[(i,j) in A.domain] A[i,j] = i + 10*j;
var writer = open('dense.txt', iomode.cw).writer();
// Write the shape as a tuple on the first line, then the array itself
writer.writeln(A.shape);
writer.write(A);
writer.close();
var reader = open('dense.txt', iomode.r).reader();
// Read the shape first, so B can be declared with the right dimensions
var shape: 2*int;
reader.read(shape);
var B: [1..shape[1], 1..shape[2]] real;
reader.read(B);
reader.close();
assert(A == B);
Now, dense.txt looks like this:
(5, 5)
11.0 21.0 31.0 41.0 51.0
12.0 22.0 32.0 42.0 52.0
13.0 23.0 33.0 43.0 53.0
14.0 24.0 34.0 44.0 54.0
15.0 25.0 35.0 45.0 55.0
Sparse arrays require a little more work, because DefaultSparse arrays (the default type of a sparse Chapel array) only provide a .writeThis(f) method and not a .readThis(f) method as of the Chapel 1.16 pre-release. This means we have built-in support for writing sparse arrays, but not for reading them.
Since you specifically requested csv format, we'll do sparse arrays in csv:
// Create parent domain, sparse subdomain, and sparse array
const D = {1..10, 1..10};
var spD: sparse subdomain(D);
var A: [spD] real;
// Add some non-zeros:
spD += [(1,1), (1,5), (2,7), (5, 4), (6, 6), (9,3), (10,10)];
// Set non-zeros to 1.0 (to make things interesting?)
A = 1.0;
var writer = open('sparse.csv', iomode.cw).writer();
// Write shape
writer.writef('%n,%n\n', A.shape[1], A.shape[2]);
// Iterate over non-zero indices, writing: i,j,value
for (i,j) in spD {
writer.writef('%n,%n,%n\n', i, j, A[i,j]);
}
writer.close();
var reader = open('sparse.csv', iomode.r).reader();
// Read shape
var shape: 2*int;
reader.readf('%n,%n', shape[1], shape[2]);
// Create parent domain, sparse subdomain, and sparse array
const Bdom = {1..shape[1], 1..shape[2]};
var spBdom: sparse subdomain(Bdom);
var B: [spBdom] real;
// This is an optimization that bulk-adds the indices. We could instead add
// the indices directly to spBdom and the values to B[i,j] each iteration
// (see the element-wise sketch after the csv listing below)
var indices: [1..0] 2*int,
values: [1..0] real;
// Variables to be read into
var i, j: int,
val: real;
while reader.readf('%n,%n,%n', i, j, val) {
indices.push_back((i,j));
values.push_back(val);
}
// bulk add the indices to spBdom and add values to B element-wise
spBdom += indices;
for (ij, v) in zip(indices, values) {
B[ij] = v;
}
reader.close();
// Sparse arrays can't be zippered with anything other than their domains and
// sibling arrays, so we need to do an element-wise assertion:
assert(A.domain == B.domain);
for (i,j) in A.domain {
assert(A[i,j] == B[i,j]);
}
And sparse.csv looks like this:
10,10
1,1,1
1,5,1
2,7,1
5,4,1
6,6,1
9,3,1
10,10,1
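For completeness, here is what the element-wise alternative mentioned in the reading code above would look like. It replaces the temporary indices/values arrays and the bulk-add with direct updates; this is simpler, but typically slower for large files, since the sparse domain grows one index at a time:
var i, j: int,
    val: real;
while reader.readf('%n,%n,%n', i, j, val) {
  spBdom += (i, j); // grow the sparse domain by a single index
  B[i, j] = val;    // then store the corresponding value
}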
Lastly, I'll mention that there is a MatrixMarket package module that supports dense & sparse array I/O using the Matrix Market format. It is currently not shown in the public documentation, because it is intended to be moved out as a standalone package once the package manager is reliable enough, but you can use it in your Chapel programs with use MatrixMarket; today.
Here is the source code, which includes documentation for the interface as comments.
Here are the tests, if you prefer to learn from example, rather than documentation & source code.
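As a teaser, a dense round-trip looks roughly like the sketch below. Treat it as a minimal sketch based on my reading of the module source rather than documented API: I'm assuming mmwrite takes a file name and an array, and mmread takes an element type and a file name (there is also the sparse mmreadsp mentioned in the question's update):
use MatrixMarket;
var A: [1..5, 1..5] real;
[(i,j) in A.domain] A[i,j] = i + 10*j;
// Write A in Matrix Market format, then read it back
// (names and signatures assumed from the module source; verify before use)
mmwrite("dense.mtx", A);
var B = mmread(real, "dense.mtx");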
Upvotes: 1
Reputation: 1
( If one happens to remember the PC Tools utility: the Matrix Tools, pioneered and authored by prof. Zitny, were similarly indispensable for smart abstract representations of large-scale F77 FEM matrices, using COMMON-block and similar tricks for storage- and operations-efficient handling of large, sparse matrices in numerical-processing projects... )
I cannot disagree more with the last remark, that one needs the "frame" in order to build a sparse matrix.
A matrix is always just an interpretation of some formalism.
While all sparse-matrix modules share the same view of a matrix as an interpretation, the implementation of each such module is always strictly based on some concrete representation.
Different kinds of sparsity are always handled with different cell-layout strategies ( the trick is to use the minimum [SPACE] needed for the cell elements, while still keeping an acceptable processing [TIME] overhead when performing classical matrix/vector operations on such a matrix, typically without the user knowing, or "manually" bothering with, the underlying sparse-matrix representation used for storing the cell values, or how it gets optimally decoded / translated into a target sparse-matrix representation ).
Put visually, the Matrix Tools would show each representation as compactly as possible in its best-possible memory layout ( much like PC Tools compressed your hard disk, laying out sector data so that no unnecessary non-contiguous HDD capacity was wasted ), and the ( type-by-type specific ) representation-aware handler would then provide any external observer with the complete illusion needed for the assumed matrix interpretation ( during the computing phase ).
So let's first realise that, without knowing all the details of the platform-specific rules used for a sparse-matrix representation, both on the source side ( python-?, JSON-meta-payload-?, etc. ) and on the Chapel target side ( the LinearAlgebra module in ver-1.16 being confirmed as not yet public ( W.I.P. ) ), there is not much to start implementing.
The actual materialisation of a ( yet unknown ) sparse-matrix representation ( be it a file://, a DMA access, a CSP channel, or any other means of non-InRAM storage or an InRAM memory map ) does not change the solution of the cross-representation translator a single bit.
As a mathematician, you may enjoy viewing representations less as Cantor-set-driven objects ( running into (almost) infinite, dense enumerations ) and more through Vopenka's Alternative Set Theory ( so lovingly introduced, with in-depth historical and mathematical context, in Vopenka's "Meditations About The Bases of Science" ). That theory brought and polished a much closer view of exactly these situations, with an ever-changing Horizon-of-Definition ( caused not only by the actual sharpness of the observer's view, but by that principle in a much broader and more general sense ), leaving pi-class and sigma-class semi-sets ready to continuously handle new details as they emerge into the recognised part of our view of the observed ( and mathematicised ) phenomenon ( once they appear "in front of" the Horizon-of-Definition ).
Sparse matrices ( as a representation ) help us build the interpretation we need, so as to use the data cells acquired so far in further processing "as a matrix". Doing so requires knowing:
a) the constraints and rules used in the source system's sparse-matrix representation
b) the additional constraints that the mediation channel imposes ( expressivity, format, self-healing / error-proneness ), irrespective of it being a file, a CSP channel, or a ZeroMQ / nanomsg smart-socket signalling- / messaging-plane distributed agent infrastructure
c) the constraints and rules imposed by the target system's representation, setting the rules for defining / loading / storing / further handling & computing that a sparse-matrix type of one's choice has to meet / follow in the target computing eco-system
Not knowing a) would introduce unnecessarily large overheads in preparing a strategy for a successful and efficient cross-representation pipeline, i.e. for translating the common interpretation from the source-side representation before entering b). Ignoring c) would always incur a penalty: paying additional overheads in the target eco-system during b)'s mediated reconstruction of the communicated interpretation onto the target representation.
Upvotes: 0