diedro

Reputation: 613

MPI reading simple dataframe or txt file

I have this dataframe:

date,AA,BB,CC
2018-01-01 00:00:00,45.73,0.0,1
2018-01-01 01:00:00,44.16,0.0,2
2018-01-01 02:00:00,42.24,0.0,3
2018-01-01 03:00:00,39.29,0.0,5
2018-01-01 04:00:00,36.0,0.0,6
2018-01-01 05:00:00,41.99,0.0,7
2018-01-01 06:00:00,42.25,0.0,8

I would like to know whether it is possible to read it with the MPI I/O paradigm.

In particular, I would like to divide the rows among the processors. Suppose you have 4 processors. I would like each processor to read two lines: processor 0 reads lines 1 and 2, processor 1 reads lines 3 and 4, and so on.

I have studied some material. As far as I understand, I have to compute some sort of offset and write the file as one single line. Another possibility could be to use something related to subgrids.

However, as you can see, each line contains variables of different kinds.

Could someone give me a clue? What I have found so far about MPI I/O is very theoretical, with no practical examples.

Thanks, Diego

Upvotes: 0

Views: 184

Answers (1)

Rob Latham

Reputation: 5223

MPI-IO works great for binary data. It is less well suited for text data.

If this were binary data, I would expect a header and an index. Rank 0 could read that header and index, broadcast to everyone where the data resides, and then some algorithmic decomposition of records could happen (e.g. each rank reads N records).
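As a rough illustration of the "each rank reads N records" decomposition, here is a minimal sketch in plain Python. The function name `block_range` is made up for this example, and in a real MPI program `rank` and `n_ranks` would come from the communicator rather than being passed in by hand.

```python
def block_range(n_records, n_ranks, rank):
    """Return the half-open range [start, stop) of records owned by `rank`.

    Hypothetical helper: records are dealt out in contiguous blocks, and
    when the count does not divide evenly the lowest ranks get one extra.
    """
    base, extra = divmod(n_records, n_ranks)
    # Each rank below `rank` holds `base` records, and the first `extra`
    # of them hold one more, which shifts the starting position.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

With fixed-size binary records, the byte offset for a rank would then be the header size plus `start` times the record size, which is why this kind of split is easy for binary data and awkward for variable-length text lines.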

For an ASCII file like this, you're right: how do you split up the file?

How big are these files? If they are only several megabytes (so not that large), read the data on rank 0 and distribute it from there.

Another approach might be to generate an index, either as part of the dataframe or as a separate binary index. That index would map records to file offsets, so you can split the job of reading across all the processes.
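To make the index idea concrete, here is a minimal sketch in plain Python, with no MPI calls, so it only shows the offset bookkeeping. The names `build_line_index` and `read_rank_rows` are invented for this example, and it assumes the first line of the file is the CSV header. In an actual MPI program one would presumably build the index once (for example on rank 0), broadcast it, and replace the seek-and-read pair with an MPI-IO read at the same offset and length.

```python
def build_line_index(path):
    """Return the byte offset of the start of every line, plus the file size."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    offsets.append(pos)  # sentinel entry: end of file
    return offsets


def read_rank_rows(path, offsets, rank, n_ranks):
    """Read the data rows assigned to `rank`, skipping the header on line 0."""
    n_rows = len(offsets) - 2  # all lines minus the header (last entry is the sentinel)
    base, extra = divmod(n_rows, n_ranks)
    # Contiguous block of rows per rank; the lowest ranks absorb the remainder.
    first = 1 + rank * base + min(rank, extra)
    last = first + base + (1 if rank < extra else 0)
    with open(path, "rb") as f:
        f.seek(offsets[first])
        chunk = f.read(offsets[last] - offsets[first])
    return chunk.decode("ascii").splitlines()
```

For the seven data rows in the question and 4 ranks, this gives ranks 0 to 2 two rows each and rank 3 the last row. Each rank still has to parse its own lines into a date, floats, and an integer, since the index only says where the rows start.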

Upvotes: 0
