Adam Merckx

Reputation: 1214

Load Text File as a matrix in MATLAB

I have a huge text file (around 9 GB) arranged as 244 x 3089987 tab-delimited values. I would like to load this file into MATLAB as a matrix. Here is what I have tried without success (MATLAB hangs):

fread = fopen('merge.txt','r');

formatString = repmat('%f',244,3089987);

C = textscan(fread,formatString);

Am I doing something wrong, or is my whole approach wrong? If this is easy to do in Python, could someone please suggest how?

Upvotes: 0

Views: 645

Answers (4)

Edric

Reputation: 25140

Another option in recent MATLAB releases is to use datastore. It is designed to let you page through the data in chunks rather than read everything at once, and it can generally deduce the delimiter and column formats on its own.

http://www.mathworks.com/help/matlab/import_export/read-and-analyze-data-in-a-tabulartextdatastore.html
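As a rough sketch of that paging pattern, the following builds a small tab-delimited file and walks through it two rows at a time with tabularTextDatastore, accumulating column means. The file name, the 6-by-4 size, and the column-mean calculation are illustrative assumptions, not part of the original question; for merge.txt you would point the datastore at that file and pick a much larger ReadSize.

```matlab
% Build a small tab-delimited file so the sketch is self-contained.
% (demoFile and its 6-by-4 contents are stand-ins for merge.txt.)
demoFile = [tempname '.txt'];
expected = magic(6);
expected = expected(:, 1:4);
dlmwrite(demoFile, expected, 'delimiter', '\t');

% Point a datastore at the file; no header row is assumed.
ds = tabularTextDatastore(demoFile, ...
    'Delimiter', '\t', 'ReadVariableNames', false);
ds.ReadSize = 2;                      % rows returned per call to read()

colSum = zeros(1, 4);                 % running column sums across chunks
nRows  = 0;
while hasdata(ds)
    chunk  = table2array(read(ds));   % read() returns a table
    colSum = colSum + sum(chunk, 1);
    nRows  = nRows + size(chunk, 1);
end
colMean = colSum / nRows;             % column means without holding the whole file
delete(demoFile);
```

Only one chunk is in memory at a time, so this style suits calculations that can be updated incrementally.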

Upvotes: 1

nivag

Reputation: 573

I'm surprised this even tries to run; when I try something similar, textscan throws an error.

If you really want to use textscan, you only need the format for a single row, so you can replace 244 in your code with 1 and it should work. Edit: having read your comment, note that the format needs one %f per column, so you should use formatString = repmat('%f',1, 244);. You can also apparently leave the format empty ('') and it will work.
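A minimal sketch of the corrected textscan call, run on a tiny stand-in file rather than merge.txt (the file name, the 2-by-3 data, and the 'CollectOutput' option are assumptions added for illustration). It also names the file handle fid, since calling it fread hides MATLAB's built-in fread function.

```matlab
% Tiny tab-delimited file standing in for merge.txt.
demoFile = [tempname '.txt'];
expected = [1 2 3; 4 5 6];
dlmwrite(demoFile, expected, 'delimiter', '\t');

nCols = size(expected, 2);               % 244 for the file in the question
formatString = repmat('%f', 1, nCols);   % one %f per column: '%f%f%f'

fid = fopen(demoFile, 'r');              % "fid", not "fread", to avoid shadowing
C = textscan(fid, formatString, 'Delimiter', '\t', 'CollectOutput', true);
fclose(fid);

M = C{1};                                % numeric matrix, one row per file line
delete(demoFile);
```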

However, MATLAB has several text import functions, and textscan is rarely the easiest of them to use.

In this case I would probably use dlmread, which reads any delimited numeric data. You want something like:

C=dlmread('merge.txt', '\t');

Also, since you are trying to load 9 GB of data, make sure you have enough memory. If you don't, you will probably get an out-of-memory error.
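To put a number on the memory requirement, here is a back-of-the-envelope calculation, assuming the values are stored as MATLAB's default double type (8 bytes each). It covers the finished matrix only; the import itself may temporarily need more.

```matlab
nCols = 244;
nRows = 3089987;
bytesPerDouble = 8;                        % double precision, MATLAB's default

nBytes = nCols * nRows * bytesPerDouble;   % memory for the numeric matrix alone
fprintf('%.2f GB (%.2f GiB)\n', nBytes / 1e9, nBytes / 2^30);
```

That works out to roughly 6 GB for the matrix itself.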

Upvotes: 0

sco1

Reputation: 12214

If you read the documentation for textscan you will see that you can define an input argument N so that:

textscan reads file data using the formatSpec N times, where N is a positive integer. To read additional data from the file after N cycles, call textscan again using the original fileID. If you resume a text scan of a file by calling textscan with the same file identifier (fileID), then textscan automatically resumes reading at the point where it terminated the last read.

You can also pass a blank formatSpec to textscan in order to read in an arbitrary number of columns. This is how dlmread, a wrapper for textscan, operates.

For example:

fID = fopen('test.txt');
chunksize = 10; % Number of lines to read for each iteration
while ~feof(fID) % Iterate until we reach the end of the file
    datachunk = textscan(fID, '', chunksize, 'Delimiter', '\t', 'CollectOutput', true);
    datachunk = datachunk{1}; % Pull data out of cell array. Can take time for large arrays
    % Do calculations
end
fclose(fID);

This will read the file in 10-line chunks until it reaches the end of the file.

If you have enough RAM to hold the data (a 244 x 3089987 array of doubles is just over 6 GB), you can read it all in one call:

fID = fopen('test.txt');
mydata = textscan(fID, '', 'Delimiter', '\t', 'CollectOutput', true);
fclose(fID);
mydata = mydata{1}; % Pull data out of cell array. Can take time for large arrays
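If the final matrix fits in memory but you still prefer to read in chunks, one possible variation is to preallocate the matrix and fill it chunk by chunk. This is only a sketch: the demo file, its 7-by-5 size, and the isempty guard are assumptions for illustration, and it presumes the row and column counts are known up front, as they are in the question.

```matlab
% Small tab-delimited file standing in for the real data.
demoFile = [tempname '.txt'];
expected = reshape(1:35, 5, 7).';        % 7 rows, 5 columns
dlmwrite(demoFile, expected, 'delimiter', '\t');

nRows = 7;  nCols = 5;                   % known in advance in this sketch
chunksize = 3;                           % lines per textscan call
M = zeros(nRows, nCols);                 % preallocate once
row = 0;                                 % rows filled so far

fID = fopen(demoFile);
while ~feof(fID)
    chunk = textscan(fID, '', chunksize, 'Delimiter', '\t', 'CollectOutput', true);
    chunk = chunk{1};
    if isempty(chunk), break; end        % guard against an empty final read
    n = size(chunk, 1);
    M(row + (1:n), :) = chunk;           % copy this chunk into place
    row = row + n;
end
fclose(fID);
delete(demoFile);
```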

Upvotes: 3

vjerez

Reputation: 119

Try: A = importdata('merge.txt', '\t');

http://es.mathworks.com/help/matlab/ref/importdata.html

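If the rows are not delimited by '\n', so that the data comes back as one long vector, rearrange it into 244 columns with vec2mat (part of the Communications Toolbox): [C, remaining] = vec2mat(A, 244)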

http://es.mathworks.com/help/comm/ref/vec2mat.html
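If the Communications Toolbox is not available, base MATLAB's reshape can do a similar rearrangement. The sketch below uses a 12-element vector and 4 columns as stand-ins for the real data and its 244 columns. Unlike vec2mat, which pads an incomplete last row, reshape raises an error if the vector length is not a multiple of the column count.

```matlab
A = 1:12;      % stand-in for a flat vector returned by importdata
nCols = 4;     % 244 for the file in the question

% reshape fills column by column, so build an nCols-by-nRows array and
% transpose it so that consecutive values run along each row.
C = reshape(A, nCols, []).';
```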

Upvotes: 1
