Reputation: 161
I'm trying to read in a large file with dlmread, but it just treats the whole file as one long column. The file is written in Java with the following code:
public void writeToFile(double[] arr) throws IOException {
    // path and append are fields of the enclosing class
    FileWriter write = new FileWriter(path, append);
    PrintWriter print_line = new PrintWriter(write);
    // write each value followed by a tab
    for (int i = 0; i < arr.length; i++) {
        print_line.printf("%f\t", arr[i]);
    }
    print_line.printf("\n");
    print_line.close();
}
and my MATLAB script reads in the file like:
[DATA] = dlmread('probability_cyclelength.dat');
giving:
>>size(DATA)
ans =
2000000 1
There are 2,000,000 values in total, with up to 60,000 per row (though not the same number in each row, which shouldn't matter).
When I try it with a smaller dataset (100,000 values), it works absolutely fine. I don't know whether the problem is in the Java or the MATLAB, so I'd really appreciate some help, thanks!
Upvotes: 4
Views: 3387
Reputation: 20319
By default dlmread tries to infer the delimiter from the file; if it can't, it treats whitespace as the delimiter.
The only way I was able to replicate the problem you describe was by specifying ' ' (a space) as the delimiter. Are you sure you aren't doing this? Try making this change and see if it fixes your problem:
data = dlmread(inFile, '\t');
If that doesn't fix your problem, then I suspect the problem arises from the rows in your text file having different numbers of columns. For example, if you use dlmread to open a text file containing:
1 2 3 4
5
dlmread returns a matrix like this:
1 2 3 4
5 0 0 0
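Here's a minimal sketch you can run to see that zero-padding for yourself (the file name ragged.txt is just a placeholder for this illustration):
% write a two-line file with different numbers of values per row
fid = fopen('ragged.txt', 'w');
fprintf(fid, '1\t2\t3\t4\n5\n');
fclose(fid);
% dlmread pads the short row with zeros to make the matrix rectangular
M = dlmread('ragged.txt', '\t')
which displays the 2x4 matrix above, with the second row padded out with zeros.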
This representation is wasteful: it uses 64 bytes (8 bytes per double × 8 doubles) to store 40 bytes of information.
It could be that, with these empty positions, a matrix representation of your file is simply too big, and so dlmread is returning a vector instead to save memory.
You can work around this, though. If you only need a few rows at a time, you can load a block of rows from the file by specifying a range to dlmread. Note that for this to work you have to know the maximum number of columns in the file, as dlmread won't let you read more than that number of columns.
r = [0 4];  % load the first 5 rows (row indices are zero-based)
maxC = 10;  % load up to 10 columns
data = dlmread(inFile, '\t', [r(1), 0, r(2), maxC-1]);  % range is [R1 C1 R2 C2]
You could then loop through the file loading the rows of interest, as sketched below, though you probably can't load them all into one matrix due to the memory constraints I mentioned earlier.
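A rough sketch of that loop might look like this (blockSize, nRows, inFile and maxC are all placeholders you'd set for your own file):
blockSize = 1000;  % number of rows to load per pass (assumed)
for r1 = 0:blockSize:nRows-1
    r2 = min(r1 + blockSize - 1, nRows - 1);
    % load rows r1..r2 and up to maxC columns (zero-based range)
    block = dlmread(inFile, '\t', [r1, 0, r2, maxC-1]);
    % ... process block here before it is replaced on the next pass
end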
If you need the entire dataset in memory, then you should consider loading each row individually and saving it into a cell array. It takes a bit more work to get everything loaded, but you could do it with something like this:
% open the file
fid = fopen(fileName);
% load each line as a single string
tmp = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
% textscan wraps its results in a cell, remove that wrapping
rawText = tmp{1};
nLines = numel(rawText);
% create a cell array to store the processed rows
data = cell(nLines, 1);
for i = 1:nLines
    % scan a line of text, returning a vector of doubles
    tmp = textscan(rawText{i}, '%f');
    data{i} = tmp{1};
end
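Once it's loaded, each row is just data{i}. For example (purely illustrative), you can check the row lengths with:
rowLengths = cellfun(@numel, data);  % number of values on each row
max(rowLengths)                      % should be at most 60000 for your data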
Upvotes: 6