Reputation: 365
I have a text file with the following structure:
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...
The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).
I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.
In case it's important, I'm doing this on a Mac.
Upvotes: 4
Views: 14307
Reputation:
This version uses regular expressions to check that each line of your data is formatted as expected.
fid = fopen('data.txt','rt');

% these will be your 8 value arrays
val1 = []; val2 = []; val3 = []; val4 = [];
val5 = []; val6 = []; val7 = []; val8 = [];

linenum = 0; % line number in file
valnum = 0;  % number of value (1-8)

while 1
    line = fgetl(fid);
    linenum = linenum+1;
    if valnum == 8
        valnum = 1;
    else
        valnum = valnum+1;
    end

    %-- if reached end of file (fgetl returns -1) or an empty line, end
    if ~ischar(line) || isempty(line)
        fclose(fid);
        break;
    end

    switch valnum
        case 1
            pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
        case {2,3,4}
            pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2-val4 (e.g. 1,100.00) [valid up to 1billion-1]
        case 5
            pat = '(?<val>\d+)'; % val5 (e.g. 0)
        case {6,7}
            pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6, val7 (e.g. 6,225 or 1,336,605) [valid up to 1billion-1]
        case 8
            pat = '(?<val>\d+)'; % val8 (e.g. 37)
        otherwise
            error('bad linenum')
    end

    l = regexp(line,pat,'names'); % l is for line
    if length(l) == 1 % match
        if valnum == 1
            % convert to matlab serial date
            serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy));
            val1 = [val1;serialtime];
        else
            this_val = strrep(l.val,',',''); % strip out comma and convert to number
            % save this value into appropriate array
            eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];'])
        end
    else
        warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
    end
end
Upvotes: 0
Reputation: 125854
EDIT: This is a shorter version of the code I previously had in my answer...
If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:
fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';
The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
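Since the question asks for separate vectors, the rows of that 8-by-N matrix can be pulled apart afterwards. The following is only a sketch: the matrix is typed in by hand from the two sample records (serial dates 730124 and 730125) instead of being read from a file, and the names dates and v1 to v7 are placeholders, not names from the original code.

```matlab
% Hypothetical 8-by-N matrix (N = 2) built from the sample records,
% standing in for the "data" variable produced by the code above.
data = [730124 1100.00 1060.00 1092.50 0 6225 1336605 37;
        730125 1122.50 1087.50 1122.50 0 3250  712175 14]';

% Put each row of the matrix into its own cell ...
rows = num2cell(data, 2);

% ... and hand the eight rows out to eight variables in one statement.
% The variable names are illustrative assumptions.
[dates, v1, v2, v3, v4, v5, v6, v7] = rows{:};
```

Each resulting variable is a 1-by-N row vector; plain indexing such as data(1,:) gives the same result if you only need one or two of them.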
Upvotes: 9
Reputation: 124563
I propose yet another solution. This one is the shortest in MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):
cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' |
sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv
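Explanation: First we get rid of the thousands comma separator, and trim spaces at the end of each line, adding a comma. Then we remove that trailing comma on every 8th line. Finally we join the lines and remove empty ones. Note that the 0~8 address (every 8th line) is a GNU sed extension, so the sed that ships with a Mac may not accept it.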
The output will look like this:
1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14
Next, in MATLAB, we simply use textscan to read each line, with the first field as a string (to be converted to a serial date number) and the rest as numbers:
fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
M = [datenum(a{1}) a{2}]
and the resulting matrix M is:
730124 1100 1060 1092.5 0 6225 1336605 37
730125 1122.5 1087.5 1122.5 0 3250 712175 14
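If you later need the dates back as text, or the individual columns as vectors, something along these lines should work. This is a sketch under the assumption that M looks like the matrix shown above; it is typed in by hand here so the snippet runs without file.csv, and dateStrings and firstValue are illustrative names.

```matlab
% Hand-typed copy of the matrix M shown above (N-by-8, one record per row).
M = [730124 1100 1060 1092.5 0 6225 1336605 37;
     730125 1122.5 1087.5 1122.5 0 3250 712175 14];

% Convert the serial date numbers in column 1 back to yyyy-mm-dd text.
% The result is a char array with one row per record.
dateStrings = datestr(M(:,1), 'yyyy-mm-dd');

% Any other column can be taken out as an N-by-1 vector by indexing.
firstValue = M(:,2);
```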
Upvotes: 4
Reputation: 2401
Similar to Richie's answer, this uses str2double to convert the strings read from the file to doubles. This implementation processes the file line by line instead of breaking it up with a regular expression. The output is a cell array of individual vectors.
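function vectors = readdata(filename)
% Read repeated 8-line records. The date line of each record is skipped;
% the seven numeric values are appended to vectors{1} ... vectors{7}.
fid = fopen(filename, 'rt');
tline = fgetl(fid);
counter = 0;            % 0 = date line, 1-7 = numeric lines
vectors = cell(7,1);
while ischar(tline)
    if counter > 0
        vectors{counter} = [vectors{counter} str2double(tline)];
    end
    counter = counter + 1;
    if counter > 7
        counter = 0;
    end
    tline = fgetl(fid);
end
fclose(fid);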
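The function passes each line to str2double without removing the commas first, which works because MATLAB's str2double accepts a comma as a thousands separator. A quick check on a few of the sample values (the variable name vals is just for illustration):

```matlab
% str2double also accepts a cell array of strings and returns a numeric
% array of the same size; commas are treated as thousands separators.
vals = str2double({'1,100.00', '6,225', '1,336,605', '37'});
```

Text that cannot be parsed as a number, such as the date lines, comes back as NaN, which is why the date line has to be handled separately.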
Upvotes: 0
Reputation: 121077
It isn't entirely clear what form you want the data to be in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 rows in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics toolbox) use a dataset array.
% Read file as text
text = fileread('c:/data.txt');
% Split by line (trim first so a trailing newline does not create an empty entry)
x = regexp(strtrim(text), '\r?\n', 'split');
% Remove commas from numbers
x = regexprep(x, ',', '');
% Number of items per object
n = 8;
% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));
% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';
% Combine dates and numbers
thedata = [dates nums];
You could also look into the function textscan for alternative ways of solving the problem.
Upvotes: 2
Reputation: 1697
Use a script to modify your text file into something that MATLAB can read.
E.g., make it a matrix:
M = [
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37; <-- notice the ';'
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14; <-- notice the ';'
...
]
Import this into MATLAB and read the various vectors from the matrix.
Note: my MATLAB is a bit rusty, so this might contain errors.
Upvotes: 3