Reputation: 1271
Is there a way to perform conditional text import within MATLAB? e.g. with a tab-delimited .txt file in this format:
Type A B C D E
A 5000 2 5 16 19
A 5000 3 4 5 4
A 5000 4 1 4 5
B 500 19 8 2 7
B 500 18 9 8 1
B 500 2 9 13 2
B 100 3 10 15 9
B 5000 4 15 14 10
Is there a method to import only those lines where Column A contains '5000'?
This would be preferable to importing the entire .txt file and separating the data afterward because, in reality, my text files are rather large (~200MB each). If there is a way to do that quickly, though, it would also be a suitable solution.
Alternatively, is there a method (similar to R) to import and handle data using the headers contained in the .txt file, e.g. importing 'Type', 'A', 'B' and 'D' whilst ignoring 'C' and 'E' in the above example? This is needed because the input file format is flexible: additional columns are sometimes added, so the relative positions of the columns change.
Upvotes: 3
Views: 378
Reputation: 47402
The function fgetl reads a single line from a text file, so one option would be to write a loop that repeatedly reads a line using fgetl and checks whether column A contains "5000" before deciding whether to include it in your data set.
This is the solution presented in il_raffa's answer. Notice that you actually have to read the entire file anyway, since you read each entire line with fgetl and then use textscan on it! So it certainly won't be any faster than reading the entire file and then filtering it (though it may be more memory-efficient).
Really what you want is to read the file character by character, aborting each line if you can determine that you won't be reading it, based on the value of the "A" column.
If you were writing C or another low-level language this would probably be faster than importing the entire file and filtering it afterward. However, because of the overhead introduced by MATLAB, it will almost certainly be faster and easier to read the entire file and filter it later. The textscan function is pretty good (and speedy) at reading delimited files, and 200MB is really not that large (it fits comfortably into memory on any modern computer, for example). Just make sure to filter each file's data right after reading it, rather than reading all of the files first and filtering them all afterward.
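As a rough sketch of the read-then-filter approach, assuming the layout from the question (one header line, a text column followed by five numeric columns): the temporary sample file and the variable names are only for illustration.

```matlab
% Write the question's sample data to a temporary file (illustration only).
sample_file = [tempname '.txt'];
fid = fopen(sample_file, 'w');
fprintf(fid, 'Type\tA\tB\tC\tD\tE\n');
fprintf(fid, 'A\t5000\t2\t5\t16\t19\n');
fprintf(fid, 'A\t5000\t3\t4\t5\t4\n');
fprintf(fid, 'A\t5000\t4\t1\t4\t5\n');
fprintf(fid, 'B\t500\t19\t8\t2\t7\n');
fprintf(fid, 'B\t500\t18\t9\t8\t1\n');
fprintf(fid, 'B\t500\t2\t9\t13\t2\n');
fprintf(fid, 'B\t100\t3\t10\t15\t9\n');
fprintf(fid, 'B\t5000\t4\t15\t14\t10\n');
fclose(fid);

% Read the whole file in one textscan call, skipping the header line.
fid = fopen(sample_file);
cols = textscan(fid, '%s%f%f%f%f%f', 'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);

% Filter immediately after reading: keep rows where column A equals 5000.
keep     = cols{2} == 5000;      % logical mask built from column A
types    = cols{1}(keep);        % cell array of Type labels for kept rows
num_data = [cols{2:end}];        % numeric columns A..E as one matrix
num_data = num_data(keep, :);    % only the rows where A == 5000
```

Filtering straight after each read means only the kept rows from each file stay in memory while the next file is processed.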
To the second part of your question, regarding whether you can selectively import columns by header: MATLAB doesn't provide a built-in way to do this. However, it isn't that tricky if you can make a few assumptions about your file format. If we assume that the file is tab-delimited and that its first line contains the column headers, then you can read the header line (using fgetl), which will tell you how many columns there are and what their names are. You can then use that information to build a call to textscan that reads the delimited columns, and filter out the ones whose headers don't match what you need. A simple version of this might look like this:
function columns = import_columns(filename, headers)
    % Read a tab-delimited file and return only the columns named in headers.
    fid = fopen(filename);
    if fid == -1
        error('Could not open %s', filename);
    end
    hdr = fgetl(fid);                            % first line holds the headers
    column_headers = regexp(hdr, '\t', 'split'); % split on tabs
    num_cols = length(column_headers);
    format_str = repmat('%s', 1, num_cols);      % create a string like '%s%s%s%s'
    columns = textscan(fid, format_str, 'Delimiter', '\t');
    fclose(fid);
    required_cols = ismember(column_headers, headers);
    columns(~required_cols) = [];                % remove the columns you don't need
end
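A possible variant, sketched here as an assumption rather than as part of the answer's own method: textscan accepts '%*s' to read a field and discard it, so the unwanted columns never need to be stored. The sample file and the names wanted, specs, kept_headers and a_values are hypothetical.

```matlab
% Sample file with the question's header and two rows (illustration only).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'Type\tA\tB\tC\tD\tE\n');
fprintf(fid, 'A\t5000\t2\t5\t16\t19\n');
fprintf(fid, 'B\t500\t19\t8\t2\t7\n');
fclose(fid);

wanted = {'Type', 'A', 'B', 'D'};       % headers to keep

fid = fopen(filename);
column_headers = regexp(fgetl(fid), '\t', 'split');
keep = ismember(column_headers, wanted);

% '%s' reads a field as text; '%*s' reads the field and throws it away.
specs = repmat({'%*s'}, 1, numel(column_headers));
specs(keep) = {'%s'};
columns = textscan(fid, [specs{:}], 'Delimiter', '\t');
fclose(fid);

kept_headers = column_headers(keep);    % kept_headers{k} names columns{k}
% Everything is read as text, so convert numeric columns when needed.
a_values = str2double(columns{strcmp(kept_headers, 'A')});
```

Because the columns are looked up by header name, the same code keeps working when extra columns shift the positions of the ones you need.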
Upvotes: 0
Reputation: 5190
You might try reading the input file line by line and checking whether each line contains the reference value (5000 in this case) in the reference column (column 2 in this case).
If it does, you store the line; otherwise, you discard it.
In the following code, based on your template, you can define the reference value and the reference column at the beginning.
You can then convert the cell array output to a numeric array.
% Define the column index
col_idx = 2;
% Define the reference value
ref_value = 5000;
% Open input file
fid = fopen('in.txt');
% Read (and discard) the header line
tline = fgetl(fid);
% Initialize output variable
data = {};
% Read the file line by line
while true
    % Read the line
    tline = fgetl(fid);
    % Check for the end of file (fgetl returns -1, which is not a char)
    if ~ischar(tline)
        break
    end
    % Split the line into its fields: one character, then five numbers
    c = textscan(tline, '%c%f%f%f%f%f');
    % If the reference column contains the ref value, store the input data
    if c{col_idx} == ref_value
        data = [data; c];
    end
end
fclose(fid);
% Convert the numeric cells to an array
c = data(:, 2:end);
num_data = cell2mat(c);
% Convert first column to char
lab = char(data(:, 1));
Hope this helps.
Upvotes: 1