Reputation: 481
I have a series of strings in a CSV file; they all look like the two below:
7336598,"[4125420656L, 2428145712L, 1820029797L, 1501679119L, 1980837904L, 380501274L]"
7514340,"[507707719L, 901144614L, 854823005L]"
....
How can I extract the numbers from them? That is, retrieve 7336598, 4125420656, etc.
I tried textscan and regexp, but without much success.
Sorry for the beginner's question, and thank you for having a look! :)
Edit: the size of each line is variable.
Upvotes: 3
Views: 5744
Reputation: 32930
You can use textread and regexp to extract only the numbers from your CSV file:
C = textread('file.csv', '%s', 'delimiter', '\n');
C = regexp(C, '\d+', 'match');
The regular expression is quite simple. In MATLAB's regexp pattern, \d denotes a digit, and the + indicates that this digit must occur at least once. The match mode tells regexp to return the matched strings.
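To see what the pattern does on its own, here is a small sketch applied to one of the sample lines from the question (the variable names str and tokens are just illustrative):

```matlab
% One sample line from the question, stored as a char array
str = '7514340,"[507707719L, 901144614L, 854823005L]"';

% Match every run of one or more digits; the trailing L is not a digit, so it is skipped
tokens = regexp(str, '\d+', 'match');

% tokens is a 1x4 cell array of strings:
% {'7514340', '507707719', '901144614', '854823005'}
```

When the input is a cell array of lines rather than a single string, regexp returns one such cell of matches per line.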
The result is a cell array of strings. You can go further and convert the strings to numerical values:
C = cellfun(@(x) str2num(sprintf('%s ', x{:})), C, 'UniformOutput', false);
The result is still stored in a cell array. If you can guarantee that each row contains the same number of numeric values, you can convert the cell array to a matrix:
A = cell2mat(C);
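Since the question's edit says the length of each line varies, cell2mat would fail here because the rows cannot be stacked vertically. One possible workaround, shown as a rough sketch rather than the only option, is to pad the shorter rows with NaN. The sample values in C are hypothetical and stand in for the numeric cell array produced by the cellfun call:

```matlab
% Hypothetical stand-in for the numeric cell array (one row vector per line)
C = {[7336598 4125420656 2428145712]; [7514340 507707719]};

% Length of the longest row determines the matrix width
n = max(cellfun(@numel, C));

% Preallocate with NaN so missing entries are clearly marked
A = NaN(numel(C), n);
for k = 1:numel(C)
    A(k, 1:numel(C{k})) = C{k};   % copy row k, leaving the remainder as NaN
end

% A is now:
%   7336598   4125420656   2428145712
%   7514340    507707719          NaN
```

If you would rather not mix NaN into the data, simply keeping the values in the cell array and indexing with C{k} also works.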
Upvotes: 6
Reputation: 11546
I don't have MATLAB to test with, but does '[0-9]+' do the job?
It works for me outside MATLAB:
echo '7336598,"[4125420656L, 2428145712L, 1820029797L, 1501679119L, 1980837904L, 380501274L]"' | grep -o '[0-9]\+'
7336598
4125420656
2428145712
1820029797
1501679119
1980837904
380501274
Upvotes: 2