alexandreC

Reputation: 481

MATLAB - extracting numbers from an (odd) string

I have a series of strings in a CSV file; they all look like the two below:

7336598,"[4125420656L, 2428145712L, 1820029797L, 1501679119L, 1980837904L, 380501274L]"
7514340,"[507707719L, 901144614L, 854823005L]"
....

How can I extract the numbers from them? That is, how do I retrieve 7336598, 4125420656, etc.?

I tried textscan and regexp, but without much success.

Sorry for the beginner's question, and thank you for having a look! :)

Edit: the size of each line is variable.

Upvotes: 3

Views: 5744

Answers (2)

Eitan T

Reputation: 32930

You can use textread and regexp to extract only the numbers from your CSV file:

C = textread('file.csv', '%s', 'delimiter', '\n');  % read each line of the file as one string
C = regexp(C, '\d+', 'match');                      % keep only the runs of digits in each line

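The regular expression is quite simple. In a MATLAB regexp pattern, \d denotes a digit, and the + indicates that the digit must occur at least once, so \d+ matches each run of consecutive digits. The 'match' option tells regexp to return the matched strings. The trailing L after each number is not a digit, so it is left out of the matches.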
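As a quick check of what the pattern matches, here is a minimal sketch that applies it to one of the sample lines from the question. The variable names s and tokens are only illustrative.

```matlab
% One sample line from the question, stored as a char array.
s = '7514340,"[507707719L, 901144614L, 854823005L]"';

% '\d+' matches every run of consecutive digits; 'match' returns the matched text.
tokens = regexp(s, '\d+', 'match');

% tokens is a 1-by-4 cell array of strings; the commas, brackets,
% quotes and trailing L characters are not part of any match.
disp(tokens)
```

Called on a cell array of strings instead of a single string, regexp returns one such cell of matches per line, which is what the two-line snippet above relies on.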

The result is a cell array of strings. You can go further and convert the strings to numerical values:

C = cellfun(@str2double, C, 'UniformOutput', false);  % each row becomes a numeric row vector

The result is still stored in a cell array. If you can guarantee that there is the same number of numeric values in each row, you can convert the cell array to a matrix:

A = cell2mat(C);
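The question's edit says the line length varies, in which case cell2mat fails because the rows cannot be stacked. One possible workaround, sketched here under the assumption that NaN is an acceptable filler, is to pad the shorter rows. The cell array below is typed in by hand from the question's two sample lines, and the names n and A are illustrative.

```matlab
% Parsed rows of different lengths (values taken from the question's sample lines).
C = {[7336598 4125420656 2428145712 1820029797 1501679119 1980837904 380501274]; ...
     [7514340 507707719 901144614 854823005]};

n = cellfun(@numel, C);          % number of values in each row
A = NaN(numel(C), max(n));       % preallocate, filled with NaN padding

for k = 1:numel(C)
    A(k, 1:n(k)) = C{k};         % copy each row into the left part of A
end

% The first column of A holds the leading ID of each line.
ids = A(:, 1);
```

Keeping the data in the cell array is just as reasonable if later processing handles one row at a time; the padded matrix mainly helps when column-wise indexing is needed.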

Upvotes: 6

Julien Palard

Reputation: 11546

I don't have MATLAB to test with, but wouldn't '[0-9]+' do the job?

It works for me outside MATLAB:

echo '7336598,"[4125420656L, 2428145712L, 1820029797L, 1501679119L, 1980837904L, 380501274L]"' | grep -o '[0-9]\+'
7336598
4125420656
2428145712
1820029797
1501679119
1980837904
380501274

Upvotes: 2
