Reputation: 409
I'm short on time and specifically want to extract a string like the one below. The problem is that the tag isn't of the plain form <a> data </a>; it carries an attribute.
Given
s = '<em style="font-size:medium"> 5,888 </em>'
how can I extract just 5,888 in MATLAB?
Upvotes: 4
Views: 4784
Reputation: 409
Thanks, folks, for your help. I'm basically trying to get the population of a US county in MATLAB. Thought I'd share my code, though it's not the most elegant. Might help some soul. :)
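county = 'morris';
state  = 'ks';
county = strrep(county, ' ', '+');   % encode spaces in multi-word county names
url = sprintf('https://www.google.com/search?q=population+%s+%s', county, state);
s = urlread(url);                    % fetch the raw HTML of the results page
% Capture the text of the first <em ...> element, whatever its attributes
tok = regexp(s, '<em[^>]*>(.*?)</em>', 'tokens', 'once');
pop = strrep(strtrim(tok{1}), ',', '');   % drop padding and thousands separators
pop = str2double(pop);               % str2double avoids the eval that str2num performs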
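The parsing step can be checked offline, without hitting the network, by running it on the sample string from the question. This is only a sketch: it assumes the number sits in the first <em> element and uses commas as thousands separators, and it returns NaN when no <em> element is found.

```matlab
% Sample input from the question, standing in for the downloaded page
s = '<em style="font-size:medium"> 5,888 </em>';

% 'once' returns the tokens of the first match only (a cell array of strings)
tok = regexp(s, '<em[^>]*>(.*?)</em>', 'tokens', 'once');

if isempty(tok)
    pop = NaN;   % no <em> element in the input
else
    % trim padding, drop thousands separators, convert to a number
    pop = str2double(strrep(strtrim(tok{1}), ',', ''));
end
```

The guard matters because indexing tok{1} on an empty result would raise an error.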
Upvotes: 3
Reputation: 38032
You will find useful info here, or here, or here, all of which are first-page Google results and would have been faster than asking a question here.
Anyway, a quick-'n-dirty way: you can filter on the < and > symbols:
>> s = '<em style="font-size:medium"> 5,888 </em> <sometag> test </sometag>'
>> a = regexp(s, '[<>]');
>> s( cell2mat(arrayfun(@(x,y)x:y, a(2:2:end-1)+1, a(3:2:end)-1, 'uni',false)) )
ans =
5,888 test
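The index arithmetic in that one-liner may be easier to follow as an explicit loop. The sketch below should collect the same text, under the same assumption that every < has a matching > and that neither character appears in the text itself; the variable name out is only illustrative.

```matlab
s = '<em style="font-size:medium"> 5,888 </em> <sometag> test </sometag>';
a = regexp(s, '[<>]');   % positions of every '<' and '>'

out = '';
% a(2), a(4), ... are closing '>' characters; a(3), a(5), ... are the
% next opening '<' characters, so the text lies strictly between them.
for k = 2:2:numel(a)-1
    out = [out, s(a(k)+1 : a(k+1)-1)]; %#ok<AGROW>
end
```

Note that out keeps the spaces that surrounded each tag, so strtrim may still be needed afterwards.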
Or, slightly more robust and much cleaner, replace every tag (the angle brackets and everything between them) with an empty string:
>> s = regexprep(s, '<.*?>', '')
s =
5,888 test
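Stripping the tags leaves behind the spaces that sat around them, so the result usually needs tidying. A minimal sketch, assuming runs of whitespace can safely be collapsed to a single space (txt is just an illustrative variable name):

```matlab
s = '<em style="font-size:medium"> 5,888 </em> <sometag> test </sometag>';

txt = regexprep(s, '<[^>]*>', '');   % remove every tag
txt = regexprep(txt, '\s+', ' ');    % collapse runs of whitespace
txt = strtrim(txt);                  % drop leading and trailing spaces
```

The pattern '<[^>]*>' is used here instead of '<.*?>'; on this input both match the same tags, but the negated class cannot run past a closing bracket.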
Upvotes: 3