user2574276

Reputation: 19

Extract specific values from string

I want to extract every word that is stored under the label "w=". For example, I need "THAT HAVE RECEIVED NO" from the string below.

w="THAT" v="22.23092" a="19.09109" i="3"/>
<r s="1480150" d="150" w="HAVE" v="20.66713" a="19.09183" i="3"/>
<r s="1480300" d="360" w="RECEIVED" v="18.70063" a="19.09165" i="2"/>
<r s="1480660" d="200" w="-SIL-" v="11.65527" a="19.09165" i="0"/>
<r s="1480860" d="210" w="NO" v="18.49828" a="19.09137" i="2"/>
<r s="1481070" d="4330" w="-S-" v="11.55029" a="19.09137" i="0"/>
<r s="1485400" d="4170" w="-S-" v="11.88606" a="19.09137" i="0"/>

I have been trying to use the following regex:

 matches = re.findall('(?<=[w][=])\w+',line)

However, it does not seem to work. Please help.

Upvotes: 1

Views: 89

Answers (2)

YXD

Reputation: 32511

Do you want something more like

re.findall('(w=")([^"]*)(")', line)

?
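To illustrate what that pattern returns, here is a minimal sketch using two lines copied from the question's sample. With three capture groups, re.findall yields one tuple per match; a single group around the value yields plain strings. Note that [^"]* also captures marker values such as -SIL-, so a filter is needed if those should be dropped. The variable names and the "starts with a hyphen" filter are assumptions made for illustration, not part of the original answer.

```python
import re

# Two lines copied from the question's sample data.
sample = (
    '<r s="1480150" d="150" w="HAVE" v="20.66713" a="19.09183" i="3"/>\n'
    '<r s="1480660" d="200" w="-SIL-" v="11.65527" a="19.09165" i="0"/>\n'
)

# Three capture groups: findall returns one 3-tuple per match.
three_groups = re.findall(r'(w=")([^"]*)(")', sample)
print(three_groups)  # [('w="', 'HAVE', '"'), ('w="', '-SIL-', '"')]

# One capture group: findall returns only the captured value.
one_group = re.findall(r'w="([^"]*)"', sample)
print(one_group)  # ['HAVE', '-SIL-']

# Assumed filter: treat values that start with '-' as markers, not words.
words = [w for w in one_group if not w.startswith("-")]
print(words)  # ['HAVE']
```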

Upvotes: 0

Ashwini Chaudhary

Reputation: 250931

Something like this:

>>> import re
>>> re.findall(r'w="(\w+)"', strs)
['THAT', 'HAVE', 'RECEIVED', 'NO']

Then use str.join to get a single string:

>>> " ".join(re.findall(r'w="(\w+)"', strs))
'THAT HAVE RECEIVED NO'

where strs is:

>>> print(strs)
w="THAT" v="22.23092" a="19.09109" i="3"/>
<r s="1480150" d="150" w="HAVE" v="20.66713" a="19.09183" i="3"/>
<r s="1480300" d="360" w="RECEIVED" v="18.70063" a="19.09165" i="2"/>
<r s="1480660" d="200" w="-SIL-" v="11.65527" a="19.09165" i="0"/>
<r s="1480860" d="210" w="NO" v="18.49828" a="19.09137" i="2"/>
<r s="1481070" d="4330" w="-S-" v="11.55029" a="19.09137" i="0"/>
<r s="1485400" d="4170" w="-S-" v="11.88606" a="19.09137" i="0"/>
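As for why the pattern in the question finds nothing: the lookbehind (?<=[w][=]) is satisfied right after w=, but the next character there is the opening quote, which \w cannot match. A minimal sketch of the failure and one possible repair follows, using a shortened copy of the sample. The variable names are illustrative, and the repaired lookbehind is an alternative to the capture-group approach, not a replacement for it.

```python
import re

# Shortened copy of the sample data from the question.
strs = '''w="THAT" v="22.23092" a="19.09109" i="3"/>
<r s="1480150" d="150" w="HAVE" v="20.66713" a="19.09183" i="3"/>
<r s="1480660" d="200" w="-SIL-" v="11.65527" a="19.09165" i="0"/>
<r s="1480860" d="210" w="NO" v="18.49828" a="19.09137" i="2"/>'''

# Pattern from the question: after 'w=' comes '"', so \w+ cannot start.
original = re.findall(r'(?<=[w][=])\w+', strs)
print(original)  # []

# Including the opening quote in the lookbehind lets \w+ start on the word.
fixed = re.findall(r'(?<=w=")\w+', strs)
print(fixed)  # ['THAT', 'HAVE', 'NO']

print(" ".join(fixed))  # THAT HAVE NO
```

Unlike w="(\w+)", the lookbehind version does not require the closing quote, so a value that mixes word characters and punctuation would be matched only up to the first non-word character.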

Upvotes: 1
