Reputation: 9
Hello, I have file data as shown below:
ID=3161
Author=Mark
Context= "eric
speaking
to
mark
about
goldeninfo"
tag = "dramatic"
type = novel
I would like to extract any value that is enclosed in quotes. I was able to extract the quoted value for tag, but I cannot get the value for Context using the regex below.
If a value is in quotes I would like to extract the quoted value; otherwise I would like to extract the unquoted value. I am open to suggestions.
quoted = re.compile('"[^"].*"')
if value in quoted.findall(string):
    pass  # extract the quoted value
else:
    pass  # extract the unquoted value
Thanks
Expected output:
Context= "eric speaking to mark about goldeninfo"
tag = "dramatic"
Upvotes: 0
Views: 190
Reputation: 16940
How about this:
>>> match = re.findall('"(.*?)"', string, re.DOTALL)
>>> ' '.join(match[0].split('\n'))
'eric speaking to mark about goldeninfo'
>>>
>>> match[1]
'dramatic'
>>>
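If you also need the unquoted values (the "else" case in the question), one possible approach is a single pattern with two alternatives: a quoted value that may span lines, or a bare value running to the end of the line. This is only a sketch under the assumption that each field starts a line as key = value and that quoted values never contain a double quote; the names text, FIELD and parse_fields are made up for illustration.

```python
import re

# Sample data copied from the question; `text` is a hypothetical variable name.
text = '''ID=3161
Author=Mark
Context= "eric
speaking
to
mark
about
goldeninfo"
tag = "dramatic"
type = novel'''

# Group 1: key. Group 2: quoted value (may contain newlines).
# Group 3: bare value (rest of the line, no quotes).
FIELD = re.compile(r'^(\w+)[ \t]*=[ \t]*(?:"([^"]*)"|([^\n"]*))$', re.MULTILINE)

def parse_fields(text):
    """Return a dict of key -> value with runs of whitespace collapsed."""
    fields = {}
    for key, quoted, bare in FIELD.findall(text):
        value = quoted if quoted else bare
        # split() with no argument also swallows newlines and trailing spaces
        fields[key] = ' '.join(value.split())
    return fields

fields = parse_fields(text)
print(fields['Context'])  # eric speaking to mark about goldeninfo
print(fields['type'])     # novel
```

Because the quoted alternative consumes the whole multi-line value in one match, the continuation lines (speaking, to, ...) are never mistaken for fields of their own.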
Upvotes: 1
Reputation: 82899
Note that your regex means "a character other than a double quote, followed by any number of any character", and not (as I suppose you intended) "any number of characters other than a double quote".
Also note that [^"] includes newlines, whereas . does not (unless re.DOTALL is set).
Instead, try '"[^"]*"'.
>>> print(re.findall('"[^"]*"', string))
['"eric\nspeaking \nto \nmark \nabout \ngoldeninfo"', '"dramatic"']
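To see the difference between the two patterns side by side, here is a small sketch. It assumes the sample is held in one string with plain newlines and no trailing spaces; the variable names are only illustrative.

```python
import re

# Relevant part of the sample data as a single string (assumed layout).
string = 'Context= "eric\nspeaking\nto\nmark\nabout\ngoldeninfo"\ntag = "dramatic"\ntype = novel'

# Pattern from the question: '.' cannot cross a newline, so the match
# cannot start at the opening quote of the multi-line value.
original = re.findall('"[^"].*"', string)

# Suggested pattern: [^"] matches any character except a double quote,
# newlines included, so each quoted value is matched on its own.
fixed = re.findall('"[^"]*"', string)

# Drop the surrounding quotes and collapse whitespace to get single-line values.
cleaned = [' '.join(m.strip('"').split()) for m in fixed]

print(original)  # ['"\ntag = "dramatic"']
print(cleaned)   # ['eric speaking to mark about goldeninfo', 'dramatic']
```

The original pattern ends up starting at the closing quote of the Context value, because [^"] is allowed to consume the newline that follows it, and then runs greedily to the last quote on the tag line.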
Upvotes: 0