Reputation: 104
I want to extract the full review/text entries, but my pattern extracts only part of each one. This is the output:
<re.Match object; span=(226, 258), match='review/text: I like Creme Brulee'>
<re.Match object; span=(750, 860), match='review/text: not what I was expecting in terms of>
import re
text='''
'product/productId: B004K2IHUO\n',
'review/userId: A2O9G2521O626G\n',
'review/profileName: Rachel Westendorf\n',
'review/helpfulness: 0/0\n',
'review/score: 5.0\n',
'review/time: 1308700800\n',
'review/summary: The best\n',
'review/text: I like Creme Brulee. I loved that these were so easy. Just sprinkle on the sugar that came with and broil. They look amazing and taste great. My guess thought I really went out of the way for them when really it took all of 5 minutes. I will be ordering more!\n',
'\n',
'product/productId: B004K2IHUO\n',
'review/userId: A1ZKFQLHFZAEH9\n',
'review/profileName: S. J. Monson "world citizen"\n',
'review/helpfulness: 2/8\n',
'review/score: 3.0\n',
'review/time: 1236384000\n',
'review/summary: disappointing\n',
"review/text: not what I was expecting in terms of the company's reputation for excellent home delivery products\n",
'\n',
'''
pattern=re.compile(r'review/text:\s[^.]+')
matches=pattern.finditer(text)
for match in matches:
    print(match)
Upvotes: 1
Views: 324
Reputation: 18611
Use
matches = re.findall(r'review/text:.+', text)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
review/text: 'review/text:'
--------------------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
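To see why the original pattern stopped early, it may help to run both patterns side by side. The sketch below uses a shortened, made-up sample in the same shape as the question's data (the `sample` string and the variable names are illustrative assumptions, not part of the original post). The character class `[^.]` matches anything except a literal period, newlines included, so `[^.]+` ends at the first full stop, whereas `.+` ends at the line break.

```python
import re

# Shortened, hypothetical sample shaped like the question's data.
sample = (
    "review/summary: The best\n"
    "review/text: I like Creme Brulee. I loved that these were so easy.\n"
    "\n"
    "review/summary: disappointing\n"
    "review/text: not what I was expecting\n"
)

# [^.]+ stops at the first period, so only the first sentence is kept.
first_sentence_only = re.findall(r'review/text:\s[^.]+', sample)

# .+ stops at the newline instead, so the whole line is kept.
full_lines = re.findall(r'review/text:.+', sample)

# A capture group returns just the review body, without the label.
bodies = re.findall(r'review/text:\s*(.+)', sample)

print(first_sentence_only[0])
print(full_lines)
print(bodies)
```

If only the review body is needed, the capture-group variant avoids having to strip the `review/text:` label afterwards.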
Upvotes: 0
Reputation: 670
If you don't mind not using re, and if the identifier is always 'review/text' and your data is always comma-separated, you can get the lines simply with:
matches = [s.strip() for s in text.split(',') if s.strip(' "\n\'').startswith('review/text')]
for match in matches:
    print(match)
where s.strip(' "\'\n') removes spaces, ", ', and newline characters from the beginning and end of each piece before the string comparison. These two lines are returned:
'review/text: I like Creme Brulee. I loved that these were so easy. Just sprinkle on the sugar that came with and broil. They look amazing and taste great. My guess thought I really went out of the way for them when really it took all of 5 minutes. I will be ordering more!
'
"review/text: not what I was expecting in terms of the company's reputation for excellent home delivery products
"
Upvotes: 1
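One caveat with splitting on commas: a review that itself contains a comma would be cut at that comma. A possible alternative without re, sketched here under the assumption that every record sits on its own line, is to split on line breaks instead. The function name `review_texts` and the sample string are hypothetical and only serve the illustration.

```python
def review_texts(raw):
    """Return the body of every line that starts with 'review/text:'."""
    label = 'review/text:'
    bodies = []
    for line in raw.splitlines():
        # Drop leading spaces and quotes left over from the list-style dump.
        cleaned = line.lstrip(' \'"')
        if cleaned.startswith(label):
            # Keep everything after the label, commas included.
            bodies.append(cleaned[len(label):].strip())
    return bodies


# Hypothetical sample: the review text contains commas.
sample = "'review/summary: The best\n',\n'review/text: Easy, quick, and tasty!\n',\n"

print(review_texts(sample))
```

Because the split happens on newlines, commas inside a review are left untouched.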