Luke Simpson

Reputation: 49

Using regex to retrieve specific text

tweets = re.findall(r "'text':+.*'truncated'", tweets)

print (tweets)

'text': "RT @premierleague: 🔵 @WayneRooney's chase is on 👀", 'truncated':

I have a string of text like the one above, and I want to retrieve the tweet that sits between 'text': and 'truncated'.

I wrote the code above, but I get this error message:

 tweets = re.findall(r "'text':+.*'truncated'", tweets)
                                                ^
SyntaxError: invalid syntax

I am using findall because the string contains many tweets in this format and I want to retrieve all of them.

Thanks.

Upvotes: 1

Views: 70

Answers (1)

pchaigno

Reputation: 13103

The invalid syntax error is caused by the whitespace between the r prefix and the opening quote of the regex. With the space removed:

tweets = re.findall(r"'text':+.*'truncated'", tweets)
print(tweets)

returns:

['\'text\': "RT @premierleague: \xf0\x9f\x94\xb5 @WayneRooney\'s chase is on \xf0\x9f\x91\x80", \'truncated\'']
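If it helps to see why the space matters: the r prefix is part of the string literal itself, so with a space in between Python reads a name r followed by a separate string, which is not a valid expression. A minimal sketch that checks this with the built-in compile (the helper name parses is only for illustration; the snippets are compiled, never run):

```python
# Both snippets are only compiled, never executed,
# so `re` and `tweets` do not need to exist here.
broken = "re.findall(r \"'text':+.*'truncated'\", tweets)"
fixed = "re.findall(r\"'text':+.*'truncated'\", tweets)"


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


broken_ok = parses(broken)  # space between r and the quote
fixed_ok = parses(fixed)    # prefix directly attached to the quote
print(broken_ok, fixed_ok)
```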

To retrieve only the text, wrap the part you want in a capturing group; findall then returns just the group:

tweets = re.findall(r"'text':+(.*)'truncated'", tweets)
print(tweets)

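returns a list whose single element is (note the leading space, the quotes and the trailing comma are still included):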

 "RT @premierleague: 🔵 @WayneRooney's chase is on 👀", 
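Since the question mentions that the tweets are repeated, note that .* is greedy: with several 'text': ... 'truncated' pairs on one line it would match from the first 'text': to the last 'truncated'. A possible variation, as a sketch only, uses a non-greedy group and keeps the surrounding quotes and comma outside it. The sample string raw is made up to mimic the format in the question, and the pattern assumes each tweet is wrapped in matching quotes with no escaped quote of the same kind inside:

```python
import re

# Hypothetical sample: two fragments shaped like the string in the question.
raw = (
    "'text': \"RT @premierleague: 🔵 @WayneRooney's chase is on 👀\", "
    "'truncated': False, "
    "'text': 'Second tweet', 'truncated': False"
)

# ([\"']) remembers the opening quote, (.*?) lazily captures the tweet,
# and \1 requires the same quote character again before ", 'truncated'".
pattern = re.compile(r"'text':\s*([\"'])(.*?)\1,\s*'truncated'")

# group(2) is the tweet text without quotes, whitespace or trailing comma.
texts = [m.group(2) for m in pattern.finditer(raw)]
print(texts)
```

If the original data is still available as a dict or JSON rather than as a printed string, reading the text key directly is likely to be more robust than a regex.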

Upvotes: 1
