elrich bachman

Reputation: 176

How do I extract a value inside quotes using regex in Python?

My text is:

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

I am trying to extract the value of posted_data, which is 2e54eba66f8f2881c8e78be8342428xd.

My code:

import re

extract_posted_data = re.search(r'(\"posted_data\": \")(\w*)', my_text)
print(extract_posted_data)

This prints None.

Thank you

Upvotes: 2

Views: 94

Answers (4)

llllllllll

Reputation: 16434

This is because your pattern has an extra space after the colon that does not appear in the text. It should be:

extract_posted_data = re.search(r'(\"posted_data\":\")(\w*)', my_text)

In fact, the backslashes before the quotes are unnecessary here. Just use:

extract_posted_data = re.search(r'("posted_data":")(\w*)', my_text)

Then:

extract_posted_data.group(2)

is what you want.

>>> import re
>>> my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
>>> extract_posted_data = re.search(r'("posted_data":")(\w*)', my_text)
>>> extract_posted_data.group(2)
'2e54eba66f8f2881c8e78be8342428xd'
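
To make the effect of the stray space concrete, here is a minimal sketch that runs both patterns side by side. The names with_space and without_space are only illustrative, and checking for None before calling .group() is an added precaution, not part of the original answer.

```python
import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

# Pattern from the question: the space after the colon must match literally,
# but my_text has no space there, so re.search() finds nothing.
with_space = re.search(r'("posted_data": ")(\w*)', my_text)

# Same pattern with the space removed.
without_space = re.search(r'("posted_data":")(\w*)', my_text)

print(with_space)  # None

# Guard against None so a failed search does not raise AttributeError.
if without_space is not None:
    print(without_space.group(2))  # 2e54eba66f8f2881c8e78be8342428xd
```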

Upvotes: 1

Totoro

Reputation: 887

As others have mentioned, json would be a better tool for this data, but you can also use this regex (I added \s* in case spaces appear after the colon in the future):

regex: "posted_data":\s*"(?P<posted_data>[^"]+)"

import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
m = re.search(r'"posted_data":\s*"(?P<posted_data>[^"]+)"', my_text)
if m:
    print(m.group('posted_data'))
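
If the same lookup is needed for other keys, the pattern above could be generalized along these lines. This is only a sketch: extract_quoted_value is a hypothetical helper name, and it assumes values never contain escaped double quotes.

```python
import re


def extract_quoted_value(text, key):
    """Return the double-quoted value that follows "key": in text, or None.

    Hypothetical helper; assumes the value contains no escaped quotes.
    """
    # re.escape() keeps any regex metacharacters in the key literal.
    pattern = r'"%s":\s*"([^"]*)"' % re.escape(key)
    m = re.search(pattern, text)
    return m.group(1) if m else None


my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

print(extract_quoted_value(my_text, 'posted_data'))  # 2e54eba66f8f2881c8e78be8342428xd
print(extract_quoted_value(my_text, 'rx'))           # NO
print(extract_quoted_value(my_text, 'isropa'))       # None (the value is not quoted)
```

Unquoted values such as false are deliberately not matched, which is one more reason a real JSON parser is the safer choice when the data allows it.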

Upvotes: 1

G_M

Reputation: 3382

This particular example doesn't seem like it needs regular expressions at all.

>>> my_text
'"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
>>> import json
>>> result = json.loads('{%s}' % my_text)
>>> result
{'posted_data': '2e54eba66f8f2881c8e78be8342428xd', 'isropa': False, 'rx': 'NO', 'readal': 'false'}
>>> result['posted_data']
'2e54eba66f8f2881c8e78be8342428xd'

With BeautifulSoup:

>>> import json
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<script type="text/javascript"> "posted_data":"2738273283723hjasda" </script>', 'html.parser')
>>> result = json.loads('{%s}' % soup.script.text)
>>> result
{'posted_data': '2738273283723hjasda'}
>>> result['posted_data']
'2738273283723hjasda'
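
The brace-wrapping trick above only works while the fragment is valid JSON once wrapped. A minimal sketch of a wrapper that reports failure instead of raising might look like this; parse_fragment is a hypothetical name, and returning None on bad input is just one possible design.

```python
import json


def parse_fragment(fragment):
    """Wrap a brace-less '"key":value,...' fragment in {} and parse it as JSON.

    Hypothetical helper; returns None when the wrapped text is not valid JSON.
    """
    try:
        return json.loads('{%s}' % fragment)
    except json.JSONDecodeError:
        # Not valid JSON even after wrapping; leave the decision to the caller.
        return None


my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

result = parse_fragment(my_text)
if result is not None:
    print(result['posted_data'])  # 2e54eba66f8f2881c8e78be8342428xd
    print(result['isropa'])       # False (a real boolean, unlike the string 'false' in readal)
```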

Upvotes: 3

DBedrenko

Reputation: 5039

You can also change your regex to use lookarounds, so that the match itself contains only the value:

import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'
extract_posted_data = re.search(r'(?<="posted_data":")\w*(?=")', my_text)
print(extract_posted_data[0])

Prints 2e54eba66f8f2881c8e78be8342428xd

Also, re.search() returns a Match object (or None if nothing matches), so index 0 of the match, which is the same as .group(0), gives the whole matched text.
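
One caveat with the lookaround version, sketched below: Python's re module only accepts fixed-width look-behind patterns, so optional whitespace such as \s* cannot be placed inside (?<=...). The flag name variable_width_rejected is illustrative only.

```python
import re

my_text = '"posted_data":"2e54eba66f8f2881c8e78be8342428xd","isropa":false,"rx":"NO","readal":"false"'

m = re.search(r'(?<="posted_data":")\w*(?=")', my_text)
if m is not None:
    # m[0] (Python 3.6+) and m.group(0) both return the whole matched text.
    print(m[0] == m.group(0))  # True
    print(m[0])                # 2e54eba66f8f2881c8e78be8342428xd

# A look-behind containing \s* has no fixed width, so compiling it fails.
try:
    re.compile(r'(?<="posted_data":\s*")\w*')
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

print(variable_width_rejected)  # True
```

If the input may contain whitespace after the colon, a capturing group is the simpler route.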

Upvotes: 1
