Reputation: 29
I'm looking to extract the id tag from the following field of data:
{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}
The regex I'm using breaks when this field is encountered, as I'm using '"id":\s*"(.*?)"'.
This is because only some fields have the extra onhold tag; others look like this:
{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \n ","id":"7462764"}
The whole file is of the form:
{"info":[{"purchased_at":"","product_desc":"","id":""}{..}]}
Upvotes: -2
Views: 41
Reputation: 2313
Just use the findall method in the re module to extract the data.
import re
line='{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
# Raw string so that \s reaches the regex engine without an invalid-escape warning.
print(re.findall(r'"id":\s*"(.*?)"', line))
Output
['8745485']
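To check that the same pattern copes with records both with and without the onhold tag, here is a minimal sketch that runs findall over the two sample records from the question. The `data` string and the `ID_PATTERN` name are only illustrative, and the comma between the records is an assumption about the real file. Note that this pattern would also pick up any "id" key nested inside another object, which may or may not be what you want.

```python
import re

# Both sample records from the question, wrapped in the {"info":[...]} form.
# The comma separating the records is an assumption about the real file.
data = (
    '{"info":['
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo",'
    '"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"},'
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \\n ","id":"7462764"}'
    ']}'
)

# Compile once; the raw string keeps \s intact for the regex engine.
ID_PATTERN = re.compile(r'"id":\s*"(.*?)"')

ids = ID_PATTERN.findall(data)
print(ids)
```

This should print both ids in the order they appear in the text.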
Upvotes: 0
Reputation: 65408
You can import the json library to extract the value for the desired key (id), rather than using a regular expression:
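import json
# Named "line" rather than "str" to avoid shadowing the built-in str type.
line = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
js = json.loads(line)
# The parsed record is a dict, so the key can be looked up directly.
print(js['id'])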
>>>
8745485
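Since the question says the whole file has the form {"info":[...]}, the same idea extends to every record at once. The following is only a sketch: it assumes the file is valid JSON with comma-separated records, and `raw`, `doc` and `ids` are illustrative names. With a real file you would read the text from disk (or use json.load on the open file) instead of the inline string.

```python
import json

# Stand-in for the file contents, built from the two records in the question.
# Assumes the records are comma-separated, i.e. the file is valid JSON.
raw = (
    '{"info":['
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo",'
    '"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"},'
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \\n ","id":"7462764"}'
    ']}'
)

doc = json.loads(raw)

# Take only the top-level "id" of each record; nested objects such as
# "onhold" are ignored, and records without an "id" key are skipped.
ids = [record["id"] for record in doc["info"] if "id" in record]
print(ids)
```

Because the parser understands the nesting, the optional onhold object does not affect the lookup.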
Update: If you need to use a regular expression, the search function of the re library with a suitable pattern might help:
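import re
line = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
# Match the full quoted key (so keys merely ending in "id" are not matched)
# and allow optional whitespace after the colon.
s = re.search(r'"id":\s*"(.+?)"', line)
if s:
    print(s.group(1))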
>>>
8745485
Upvotes: 1