Reputation: 402
I have a string:
string = '''"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy"'''
What regular expression can I use to retrieve the values after "name":, so that I get Due Diligence, Financial, and Finance?
I have tried:
import re
match = re.compile(r'"name"\:(.\w+)')
match.findall(string)
but it returns:
['"Finance', '"Financial', '"Due', '"Financial', '"Strategy']
The value Due Diligence is split, and I want both words returned as one match.
Upvotes: 0
Views: 584
Reputation: 1542
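I would use the non-greedy (lazy) .*? quantifier, followed by a character class that matches the closing double quote (or a [, since the Financial value has no closing quote):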
import re
string = """$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy"""
# With the leading double quote
match = re.compile(r'"name"\:(".*?)["\[]')
a = match.findall(string)
print(a)
# Stripping out the leading double quote
match = re.compile(r'"name"\:"(.*?)["\[]')
b = match.findall(string)
print(b)
And the final output is:
['"Finance', '"Financial ', '"Due Diligence']
['Finance', 'Financial ', 'Due Diligence']
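The reason for the non-greedy quantifier can be seen on a small sample. The `sample` string below is hypothetical, not the question's data: the greedy .* runs to the last closing quote on the line and swallows both values, while the lazy .*? stops at the first closing quote.

```python
import re

# Hypothetical sample with two "name" fields on one line
sample = '"name":"Finance","name":"Due Diligence"'

# Greedy: .* consumes as much as possible, then backtracks only to
# the LAST closing quote, so both values end up in one match
greedy = re.findall(r'"name":"(.*)"', sample)

# Non-greedy: .*? stops at the FIRST closing quote it can reach
lazy = re.findall(r'"name":"(.*?)"', sample)

print(greedy)  # ['Finance","name":"Due Diligence']
print(lazy)    # ['Finance', 'Due Diligence']
```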
Upvotes: 0
Reputation: 81
Your whitespace is not matched because \w only matches word characters (letters, digits and the underscore), not spaces.
"name"\:(.\w+\s*\w*)
accounts for optional whitespace followed by an optional second word. (It will not work for values of three or more words, but it will in your situation.)
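A quick check of what this pattern returns, including the three-word limitation. The sample string is a shortened, hypothetical version of the question's data, and the value "Mergers and Acquisitions" is made up purely to show where the pattern stops.

```python
import re

# Shortened, hypothetical sample; the three-word value is invented
# only to demonstrate the limitation
sample = ('"name":"Finance","$type":"x"},'
          '"name":"Due Diligence",'
          '"name":"Mergers and Acquisitions"')

# . matches the opening quote, \w+ the first word,
# \s* optional whitespace and \w* an optional second word
pattern = re.compile(r'"name"\:(.\w+\s*\w*)')
matches = pattern.findall(sample)

print(matches)  # ['"Finance', '"Due Diligence', '"Mergers and']
```

The third value is cut after its second word, which is the limitation described above.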
"name"\:(.\w+\s*\w*"?)
also captures the closing quotation mark " at the end of each value where there is one. The Financial value has no closing quote, so its match ends with a trailing space instead.
Edit: Fixed the second regex so that it also matches "Financial".
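As an alternative to the patterns above, a negated character class handles any number of words. This is a swapped-in technique, not either answer's exact regex, and the sample string is a shortened, hypothetical version of the question's data. [^"\[]* takes every character up to the next double quote or [, and strip() then removes the trailing space left on Financial.

```python
import re

# Shortened sample modeled on the question's data; the "Financial"
# value is cut off by a '[' instead of a closing quote
sample = ('"name":"Finance","$type":"voyager.identity.profile.Skill"},'
          '"name":"Financial ["standardizedSkillUrn"],'
          '"name":"Due Diligence","name":"Strategy"')

# After the opening quote, [^"\[]* takes every character that is
# neither a double quote nor '[', so multi-word values stay whole
pattern = re.compile(r'"name":"([^"\[]*)')

# strip() removes the space left before the '[' in "Financial ["
names = [value.strip() for value in pattern.findall(sample)]

print(names)  # ['Finance', 'Financial', 'Due Diligence', 'Strategy']
```

Because this pattern does not require a terminating character, a value at the very end of the string is still matched even when its closing quote is missing.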
Upvotes: 1