0n10n_

Reputation: 402

Python RegEx to get words after a specific string

I have a string:

string= """"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy""""

What regular expression can I use to retrieve the values after "name": so that I get Due Diligence, Financial, and Finance?

I have tried:

match = re.compile(r'"name"\:(.\w+)')
match.findall(string)

but it returns:

['"Finance', '"Financial', '"Due', '"Financial', '"Strategy']

"Due Diligence" is split, and I want both words returned as one value.

Upvotes: 0

Views: 584

Answers (2)

CharlieH

Reputation: 1542

I would use the non-greedy (lazy) .*? expression, terminated by a trailing quote or opening bracket:

import re

string = """$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,3)","name":"Finance","$type":"voyager.identity.profile.Skill"},{"$deletedFields":["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,22)","name":"Financial ["standardizedSkillUrn","standardizedSkill"],"entityUrn":"urn:li:fs_skill:(ACoAAAIv9SQBMzclPm3CZzL1QceTH5W0VrsdxbE,34)","name":"Due Diligence","name":"Strategy"""

# With the leading double quote
match = re.compile(r'"name"\:(".*?)["\[]')
a = match.findall(string)
print(a)

# Stripping out the leading double quote
match = re.compile(r'"name"\:"(.*?)["\[]')
b = match.findall(string)
print(b)

And the final output is:

['"Finance', '"Financial ', '"Due Diligence']
['Finance', 'Financial ', 'Due Diligence']
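
A variation on the same idea, as a rough sketch: a negated character class can stand in for the lazy .*? and the explicit terminator, and strip() removes the trailing space left on "Financial ". The sample string below is a shortened, hypothetical stand-in for the data in the question, and the names sample, pattern and names are only for illustration.

```python
import re

# Shortened, hypothetical sample with the same shape as the question's data.
sample = (
    '"name":"Finance","$type":"x"},'
    '{"name":"Financial ["a","b"],'
    '"name":"Due Diligence","name":"Strategy'
)

# [^"\[]* consumes everything up to the next double quote or opening bracket
# (or the end of the string), so no lazy quantifier is needed.
pattern = re.compile(r'"name":"([^"\[]*)')

# strip() drops the trailing space captured before the "[".
names = [name.strip() for name in pattern.findall(sample)]
print(names)
```

Because this pattern does not require a closing quote or bracket, it also picks up a final value that runs to the end of the string, such as Strategy here, which the patterns above skip.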

Upvotes: 0

FefeHern

Reputation: 81

The whitespace is not matched because \w only matches word characters (letters, digits, and underscore), not spaces.

"name"\:(.\w+\s*\w*) allows for optional whitespace followed by a second word. (It will not work for three words, but it will in your situation.)

"name"\:(.\w+\s*\w*"?) also captures the closing quotation mark " when there is one. Because the "? is optional, "Financial, which has no closing quote, still matches.

Edit: Fixed second regex for "Financial
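
If values with three or more words are possible, one option is to repeat the "whitespace plus word" part as a group. This is only a sketch under that assumption. The sample string is a shortened, hypothetical stand-in for the question's data (the "Mergers and Acquisitions" entry is invented for illustration), and it places the opening quote outside the capture group so the results come back without it.

```python
import re

# Shortened, hypothetical sample; the last entry is invented to show 3 words.
sample = (
    '"name":"Finance","name":"Financial ["x"],'
    '"name":"Due Diligence","name":"Mergers and Acquisitions"'
)

# \w+ matches the first word; (?:\s+\w+)* matches any number of further
# words, each preceded by whitespace. The non-capturing group keeps
# findall() returning the whole value rather than just the last word.
multi_word = re.compile(r'"name":"(\w+(?:\s+\w+)*)')

words = multi_word.findall(sample)
print(words)
```

Since every repetition must end in a word character, the trailing space after Financial is not captured.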

Upvotes: 1
