Reputation: 1114

Extract words with a specific character sequence

I have a list of strings. From each string, I want to extract only the words that contain a specific character sequence.

For example, given this list:

l1=["grad madd have", "ddim middle left"]

I want all the words that contain the sequence "dd", so I would like to get:

[["madd"], ["ddim", "middle"]]

I've been trying patterns like this:

[re.findall(r'(\b.*?dd.*\s+)',word) for word in l1]

but haven't had much success.
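For reference, here is a sketch of what that attempt actually returns on the sample list (assuming `import re`; the name `attempt` is just for illustration). Because `.` also matches spaces, the lazy `.*?` lets a match begin in an earlier word, and the greedy `.*` then runs on to the last whitespace that `\s+` can still match:

```python
import re

l1 = ["grad madd have", "ddim middle left"]

# The original pattern: lazy .*? up to "dd", greedy .* after it,
# then at least one whitespace character. "." also matches spaces,
# so a match is not confined to a single word.
attempt = [re.findall(r'(\b.*?dd.*\s+)', word) for word in l1]
print(attempt)  # [['grad madd '], ['ddim middle ']]
```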

Upvotes: 0

Answers (4)

Reputation: 11

You were close. You want to match word characters zero or more times with \w*:

[re.findall(r'\w*dd\w*', word) for word in l1]
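A self-contained version of that line, run on the question's sample list (a sketch; the name `result` is only illustrative):

```python
import re

l1 = ["grad madd have", "ddim middle left"]

# \w* matches zero or more word characters (letters, digits, underscore),
# so each match stays inside a single word.
result = [re.findall(r'\w*dd\w*', word) for word in l1]
print(result)  # [['madd'], ['ddim', 'middle']]
```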

Upvotes: 1

Reputation: 22817

You can use a list comprehension for this. You don't need a regex to do what you're trying to do.

l1=["grad madd have", "ddim middle left"]
print([s for a in l1 for s in a.split() if 'dd' in s])

This loops over each string in l1 and splits it on whitespace. It then checks each word and keeps it if it contains dd. Note that this returns a single flat list (['madd', 'ddim', 'middle']) rather than one list per input string.
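If you want one list per input string, as in the question, a nested comprehension is a small change (a sketch using the same `l1`; `result` is an illustrative name):

```python
l1 = ["grad madd have", "ddim middle left"]

# The outer comprehension builds one list per input string;
# the inner one keeps the whitespace-separated words that contain "dd".
result = [[s for s in a.split() if 'dd' in s] for a in l1]
print(result)  # [['madd'], ['ddim', 'middle']]
```

Note that `split()` only splits on whitespace, so punctuation stays attached: a word written as `madd,` would come back with its comma, whereas `\w*dd\w*` would return just `madd`.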

Upvotes: 1

Reputation:

Try this one-liner:

l1=["grad madd have", "ddim middle left"]

print(list(map(lambda x: list(filter(lambda y: 'dd' in y, x.split())), l1)))

Output:

[['madd'], ['ddim', 'middle']]

Upvotes: 0

Reputation: 12880

You can try this regex: \b\w*dd\w*\b
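Applied with `re.findall` to each string in the question's list, it might look like this (a sketch; `result` is an illustrative name):

```python
import re

l1 = ["grad madd have", "ddim middle left"]

# \b marks a word boundary, so each match is a whole word containing "dd".
result = [re.findall(r'\b\w*dd\w*\b', s) for s in l1]
print(result)  # [['madd'], ['ddim', 'middle']]
```

Because `\w*` already extends to the edges of the word, the `\b` anchors give the same result as plain `\w*dd\w*` on this input.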

Upvotes: 0