Reputation: 1114
I have a list of strings. I only want to extract the words within each string that contain a specific character sequence.
For example:
l1=["grad madd have", "ddim middle left"]
I want all the words that contain the sequence "dd",
so I would like to get
[["madd"], ["ddim", "middle"]]
I've been trying patterns of the form
[re.findall(r'(\b.*?dd.*\s+)',word) for word in l1]
but have had little success.
Upvotes: 0
Views: 439
Reputation: 11
You were close. You'll want to match zero or more word characters on either side of the sequence with \w*:
[re.findall(r'\w*dd\w*', word) for word in l1]
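For reference, here is a self-contained version of the same idea, as a minimal sketch. The helper name words_containing is hypothetical, and re.escape is added on the assumption that the sequence you search for might contain regex metacharacters.

```python
import re

def words_containing(strings, seq):
    # Hypothetical helper: build \w*<seq>\w* once, escaping seq so that
    # characters like '.' or '+' are matched literally.
    pattern = re.compile(r'\w*' + re.escape(seq) + r'\w*')
    # One findall per input string keeps one sublist per string.
    return [pattern.findall(s) for s in strings]

l1 = ["grad madd have", "ddim middle left"]
print(words_containing(l1, "dd"))  # [['madd'], ['ddim', 'middle']]
```

Note that \w only covers letters, digits and underscore, so a word such as "add-on" would come back as "add".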
Upvotes: 1
Reputation: 22817
You can just use a list comprehension for this. You don't need regex to accomplish what you're trying to do.
l1=["grad madd have", "ddim middle left"]
print([[s for s in a.split() if 'dd' in s] for a in l1])
This loops over l1 and splits each string on whitespace. It then tests each resulting word to see if it contains dd and keeps it if it does, giving one sublist per input string.
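As a quick sketch of the two shapes a comprehension can produce here: a single flat list of matches, or one sublist per input string (which is what the question asks for). The variable names flat and nested are only illustrative.

```python
l1 = ["grad madd have", "ddim middle left"]

# Flat: all matching words collected into a single list.
flat = [w for s in l1 for w in s.split() if 'dd' in w]

# Nested: the inner comprehension runs once per input string,
# so each string gets its own sublist of matches.
nested = [[w for w in s.split() if 'dd' in w] for s in l1]

print(flat)    # ['madd', 'ddim', 'middle']
print(nested)  # [['madd'], ['ddim', 'middle']]
```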
Upvotes: 1
Reputation:
Try this in one line:
l1=["grad madd have", "ddim middle left"]
print(list(map(lambda x:list(filter(lambda y:'dd' in y,x.split())),l1)))
output:
[['madd'], ['ddim', 'middle']]
Upvotes: 0