Reputation: 603
Let's say I have text which looks like:
a = "I am inclin- ed to ask simple questions"
I would like to first extract the hyphenated words, i.e. first identify whether a hyphen is present in the text; this is easy. I use re.search(r"\s*-\s*", a), for instance, to check if the sentence has hyphens (re.match would only look at the start of the string).
1) Next I would like to extract the preceding and following partial words (in this case I would like to extract "inclin" and "ed").
2) Next I would like to merge them into "inclined" and print all such words.
I am stuck at step 1. Please help.
Upvotes: 1
Views: 554
Reputation: 3996
Try out this regex; it should work well for you:
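import re

a = "I am inclin- ed to ask simple questions"
# This will get the whole hyphenated span, i.e. "inclin- ed"
m = re.search(r'\S*-(.|\s)\S*', a)
if m:
    print(m.group())
else:
    # re.search returns None (it does not raise) when nothing is found in a
    print("no hyphenated word found")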
Then you strip the hyphen and whitespace from the matched string and grab the two parts as a list.
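A minimal sketch of that last step, assuming the matched text is split on the hyphen together with any whitespace around it. The names parts and merged are only illustrative and not part of the original answer:

```python
import re

a = "I am inclin- ed to ask simple questions"

# The answer's pattern, written as a raw string.
m = re.search(r'\S*-(.|\s)\S*', a)
if m:
    # Split "inclin- ed" on the hyphen and any surrounding whitespace.
    parts = re.split(r'\s*-\s*', m.group())
    merged = ''.join(parts)
else:
    # Nothing hyphenated in the string (assumed fallback values).
    parts, merged = [], None

print(parts, merged)
```

Note that re.search only returns the first match, so this handles a single hyphenated word per string.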
Upvotes: 1
Reputation: 473873
>>> import re
>>> a = "I am inclin- ed to ask simple questions"
>>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
>>> result
[('inclin-', 'ed')]
>>> [first.rstrip('-') + second for first, second in result]
['inclined']
Or, you can make the first group save the word without the trailing hyphen:
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> result
[('inclin', 'ed')]
>>> [''.join(item) for item in result]
['inclined']
This will also work for multiple matches in the string:
>>> a = "I am inclin- ed to ask simp- le quest- ions"
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> [''.join(item) for item in result]
['inclined', 'simple', 'questions']
Upvotes: 2