removing relevant hyphens in text

Question

Let's say I have text that looks like this:

a = "I am inclin- ed to ask simple questions"

I would like to first extract the hyphenated words, i.e., first identify whether a hyphen is present in the text. This is easy: I use re.search(r"\s*-\s*", a), for instance, to check whether the sentence contains a hyphen.
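One detail worth knowing for the detection step: re.match only tries the pattern at the very beginning of the string, so it cannot see a hyphen that appears later on, whereas re.search scans the whole string. A short check on the example sentence:

```python
import re

a = "I am inclin- ed to ask simple questions"

# re.match only tries the pattern at position 0; the text starts with "I",
# so there is no match even though the string contains a hyphen.
anchored = re.match(r"\s*-\s*", a)
print(anchored)  # None

# re.search scans the whole string and returns the first match it finds.
found = re.search(r"\s*-\s*", a)
print(bool(found))                          # True
print(repr(found.group()), found.start())   # '- ' 11
```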

1) Next, I would like to extract the partial words before and after the hyphen (in this case, "inclin" and "ed").

2) Then I would like to merge them into "inclined" and print all such words.

I am stuck at step 1. Please help.

alecxe · Accepted Answer

>>> import re
>>> a = "I am inclin- ed to ask simple questions"
>>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
>>> result
[('inclin-', 'ed')]

>>> [first.rstrip('-') + second for first, second in result]
['inclined']
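The two capture groups are what make the merge step easy: when a pattern has more than one group, re.findall returns one tuple of groups per match, while a pattern without groups returns the whole matched substring. A small comparison on the same sentence:

```python
import re

a = "I am inclin- ed to ask simple questions"

# With two capture groups, findall returns one (group1, group2) tuple per match.
pairs = re.findall(r"([a-zA-Z]+-)\s+(\w+)", a)
print(pairs)  # [('inclin-', 'ed')]

# With no groups, findall returns the whole matched substring instead,
# so the two halves would have to be split apart again before merging.
whole = re.findall(r"[a-zA-Z]+-\s+\w+", a)
print(whole)  # ['inclin- ed']
```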

Alternatively, you can put the trailing - outside the first group so that it is not captured:

>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> result
[('inclin', 'ed')]
>>> [''.join(item) for item in result]
['inclined']
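One caveat with the character class [a-zA-Z]: it only covers ASCII letters, so a split word with an accented letter right before the hyphen is silently skipped. A sketch, assuming the text may contain non-ASCII letters, that swaps in [^\W\d_] (word characters minus digits and underscore, i.e. letters in Python 3 str patterns):

```python
import re

a = "a naï- ve question"

# [a-zA-Z] has no accented letters, so no ASCII run ends right at the hyphen.
ascii_only = ["".join(m) for m in re.findall(r"([a-zA-Z]+)-\s+(\w+)", a)]
print(ascii_only)  # []

# [^\W\d_] matches any Unicode letter, so "naï" is captured as the first half.
unicode_aware = ["".join(m) for m in re.findall(r"([^\W\d_]+)-\s+(\w+)", a)]
print(unicode_aware)  # ['naïve']
```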

This will also work for multiple matches in the string:

>>> a = "I am inclin- ed to ask simp- le quest- ions"
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> [''.join(item) for item in result]
['inclined', 'simple', 'questions']
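If the goal is to repair the text itself rather than just list the words, the same pattern can drive re.sub, with the backreferences \1\2 writing the two halves back without the hyphen and whitespace. A minimal sketch; the helper name dehyphenate is hypothetical:

```python
import re

# Same pattern as above: letters, a hyphen, whitespace, then the rest of the word.
PATTERN = re.compile(r"([a-zA-Z]+)-\s+(\w+)")

def dehyphenate(text):
    """Hypothetical helper: rejoin words split as 'word- part' in text."""
    # \1\2 writes both captured halves back, dropping the hyphen and whitespace.
    return PATTERN.sub(r"\1\2", text)

a = "I am inclin- ed to ask simp- le quest- ions"
print(dehyphenate(a))                    # I am inclined to ask simple questions

# No whitespace after the hyphen means no match, so compounds survive.
print(dehyphenate("a well-known fact"))  # a well-known fact

# \s also matches a newline, so words split across lines are rejoined too.
print(dehyphenate("inclin-\ned"))        # inclined
```

Because the pattern requires whitespace after the hyphen, ordinary compounds such as well-known are left alone. The flip side is that a genuine compound that happens to be split at a line break (well- known) would also lose its hyphen; a regex alone cannot tell the two cases apart.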
