suzee

Reputation: 603

removing relevant hyphens in text

Let's say I have text which looks like:

a = "I am inclin- ed to ask simple questions"

I would like to first extract the hyphenated words. Identifying whether a hyphen is present in the text is easy: I use re.search(r"\s*-\s*", a), for instance, to check whether the sentence has hyphens (re.match would only look at the start of the string).

1) Next I would like to extract the preceding and following partial words (in this case, "inclin" and "ed").

2) Next I would like to merge them into "inclined" and print all such words.

I am stuck at step 1. Please help.

Upvotes: 1

Views: 554

Answers (2)

MrHaze

Reputation: 3996

Try out this regex, it should work well for you:

import re

a = "I am inclin- ed to ask simple questions"

# This will get the whole broken word, i.e. "inclin- ed".
# re.search returns None (it does not raise) when nothing matches.
m = re.search(r'\S*-(.|\s)\S*', a)
if m:
    print(m.group(0))
else:
    print("not found in a")

Then split the matched string on the hyphen and strip the whitespace from each piece, which gives you the two partial words as a list.
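A minimal sketch of that last step, assuming the broken word looks like "inclin- ed". The helper name split_hyphenated is made up for illustration and is not part of the re module:

```python
import re

def split_hyphenated(text):
    """Return the two halves of the first hyphen-broken word, or None."""
    m = re.search(r'\S*-(.|\s)\S*', text)
    if m is None:
        return None
    # "inclin- ed" -> ["inclin", " ed"] -> ["inclin", "ed"]
    return [part.strip() for part in m.group(0).split('-', 1)]

parts = split_hyphenated("I am inclin- ed to ask simple questions")
print(parts)           # ['inclin', 'ed']
print(''.join(parts))  # inclined
```

Note that re.search only returns the first match, so this handles one broken word per string.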

Upvotes: 1

alecxe

Reputation: 473873

>>> import re
>>> a = "I am inclin- ed to ask simple questions"
>>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
>>> result
[('inclin-', 'ed')]

>>> [first.rstrip('-') + second for first, second in result]
['inclined']

Or, you can make the first group save the word without the trailing -:

>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> result
[('inclin', 'ed')]
>>> [''.join(item) for item in result]
['inclined']

This will also work for multiple matches in the string:

>>> a = "I am inclin- ed to ask simp- le quest- ions"
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> [''.join(item) for item in result]
['inclined', 'simple', 'questions']
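If the goal is to repair the text itself rather than list the merged words, the same pattern could be passed to re.sub. This is only a sketch: the names HYPHEN_BREAK and dehyphenate are invented here, and it assumes a broken word always has whitespace after the hyphen.

```python
import re

# Same pattern as above: word part, hyphen, whitespace, word part.
HYPHEN_BREAK = re.compile(r'([a-zA-Z]+)-\s+(\w+)')

def dehyphenate(text):
    """Join words split as 'inclin- ed' back into 'inclined'."""
    # \1 and \2 are the two captured halves, glued together without the hyphen.
    return HYPHEN_BREAK.sub(r'\1\2', text)

print(dehyphenate("I am inclin- ed to ask simp- le quest- ions"))
# I am inclined to ask simple questions
```

Because the pattern requires whitespace after the hyphen, ordinary compounds such as "well-known" are left alone.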

Upvotes: 2
