Reputation: 1479

Python regular expressions and tokenization

I have a string "A.B.C one two three."

I need to tokenize this string into ["A.B.C", "one", "two", "three"], ignoring the period at the end of the sentence. I'm having trouble removing that sentence-final period without also affecting the periods in the A.B.C acronym.

Is there a way to remove only the periods that end a sentence, without affecting acronyms, using Python regexes?
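A minimal regex sketch, assuming a heuristic that is not from the question: treat a period as sentence-final only when it directly follows two lowercase letters and is followed by whitespace or the end of the string. Acronyms such as A.B.C or U.S.A. then keep their periods, but abbreviations like "etc." would lose theirs. The helper name `strip_sentence_periods` is hypothetical.

```python
import re

# Heuristic (an assumption): a period is sentence-final if it follows two
# lowercase letters and is followed by whitespace or the end of the string.
SENTENCE_PERIOD = re.compile(r'(?<=[a-z]{2})\.(?=\s|$)')

def strip_sentence_periods(text):
    """Hypothetical helper: remove sentence-final periods, keep acronym periods."""
    return SENTENCE_PERIOD.sub('', text)

print(strip_sentence_periods("A.B.C one two three.").split())
# ['A.B.C', 'one', 'two', 'three']
print(strip_sentence_periods("U.S.A. is big.").split())
# ['U.S.A.', 'is', 'big']
```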

Upvotes: 0

Answers (2)

Hugh Bothwell

Reputation: 56714

import re

word = re.compile(r'[A-Za-z.]*[A-Za-z]')
word.findall("A.B.C one two three.")    # => ['A.B.C', 'one', 'two', 'three']
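Because the pattern above must end on a letter, a few edge cases are worth knowing about. The inputs below are illustrative and not from the original answer: a period that closes an acronym is dropped as well, and any character outside `[A-Za-z.]` (such as an apostrophe) splits a word apart.

```python
import re

word = re.compile(r'[A-Za-z.]*[A-Za-z]')

# A match must end on a letter, so the final period of "U.S.A." is not kept.
print(word.findall("U.S.A. is big."))  # ['U.S.A', 'is', 'big']

# Apostrophes are not in the character class, so "don't" becomes two tokens.
print(word.findall("don't stop"))      # ['don', 't', 'stop']
```

If either case matters for your data, the character class can be extended (for example with `'`), at the cost of a slightly more permissive pattern.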

Upvotes: 2

chk

Reputation: 308

line = "A.B.C one two three."
print(line[:-1].split(' '))    # ['A.B.C', 'one', 'two', 'three']

This may work as well, although `line[:-1]` always drops the last character, so it assumes the string ends with a period.
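A slightly safer variant of the slicing approach, offered as a sketch: remove the last character only when it actually is a period, and split on any whitespace instead of a single space. The function name `tokenize` is hypothetical. Unlike `rstrip('.')`, this removes at most one trailing period.

```python
def tokenize(sentence):
    """Hypothetical helper: drop at most one trailing period, then split on whitespace."""
    sentence = sentence.strip()
    if sentence.endswith('.'):
        sentence = sentence[:-1]
    return sentence.split()

print(tokenize("A.B.C one two three."))  # ['A.B.C', 'one', 'two', 'three']
print(tokenize("A.B.C one two three"))   # ['A.B.C', 'one', 'two', 'three']
```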

Upvotes: 0
