Mozbi

Reputation: 1479

Python regular expressions and tokenization

I have a string "A.B.C one two three."

I need to tokenize this string into ["A.B.C", "one", "two", "three"], dropping the period at the end of the sentence. I'm having trouble removing that final period without also affecting the periods inside the A.B.C acronym.

Is there a way to remove only the sentence-ending period, leaving acronyms intact, using Python regular expressions?

Upvotes: 0

Views: 186

Answers (2)

Hugh Bothwell

Reputation: 56714

import re

word = re.compile(r'[A-Za-z.]*[A-Za-z]')  # may contain periods, must end in a letter
word.findall("A.B.C one two three.")    # => ['A.B.C', 'one', 'two', 'three']
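The pattern works because `[A-Za-z.]*` greedily consumes letters and periods, while the final `[A-Za-z]` forces each match to end on a letter, so the regex engine backs off from any trailing period. Here is a minimal sketch that wraps it in a function; the name `tokenize` is made up for illustration, and it assumes tokens consist only of ASCII letters and periods:

```python
import re

# Zero or more letters/periods followed by one mandatory letter:
# a match may contain periods but can never end with one.
WORD = re.compile(r'[A-Za-z.]*[A-Za-z]')


def tokenize(text):
    """Return word-like tokens from text, without trailing periods."""
    return WORD.findall(text)


print(tokenize("A.B.C one two three."))
print(tokenize("Call the U.S.A. office. Then wait."))
```

Keep in mind that digits and apostrophes are not in the character class, so a word like "don't" would be split in two, and an acronym written with a final period ("U.S.A.") loses that last period too.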

Upvotes: 2

chk

Reputation: 308

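line = "A.B.C one two three."
# Drop the final character (the sentence-ending period), then split on spaces.
print(line[:-1].split(' '))

This might work as well, as long as the line always ends with a single period.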
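The slice above removes the last character unconditionally, so it would cut off a letter if the line did not end in a period. A slightly safer sketch of the same no-regex idea uses `str.rstrip`; the function name `split_sentence` is hypothetical, and it assumes the input is a single sentence:

```python
def split_sentence(line):
    """Split one sentence on whitespace, dropping only trailing periods."""
    # strip() removes surrounding whitespace first, so rstrip('.') can
    # reach the final period; periods inside tokens are left untouched.
    return line.strip().rstrip('.').split()


print(split_sentence("A.B.C one two three."))
print(split_sentence("A.B.C one two three"))
```

Both calls should give the same four tokens. With several sentences in one string, only the very last period is removed, so the regex approach is the better fit there.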

Upvotes: 0
