user3232688

Reputation: 27

How to remove punctuation but keep the apostrophe in certain words (e.g. it's) in Python

How can I remove apostrophes, double quotes, commas and so on from a sentence, while keeping the apostrophe in words like it's, what's, etc.? Also, at the end of the sentence there must be a space between the last word and the full stop.

For example

Input Sentence :

"'This has punctuation, and it's hard to remove. ?"    

Desired Output Sentence :

This has punctuation and it's hard to remove .

Upvotes: 0

Views: 3361

Answers (5)

Jerry

Reputation: 71568

I propose this code:

import re

sentences = [""""'This has punctuation, and it's hard to remove. ?" """,
             "Did you see Cress' haircut?.",
             "This 'thing' hasn't a really bad habit, you know?.",
             "'I bought this for $30 from Best Buy it's. What a waste of money! The ear gels are 'comfortable at first, but what's after an hour."]

for s in sentences:
    # Remove the specified characters
    new_s = re.sub(r"""["?,$!]|'(?!(?<! ')[ts])""", "", s)

    # Put a space before every full stop (not only the final one)
    new_s = re.sub(r"\.", " .", new_s)
    print(new_s)


Output:

This has punctuation and it's hard to remove .
Did you see Cress haircut .
This thing hasn't a really bad habit you know .
I bought this for 30 from Best Buy it's . What a waste of money The ear gels are comfortable at first but what's after an hour .

The regex:

["?,$!]     # Match " ? , $ or !
|           # OR
'           # A ' if it does not have...
(?!        
  (?<! ')  
  [ts]      # t or s after it, provided it has no ` '` before the t or s
)
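
The annotated layout above can also be compiled as-is with re.VERBOSE. This is a sketch rather than part of the original answer: the name PUNCT is made up for illustration, and the space inside the look-behind is written as [ ] because verbose mode ignores bare whitespace in the pattern.

```python
import re

# PUNCT is a hypothetical name; the pattern is the one explained above.
PUNCT = re.compile(r"""
    ["?,$!]        # match " ? , $ or !
    |              # OR
    '              # an apostrophe, unless it is followed by...
    (?!
      (?<![ ]')    # (no space right before the apostrophe)
      [ts]         # ...a t or an s
    )
""", re.VERBOSE)

print(PUNCT.sub("", "This 'thing' hasn't a really bad habit, you know?."))
```

Note that the look-behind only recognises an opening quote when a space precedes it, so a quoted word starting with t or s at the very beginning of the string would keep its apostrophe.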

Upvotes: 1

PhilDenfer

Reputation: 270

Use the str.strip(chars) method to remove the outer quotes, like this:

output = chaine.strip("\"")

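Be careful: inside a string literal you have to escape some characters with a backslash, such as the quote that delimits the string and the backslash itself. Alternatively, delimit the string with the other kind of quote: "'" or '"'.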

Edit: I hadn't thought about the apostrophes. If they are the only problem, you can strip everything else first and then go through the string manually with a for loop: note the index of each apostrophe you find and, if it is followed by an 's', leave it in place. Either way, you need to settle the lexical/semantic rules before coding it.

Edit 2: If the string is a single sentence that always ends with a full stop and always needs the space, add the space at the end like this:

chaine[:-1] + " " + chaine[-1:]
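
The manual approach described in the first edit could look roughly like the sketch below. It is only one reading of that idea: the function name clean_sentence, the set of unwanted characters, and the decision to also keep an apostrophe before a 't' (as in hasn't) are assumptions, not rules given in the answer.

```python
def clean_sentence(chaine):
    """Drop unwanted punctuation but keep apostrophes such as the one in it's."""
    # Assumption: these are the characters to drop; extend the set as needed.
    unwanted = set('",?!')
    # Assumption: an apostrophe is kept only inside a word, before 's' or 't'.
    keep_after = ("s", "t")
    kept = []
    for i, ch in enumerate(chaine):
        if ch == "'":
            prev_is_letter = i > 0 and chaine[i - 1].isalpha()
            next_char = chaine[i + 1 : i + 2].lower()
            if prev_is_letter and next_char in keep_after:
                kept.append(ch)
        elif ch not in unwanted:
            kept.append(ch)
    result = "".join(kept).strip()
    # Put a space before a final full stop, if there is one.
    if result.endswith(".") and not result.endswith(" ."):
        result = result[:-1] + " ."
    return result


print(clean_sentence("\"'This has punctuation, and it's hard to remove. ?\" "))
```

A rule this simple will still misjudge some cases (a possessive like Cress' loses its apostrophe, for instance), which is why the rules need to be decided up front.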

Upvotes: 0

Avinash Raj

Reputation: 174756

Use a negative look-behind

(?<!\w)["'?]|,(?= )

Remove the matched ", ' and ? characters (and any comma followed by a space) with re.sub.


And your code would be,

>>> import re
>>> s = '\"\'This has punctuation, and it\'s hard to remove. ?\" '
>>> m = re.sub(r'(?<!\w)[\"\'\?]|,(?= )', r'', s)
>>> m
"This has punctuation and it's hard to remove.  "

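The result above still lacks the space before the full stop that the question asks for. One possible way to add it is a second pass after the substitution; this second pass is an addition for illustration, not part of the answer's pattern.

```python
import re

s = "\"'This has punctuation, and it's hard to remove. ?\" "

# First pass: the pattern from the answer above.
m = re.sub(r"""(?<!\w)["'?]|,(?= )""", "", s)

# Second pass (an addition): replace the final full stop and any
# whitespace around it with a single space followed by the full stop.
m = re.sub(r"\s*\.\s*$", " .", m)

print(m)
```
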
Upvotes: 2

Abhijit

Reputation: 63757

My take on this is to remove all punctuation that sits at either end of a word: split the sentence into words (separated by whitespace) and strip any leading or trailing punctuation from each word.

>>> import re, string
>>> # st holds the input sentence
>>> ''.join(e.strip(string.punctuation) for e in re.split(r"(\s)", st))
"This has punctuation and it's hard to remove   "

Upvotes: 0

zx81

Reputation: 41838

Use this:

(?<![tT](?=.[sS]))["'?:;,.]

If you also want to leave the period at the end of a line (as long as it is preceded by a space):

(?<![tT](?=.[sS]))(?<! (?=.$))["'?:;,.]
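
As a rough illustration of how the two patterns behave with re.sub, here is a small sketch. The names STRIP_ALL, KEEP_FINAL_DOT and sample are made up for this example, and the sample sentence is assumed to already have a space before its final period.

```python
import re

# Patterns copied from the answer; the names are hypothetical.
STRIP_ALL = re.compile(r"""(?<![tT](?=.[sS]))["'?:;,.]""")
KEEP_FINAL_DOT = re.compile(r"""(?<![tT](?=.[sS]))(?<! (?=.$))["'?:;,.]""")

sample = "'This has punctuation, and it's hard to remove ."

# The first pattern removes the final period as well.
print(STRIP_ALL.sub("", sample))

# The second pattern leaves a final period that follows a space.
print(KEEP_FINAL_DOT.sub("", sample))
```

Neither pattern inserts the space itself: the second one only preserves a final period that is already preceded by one. Both keep the apostrophe only in the t's combination (it's, what's, that's).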

Upvotes: 0
