ron wagner

Reputation: 105

Removing numbers and special characters from multi line statement

Given the input below, I need to figure out how to produce the output shown. My first problem is that I cannot get rid of the brackets.

X= ["This is a hairy #calf",
    "goodbye to the snow #gone",
    "13742 the digits to give",
    "Remember the name d - mayne",
    "I      hate      you"]

Output =
This is hairy calf
goodbye to the snow gone
the digits to give
Remember the name mayne
I hate you

This is what I've tried:

X= """["This is an hairy #calf",
    "goodbye to the snow #gone",
    "13742 the digits to give",
    "Remember the name d - mayne",
    "I      hate      you"]"""
X_modified=re.sub(r"[#+-\.\"\n""[""]","",X)
X_modified

Upvotes: 2

Views: 100

Answers (1)

Wiktor Stribiżew

Reputation: 626929

You may use ast.literal_eval to parse the string you have into a list, which simplifies further handling of the strings. You can then run any kind of replacement on the individual string items and join them with a newline.

A sample snippet:

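import ast, re
X= """["This is an hairy #calf",
    "goodbye to the snow #gone",
    "13742 the digits to give",
    "Remember the name d - mayne",
    "I      hate      you"]"""
# Parse the list literal into a real Python list of strings
l = ast.literal_eval(X)
# One or more chars that are neither word chars nor whitespace
rx_non_word = re.compile(r'[^\w\s]+')
# Two or more consecutive whitespace chars
rx_white = re.compile(r'\s{2,}')
# Remove special chars, collapse whitespace runs, print one item per line
print("\n".join([rx_white.sub(' ', rx_non_word.sub('', x)) for x in l]))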

Output:

This is an hairy calf
goodbye to the snow gone
13742 the digits to give
Remember the name d mayne
I hate you

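The [^\w\s]+ regex matches one or more characters other than word characters (letters, digits, underscore) and whitespace, and \s{2,} matches two or more whitespace characters. Since \w includes digits, the number 13742 is kept in the output.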

Note that you will have to add any exceptions you have to the regex.
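
As one possible adjustment along those lines, here is a minimal sketch that also drops digits, assuming only ASCII letters and whitespace should survive. The names clean_line and clean_list_literal are hypothetical helpers, not part of the snippet above, and the sketch does not remove stray single letters such as the "d" in the fourth item, because the rule for those is not clear from the question.

```python
import ast
import re

# Hypothetical helper names, introduced only for illustration.
# Assumption: only ASCII letters and whitespace should be kept.
RX_NOT_LETTER_OR_SPACE = re.compile(r'[^a-zA-Z\s]+')
RX_WHITESPACE_RUN = re.compile(r'\s+')


def clean_line(text):
    """Drop digits and special characters, then normalize whitespace."""
    letters_only = RX_NOT_LETTER_OR_SPACE.sub('', text)
    # Collapse every whitespace run to one space and trim both ends,
    # so a removed leading number does not leave a leading space.
    return RX_WHITESPACE_RUN.sub(' ', letters_only).strip()


def clean_list_literal(source):
    """Parse a string holding a list literal and clean each item."""
    items = ast.literal_eval(source)
    return [clean_line(item) for item in items]


X = """["This is an hairy #calf",
    "goodbye to the snow #gone",
    "13742 the digits to give",
    "Remember the name d - mayne",
    "I      hate      you"]"""

print("\n".join(clean_list_literal(X)))
```

With this character class, accented letters would be removed as well; if those need to be kept, [^\W\d_] is a common way to match letters only while staying Unicode-aware.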

Upvotes: 2
