Reputation: 105
Given the input below, I need to figure out how to make my output look as follows. My first problem is that I cannot get rid of the brackets.
X= ["This is a hairy #calf",
"goodbye to the snow #gone",
"13742 the digits to give",
"Remember the name d - mayne",
"I hate you"]
Output =
This is hairy calf
goodbye to the snow gone
the digits to give
Remember the name mayne
I hate you
This is what I've tried:
X= """["This is an hairy #calf",
"goodbye to the snow #gone",
"13742 the digits to give",
"Remember the name d - mayne",
"I hate you"]"""
X_modified=re.sub(r"[#+-\.\"\n""[""]","",X)
X_modified
Upvotes: 2
Views: 100
Reputation: 626929
You may use ast.literal_eval to parse the string you have into a list, which simplifies further handling of the strings. You can then run any kind of replacement on the individual string items and join them with a newline.
import ast, re
X= """["This is an hairy #calf",
"goodbye to the snow #gone",
"13742 the digits to give",
"Remember the name d - mayne",
"I hate you"]"""
l = ast.literal_eval(X)
rx_non_word = re.compile(r'[^\w\s]+')
rx_white = re.compile(r'\s{2,}')
print("\n".join([rx_white.sub(' ', rx_non_word.sub('', x)) for x in l]))
Output:
This is an hairy calf
goodbye to the snow gone
13742 the digits to give
Remember the name d mayne
I hate you
The [^\w\s]+ regex matches one or more characters other than word and whitespace characters, and \s{2,} matches two or more whitespace characters.
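To see why the second pattern is needed, here is a small sketch that applies the two substitutions one at a time to a single line from the input. The variable names are only illustrative.

```python
import re

sample = "Remember the name d - mayne"

# Step 1: remove every run of characters that is neither a word character nor whitespace.
# Deleting the "-" leaves the two spaces that surrounded it next to each other.
no_punct = re.sub(r"[^\w\s]+", "", sample)
print(repr(no_punct))   # 'Remember the name d  mayne'

# Step 2: collapse any run of two or more whitespace characters into one space.
collapsed = re.sub(r"\s{2,}", " ", no_punct)
print(repr(collapsed))  # 'Remember the name d mayne'
```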
Note that digits and single letters such as d are kept, because they are word characters. You will have to add any exceptions you have to the regex (if any).
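As one possible exception, the sketch below also drops tokens made only of digits, which the expected output in the question seems to want for the third line. That rule is an assumption on my part, and the helper name clean is hypothetical; adjust the patterns to whatever you actually need to remove.

```python
import ast
import re

raw = """["This is an hairy #calf",
"goodbye to the snow #gone",
"13742 the digits to give",
"Remember the name d - mayne",
"I hate you"]"""

def clean(text):
    # Drop anything that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]+", "", text)
    # Assumed extra rule: drop tokens that consist only of digits.
    text = re.sub(r"\b\d+\b", "", text)
    # Collapse whitespace runs and trim the ends left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

cleaned = [clean(s) for s in ast.literal_eval(raw)]
print("\n".join(cleaned))
```

With this rule the third line prints as "the digits to give". The single letter d is still kept, since removing short words would need a separate rule of its own.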
Upvotes: 2