Reputation: 259
I want to clean up the string below, removing only the \n, \r, and extra spaces, while keeping the apostrophe (') and other characters like the dash (-) and colon (:).
Right now I am using this code, but it removes all special characters, including the apostrophe.
import re

string = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
# \W+ matches every run of non-word characters, so the apostrophe is replaced too
rx = re.compile(r'\W+')
string = rx.sub(' ', string).strip()
print(string)
How can I do this?
Upvotes: 0
Views: 443
Reputation: 771
The accepted answer is great, but if you would like a slightly more general solution that lets you specify explicitly which characters to remove, pass a lambda function to filter(), something like this. In Python 3, filter() returns an iterator, so the result is joined back into a string before calling strip().
>>> y = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
>>> ' '.join(''.join(filter(lambda x: x not in ['\n', '\r'], y)).strip().split())
"Scott Hibb's Amazing Whisky Grilled Baby Back Ribs"
Please note that for your example, explicitly specifying the \n and \r in the lambda is overkill because strip() and split() already treat \n and \r as whitespace, but if you wanted to remove other characters, then this is a reasonable approach. For example, this is how you would strip extra whitespace characters, remove the \n and \r, and remove all standard vowels (a, e, i, o, u).
>>> y = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
>>> ' '.join(''.join(filter(lambda x: x.lower() not in ['a', 'e', 'i', 'o', 'u', '\n', '\r'], y)).strip().split())
"Sctt Hbb's mzng Whsky Grlld Bby Bck Rbs"
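If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name remove_chars and its parameters are hypothetical, and it swaps filter() for str.translate(), which deletes a given set of characters in one pass.

```python
def remove_chars(text, unwanted):
    """Delete every character listed in `unwanted`, then collapse whitespace runs."""
    # With three arguments, str.maketrans maps each character of the third to None (delete)
    table = str.maketrans('', '', unwanted)
    # split() with no arguments splits on any whitespace run, including \n and \r
    return ' '.join(text.translate(table).split())


y = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
print(remove_chars(y, 'aeiouAEIOU'))
```

Passing an empty string as the second argument only normalizes the whitespace and leaves every other character alone.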
Upvotes: 1
Reputation: 2233
You can use filter(), strip(), and split() to remove \n, \t, \r, and extra whitespace while preserving the rest of the characters, something like this:
string = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
print(' '.join(filter(None, string.strip().split())))
This will result in:
Scott Hibb's Amazing Whisky Grilled Baby Back Ribs
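If you would rather stay with a regex, as in the question, one possible variant is to match whitespace (\s+) instead of non-word characters (\W+). This is a minimal sketch, not the only way to do it; the variable name cleaned is just for illustration.

```python
import re

string = "\n\n\r\n Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n \n\n\n\n"
# \s+ matches runs of whitespace only, so apostrophes, dashes and colons are left untouched
cleaned = re.sub(r'\s+', ' ', string).strip()
print(cleaned)
```

The difference from the original attempt is only the character class: \W also matches punctuation such as the apostrophe, while \s is limited to spaces, tabs, and line breaks.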
Upvotes: 2