New to Python here, and trying to get the hang of regular expressions.
I'm trying to remove backslashes from inside a string. It's part of a function that pulls comments from Reddit, cleans them up, and joins them into one long string (or at least that's my aim). When I run the function, the text comes through with an extra backslash wherever the original text had an apostrophe, e.g. " It\'s been a few years "
I know there are other posts on this topic, and I've tried their recommendations, .replace("\", "") and .replace("\\", ""), with no luck. .decode didn't help either.
I'm clearly missing something. Any ideas?
PS: unrelated, but is it possible to chain the .sub calls the way you can chain .replace calls, rather than putting each one on its own line?
Thanks in advance!
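import re

# Assumes `reddit` is an authenticated praw.Reddit instance created earlier.
list_reddit = []
subreddit = reddit.subreddit('politics')
hot_python = subreddit.hot(limit=1)   # only the top "hot" submission
for submission in hot_python:
    comments = submission.comments
    # Drop "load more comments" stubs; they are MoreComments objects with no .body.
    comments.replace_more(limit=0)
    for comment in comments:
        reddit_text = comment.body
        # Replace newlines and apostrophes with spaces.
        nospaces = reddit_text.replace('\n', ' ').replace('\'', ' ')
        # Strip URLs (note: r"http\S+" already matches "https..." too).
        formatone = re.sub(r"http\S+", ' ', nospaces)
        formattwo = re.sub(r"https\S+", ' ', formatone)
        list_reddit.append(formattwo)
onestring = ' '.join(list_reddit)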
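On the PS: re.sub is a module-level function rather than a string method, so it can't be chained with dots like .replace. When several patterns share the same replacement, though, they can be merged into one pattern with the regex alternation operator |. Since r"http\S+" already matches URLs starting with "https", one pattern, r"https?\S+", covers both of the original .sub calls, and the newline replacement can be folded in too. A minimal sketch, assuming a hypothetical helper clean_comment that takes a comment body as a plain string (no Reddit connection needed) and made-up sample text:

```python
import re

# One compiled pattern: "https?\S+" matches URLs starting with http or https,
# and "|\n" also matches a newline, so both get replaced by a space in one pass.
CLEANUP = re.compile(r"https?\S+|\n")

def clean_comment(text: str) -> str:
    """Hypothetical helper: drop backslashes, then turn URLs and newlines into spaces."""
    text = text.replace("\\", "")   # "\\" is a single backslash character
    return CLEANUP.sub(" ", text)

# Made-up sample bodies; the first really contains a backslash before the apostrophe.
comments = [
    "It\\'s been a few years\nsee https://example.com/x",
    "Plain comment",
]
onestring = " ".join(clean_comment(c) for c in comments)
print(onestring)
```

Alternation only works when every branch gets the same replacement; the backslash removal stays a separate str.replace here because it replaces with an empty string rather than a space.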
You should use replace with the backslash escaped, in single quotes:
string.replace('\\','')
Good luck!
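Why '\\' works: inside a Python string literal a backslash starts an escape sequence, so one backslash character has to be written as '\\' (or "\\"), while "\" is a syntax error because \" escapes the closing quote. It is also worth checking whether the backslash is in the data at all: printing a list shows each string's repr(), which escapes an apostrophe with a backslash when the string also contains a double quote. A minimal sketch with made-up sample strings (the variable names are hypothetical):

```python
# A string that really contains a backslash before the apostrophe.
# In source code the backslash itself must be escaped as "\\".
with_backslash = "It\\'s been a few years"
print(with_backslash.replace("\\", ""))   # It's been a few years

# A string with no backslash can still *display* one via repr():
# it contains both ' and ", so repr() escapes the apostrophe.
no_backslash = 'It\'s "fine"'
print(no_backslash)            # It's "fine"
print("\\" in no_backslash)    # False
print([no_backslash])          # ['It\'s "fine"']
```

If "\\" in your_text is False, there is nothing to replace; the backslash is only how the value is being displayed.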