Matthew Mishek

Reputation: 1

Regex to find all backslashes and the immediately following character

I am currently working with email data. When I extract an email from Outlook, the body still contains all of the escape characters in the string.

I'm using the re package in Python to remove them, but to no avail.

Here's an example of text I'm trying to remove the escape characters from:

I am completely in agreement with that. \r\n\r\n\rbest regards.

Expected:

I'd like this to read: "I am completely in agreement with that. best regards."

I've tried the following to extract the unwanted text:

re.findall(r'\\\w+', string)
re.findall(r'\\*\w+', string)
re.findall(r'\\[a-z]+', string)

None of these are doing the trick. I'd appreciate any help!

Thanks!

Upvotes: 0

Views: 215

Answers (4)

Billy Bonaros

Reputation: 1721

You can try this, which removes every line feed and carriage return character:

re.sub(r'\n|\r','', string)


'I am completely in agreement with that. best regards.'
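A slightly tighter variant of the same idea, as a sketch only. It assumes the string holds real carriage-return and line-feed characters, and the names text and cleaned are just for illustration:

```python
import re

# Assumed sample: the escapes below are real control characters, not backslashes.
text = "I am completely in agreement with that. \r\n\r\n\rbest regards."

# The character class matches either control character, and '+' removes
# a whole run of them in a single substitution.
cleaned = re.sub(r"[\r\n]+", "", text)
print(cleaned)
```

If you would rather keep a separator where the line breaks were, pass " " as the replacement instead of the empty string.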

Upvotes: 3

ARD

Reputation: 333

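You can write your own function. Note that it only helps if the string really contains literal backslash characters:

def function(string):
    # Repeat until no literal backslash is left in the string.
    while '\\' in string:
        ind = string.find('\\')
        # Drop the backslash and the single character that follows it.
        string = string[:ind] + string[ind+2:]

    return string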
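The same thing can be done with a regex, which is what the question title asks for. This is a sketch that assumes the text contains literal backslashes (simulated here with a raw string); the names text, pairs and cleaned are illustrative:

```python
import re

# Raw string: every \r and \n below is two characters, a backslash plus a letter.
text = r"I am completely in agreement with that. \r\n\r\n\rbest regards."

# r"\\." is one literal backslash followed by any single character.
# re.DOTALL lets '.' also match a newline that might follow a backslash.
pairs = re.findall(r"\\.", text, flags=re.DOTALL)
cleaned = re.sub(r"\\.", "", text, flags=re.DOTALL)

print(pairs)
print(cleaned)
```

Matching exactly one character after the backslash matters here: a pattern such as r'\\\w+' would also swallow the word "best" that follows the last \r.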

Upvotes: 0

Guillaume Adam

Reputation: 301

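It seems you want to get rid of the line breaks. If so, you don't need the re module. str.replace is enough, as long as you assign the result (strings are immutable) and also handle the lone \r at the end of your example:

string = string.replace("\r\n", "").replace("\r", "")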

Upvotes: 0

sophros

Reputation: 16660

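You are confusing the representation of whitespace characters with the characters themselves. In your output, \r and \n are most likely how Python displays a single carriage return and a single line feed, not a backslash followed by a letter, so a pattern that looks for a backslash finds nothing.

Instead, search for the \r and \n characters themselves: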

re.findall(r'\n\w+', string)

or

re.findall(r'\r\w+', string)
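To check which case you are in, here is a small sketch comparing the two. The variables real and literal are made-up samples, not your actual Outlook data:

```python
import re

real = "that. \r\n\r\n\rbest regards."      # real control characters
literal = r"that. \r\n\r\n\rbest regards."  # backslash + letter pairs

# A carriage return is one character; its escaped spelling is two.
cr_length = len("\r")           # 1
spelled_length = len(r"\r")     # 2

# The pattern from the question needs a literal backslash to match.
matches_real = re.findall(r"\\\w+", real)
matches_literal = re.findall(r"\\\w+", literal)

print(repr(real))        # repr shows how Python displays the control characters
print(matches_real)      # no backslash in the data, so nothing is found
print(matches_literal)   # the last match also captures "best"
```

If "\\" in string is False for your email body, the data contains real control characters and the backslash-based patterns cannot match.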

Upvotes: 0
