Abdulquadir Shaikh

Reputation: 13

Regex Search and Replace in Python

I am looking to do a conditional regex search and replace.

If a carriage return (\r) is followed by an uppercase letter, I want to replace it with a space (' '). If it is followed by anything else (for example a lowercase letter), I just want to remove it. Is there a way to do that using regex in Python?

Sample Input:

BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort

Output:

BCP- Engineering Systems Support

The data is in a DataFrame. I am currently using df.replace() to replace every "\r" with a space (" "), but I would like the replacement to be conditional.

Below is my code:

df_replace = df.replace(to_replace=r"\r", value = " ", regex=True)

Upvotes: 0

Views: 213

Answers (2)

Brigadeiro

Reputation: 2945

I am not familiar with Python, but the regex you will need is as follows (perhaps someone with Python experience can edit to customize this code):

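This uses a negative lookahead, so it will find every \r that is NOT followed by an uppercase letter. Replace these with an empty string:

\\r(?![A-Z])

This will find every \r that is NOT followed by a lowercase letter. Run after the first replacement, the only \r left are those before an uppercase letter, so replace these with a space:

\\r(?![a-z])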
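To make the lookahead behaviour concrete, here is a small sketch. The sample string is made up for illustration; (?!...) is a negative lookahead and (?=...) a positive one, both part of Python's standard re module.

```python
import re

# Made-up sample: one \r before an uppercase letter, one before a lowercase letter.
sample = "a\rB\rc"

# Negative lookahead: matches a \r that is NOT followed by an uppercase letter.
not_before_upper = re.sub(r"\r(?![A-Z])", "", sample)

# Positive lookahead: matches a \r that IS followed by an uppercase letter.
before_upper = re.sub(r"\r(?=[A-Z])", " ", sample)

print(repr(not_before_upper))  # 'a\rBc'
print(repr(before_upper))      # 'a B\rc'
```

In both cases the lookahead only inspects the next character; the letter itself is not consumed or replaced.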

EDIT

Okay, here's one solution in Python I was able to put together for you:

import re

myString = "BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"

myString = re.sub("\\r(?![A-Z])", "", myString)
myString = myString.replace("\r", " ")  # This can be simple string replace
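The same result can probably also be had in a single pass. The sketch below is one possible variant, not the approach above: fix_carriage_returns is a hypothetical helper name, and it assumes that only ASCII uppercase letters should turn a \r into a space.

```python
import re

# The lookahead captures the next character if it is an uppercase letter,
# otherwise it captures an empty string. Nothing after the \r is consumed.
CR_PATTERN = re.compile(r"\r(?=([A-Z]?))")

def fix_carriage_returns(text: str) -> str:
    # \r before an uppercase letter becomes a space; any other \r is dropped.
    return CR_PATTERN.sub(lambda m: " " if m.group(1) else "", text)

my_string = "BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"
print(fix_carriage_returns(my_string))  # BCP- Engineering Systems Support
```

A replacement function keeps the condition in one place, at the cost of being a little harder to read than two plain substitutions.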

Upvotes: 2

Abdulquadir Shaikh

Reputation: 13

I was able to get a solution for this:

df_replace2 = df.replace(to_replace=r"(\r)(?![A-Z])", value="", regex=True)
df_replace3 = df_replace2.replace(to_replace=r"(\r)(?![a-z])", value=" ", regex=True)
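For anyone who wants to try this end to end, here is a minimal runnable sketch. It assumes pandas is installed; the one-column frame and the column name "team" are hypothetical, built from the sample input in the question.

```python
import pandas as pd

# Hypothetical frame holding the sample input from the question.
df = pd.DataFrame({"team": ["BCP-\rEngin\reerin\rg\rSyste\rms\rSupp\rort"]})

# Pass 1: delete every \r that is not followed by an uppercase letter.
step1 = df.replace(to_replace=r"\r(?![A-Z])", value="", regex=True)

# Pass 2: the \r that remain all precede an uppercase letter; make them spaces.
cleaned = step1.replace(to_replace=r"\r", value=" ", regex=True)

print(cleaned.loc[0, "team"])  # BCP- Engineering Systems Support
```

The order of the two passes matters: turning \r into spaces first would leave nothing for the conditional pattern to match.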

Thanks @Brigadeiro for guiding me to the solution.

Upvotes: 0
