deepakkumar

Replacing regex match in pandas column with modified regex

I am trying to replace each regular expression match with a modified version of the matched text. This is the column in my DataFrame:

    df['newcolumn']
    0    Ther was a quick brown appl_product_type in ("eds") where blah blan appl_Cust_type =("value","value")
    1    Ther was a quick brown appl_product_type = ("EDS") where blah blan appl_Cust_type =("value","value") 
    2    Ther was a quick brown appl_product_type in ("eds") where blah b                                     
    3    Ther was a quick brown appl_product_type in = ("EDS") where blah blan appl_Cust_type = ("value")     
    4    Ther was a quick brown  where blah blan appl_Cust_type                                               
    Name: newcolumn, dtype: object

I want to replace every occurrence of a string like appl_product_type = ("EDS") with upper(appl_product_type) = ("EDS"), i.e. wrap the column name in upper().

I am using the following code but getting an error:

    newcolumn.replace(value='upper\[\w]+\s+[in=]+[\s+\([\"\w+\,+\s+]+\)', regex='[\w]+\s+[in=]+[\s+\([\"\w+\,+\s+]+\)')
    error: bad escape \w at position 7

Is there a way to solve this? Please help.

Answers (1)

Karan Shishoo

A couple of things:

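  • You can't use \w in your replacement value and expect it to know what to fill in. \w only means something inside a pattern; in the replacement string it is an invalid escape, which is what raises "bad escape \w at position 7" (position 7 is the \w right after upper\[). To reuse matched text, capture it with parentheses in the pattern and refer to it as \1 (first group), \2 (second group) and so on in the replacement.
  • Your regex, as is, is badly formatted. Use raw strings (r'...') so backslashes reach the regex engine unchanged. Also, [in=]+ is a character class that matches any run of the characters i, n and =, not the word "in" or an "=" sign; use an alternation such as (?:in|=) for that.
  • Your question is unclear: you ask for one specific format, while your regex is attempting to catch a lot more.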
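A small sketch with plain re.sub that illustrates these points. It assumes pandas' regex replacement follows the same replacement-string rules as re.sub (which the "bad escape" error in the question suggests); the sample string and patterns are illustrative only:

```python
import re

text = 'appl_product_type = ("EDS")'

# \w means "word character" only inside a pattern; in a replacement
# string it is an unknown escape, and re rejects it with re.error.
try:
    re.sub(r'\w+', r'upper\w', text)
    replacement_error = None
except re.error as exc:
    replacement_error = str(exc)
print(replacement_error)

# To reuse matched text, capture it in a group and refer to it as \1.
# The lookahead (?=\s*=) restricts the match to a name followed by "=".
wrapped = re.sub(r'(\w+)(?=\s*=)', r'upper(\1)', text)
print(wrapped)  # upper(appl_product_type) = ("EDS")

# [in=]+ is a character class: it accepts any run of i, n and =,
# so it matches 'n=i', while the alternation (?:in|=)+ does not.
class_matches = re.fullmatch(r'[in=]+', 'n=i') is not None
alternation_matches = re.fullmatch(r'(?:in|=)+', 'n=i') is not None
print(class_matches, alternation_matches)  # True False
```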

Here is a somewhat clearer take on what you attempted, though given the ambiguity in your question I am not sure it is exactly what you want. It wraps each whole matched expression in upper(...):

    # Wrap each whole match, e.g. appl_product_type in ("eds"), in upper(...)
    df['newcolumn'] = df['newcolumn'].replace({r'([\w_]+\s+(?:in|=|\s)+\(\"(?:\w+\"(?:\,)?(?:\s+)?)+\))' : r'upper(\1)'}, regex=True)
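If you want exactly the output from the question, with only the name inside upper(...), one option is to capture the name and the rest of the expression as two groups and rebuild them with upper(\1)\2. This sketch assumes names are single \w+ tokens, the operator is in, = or in =, and the values are double-quoted strings without embedded quotes; it uses Series.str.replace with regex=True. It also accepts multi-value lists such as ("value","value"), which the pattern above does not match because after each comma it expects a word character rather than a quote.

```python
import pandas as pd

# Sample rows copied from the question (trailing spaces dropped).
df = pd.DataFrame({'newcolumn': [
    'Ther was a quick brown appl_product_type in ("eds") where blah blan appl_Cust_type =("value","value")',
    'Ther was a quick brown appl_product_type = ("EDS") where blah blan appl_Cust_type =("value","value")',
    'Ther was a quick brown appl_product_type in ("eds") where blah b',
    'Ther was a quick brown appl_product_type in = ("EDS") where blah blan appl_Cust_type = ("value")',
    'Ther was a quick brown  where blah blan appl_Cust_type',
]})

pattern = (
    r'\b(\w+)'                           # group 1: the name to wrap, e.g. appl_product_type
    r'(\s*(?:\bin\b\s*=?|=)\s*'          # group 2 begins: operator "in", "=" or "in ="
    r'\("[^"]*"(?:\s*,\s*"[^"]*")*\))'   # then ("v") or ("v1","v2"); group 2 ends here
)

# \1 and \2 are filled in from each match; only the name ends up inside upper(...).
df['newcolumn'] = df['newcolumn'].str.replace(pattern, r'upper(\1)\2', regex=True)
result = df['newcolumn'].tolist()
for row in result:
    print(row)
```

Note that this wraps every name followed by such a list, including appl_Cust_type; if only appl_product_type should be wrapped, change \b(\w+) to \b(appl_product_type).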
