Jamie Bull

Reputation: 13519

Regex in pandas to find a match based on string in another column

I have a dataframe of which this is a part.

   CodeID    Codes
0  'code1'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
1  'code2'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
2  'code3'   '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'   ...
...

What I'm trying to do is extract, for each row, the part of the string in column Codes that matches the pattern r"\[<code in CodeID column>[^][]*\]", where the code name is taken from that row's CodeID value.

Something like:

df['Code'] = df['Codes'].str.find(r"\[<code in CodeID column>[^][]*\]")

This recent question seems to imply it's not possible in a vectorised way, but it's not exactly the same situation.

Upvotes: 2

Views: 2089

Answers (1)

WoodChopper

Reputation: 4375

We can certainly use the string in one column to build a pattern for matching another column, as below.

In the lambda expression, x[0] is CodeID and x[1] is Codes.

import re
import pandas as pd

Out[20]: 
    CodeID                                         Codes
0  'code1'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
1  'code2'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
2  'code3'  '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'

df[['CodeID','Codes']].apply(lambda x: re.match(r"\[%s[^][]*\]"%x[0], x[1]),axis=1)
Out[21]: 
0    None
1    None
2    None
dtype: object

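Well, it returns None for every row because of my bad regex skills :) At least part of the problem is that re.match only matches at the start of the string, whereas re.search scans the whole string.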
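For what it's worth, here is a minimal sketch of the same row-wise idea using re.search instead of re.match. It assumes the values hold no literal quote characters (if the quotes shown in the output above are really part of the data, the strip call takes care of them for CodeID). The helper name extract_code and the rebuilt sample frame are my own illustration, not part of the question.

```python
import re
import pandas as pd

# Sample frame rebuilt from the question (assumed: no literal quotes in the data).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})

def extract_code(row):
    """Return the bracketed chunk of Codes that starts with this row's CodeID, or None."""
    # Strip stray single quotes, in case they are part of the stored value.
    code_id = row["CodeID"].strip("'")
    # re.escape guards against regex metacharacters in the code name.
    pattern = r"\[" + re.escape(code_id) + r"[^][]*\]"
    # re.search scans the whole string; re.match only tries position 0.
    found = re.search(pattern, row["Codes"])
    return found.group(0) if found else None

# Row-wise apply: not vectorised, but it gives each row its own pattern.
df["Code"] = df.apply(extract_code, axis=1)
print(df[["CodeID", "Code"]])
```

This is still a Python-level loop under the hood, so it will not be fast on very large frames, but it keeps the per-row pattern explicit.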

Upvotes: 2
