Here is part of my dataframe:
CodeID Codes
0 'code1' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]' ...
1 'code2' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]' ...
2 'code3' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]' ...
...
What I'm trying to do is extract the part of the string in column Codes that matches the pattern r"\[<code in CodeID column>[^][]*\]", using the CodeID value from the same row.
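For reference, this standalone sketch shows what the [^][]* part of that pattern does on one of the Codes strings. It assumes the cell holds a plain string, without the quotes shown in the printout.

```python
import re

codes = "[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"

# In Python's re, a "]" placed first in a class (right after "^") is literal,
# so [^][] means "any character except ] or [".
items = re.findall(r"\[[^][]*\]", codes)
print(items)  # ['[code1(a,b,c)]', '[code2(c,d,e)]', '[code3(e,f,g)]']

# Putting a specific code right after the opening bracket narrows it to that one item.
print(re.search(r"\[code2[^][]*\]", codes).group(0))  # [code2(c,d,e)]
```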
Something like this (pseudocode, since str.find only searches for a literal substring):
df['Code'] = df['Codes'].str.find(r"\[<code in CodeID column>[^][]*\]")
This recent question seems to imply it's not possible in a vectorised way, but it's not exactly the same situation.
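One way around building a separate pattern for every row is to extract all bracketed items once with `Series.str.extractall`, then keep the item whose name equals that row's `CodeID`. This is a sketch under two assumptions: each item looks like `[name(...)]` with no brackets or parentheses in the name, and the cells are plain strings (the quotes in the printout are treated as display artefacts). It avoids a row-wise `apply`, but pandas string methods on object dtype still loop in Python internally, so a speed-up is not guaranteed.

```python
import pandas as pd

# Sample data mirroring the question (assumed: plain strings, no literal quotes).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})

# Pull out every bracketed item once: "full" is the whole item,
# "code" is its name, i.e. everything up to the first "(".
parts = df["Codes"].str.extractall(r"(?P<full>\[(?P<code>[^][(]+)[^][]*\])")

# Drop the per-row "match" counter so the index refers to df's rows again.
parts = parts.droplevel("match")

# Keep only the items whose name equals that row's CodeID.
same_code = parts["code"].to_numpy() == df["CodeID"].loc[parts.index].to_numpy()
hits = parts.loc[same_code, "full"]

# If a code occurred twice in one row, keep the first; rows with no hit become NaN.
df["Code"] = hits.groupby(level=0).first()
print(df[["CodeID", "Code"]])
```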
We can certainly use the string from one column to match against another, like below.
In the lambda expression, x[0] is CodeID and x[1] is Codes.
import re
import pandas as pd
Out[20]:
CodeID Codes
0 'code1' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
1 'code2' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
2 'code3' '[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]'
df[['CodeID','Codes']].apply(lambda x: re.match(r"\[%s[^][]*\]"%x[0], x[1]),axis=1)
Out[21]:
0 None
1 None
2 None
dtype: object
Well, it returns None for every row, probably because of my bad regex skills :)
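The pattern itself looks fine, since Python reads [^][] as "anything except [ or ]". A likely culprit is `re.match`, which only tries to match at the start of the string, so rows 1 and 2 can never match. If the cells literally contain the quote characters shown in the printout, row 0 may fail too. Below is a sketch that swaps in `re.search`, applies `re.escape` to the code in case it contains regex metacharacters, and returns the matched text instead of a match object. It assumes plain string cells. The helper name `extract_own_code` is hypothetical.

```python
import re
import pandas as pd

# Sample data (assumed: plain strings, no literal quotes).
df = pd.DataFrame({
    "CodeID": ["code1", "code2", "code3"],
    "Codes": ["[code1(a,b,c)][code2(c,d,e)][code3(e,f,g)]"] * 3,
})

def extract_own_code(row):
    # Build the pattern from this row's CodeID; re.escape guards against metacharacters.
    pattern = r"\[%s[^][]*\]" % re.escape(row["CodeID"])
    # re.search scans the whole string, unlike re.match, which is anchored at position 0.
    m = re.search(pattern, row["Codes"])
    return m.group(0) if m else None

df["Code"] = df.apply(extract_own_code, axis=1)
print(df["Code"].tolist())
# ['[code1(a,b,c)]', '[code2(c,d,e)]', '[code3(e,f,g)]']
```

As with the pattern in the question, code1 would also match an item named code10. Requiring a "(" right after the code would tighten that.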