anarchy
anarchy

Reputation: 5204

Combine 2 string columns in pandas with different conditions in both columns

I have 2 columns in pandas, with data that looks like this.

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1 
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

I want to combine everything from the code column, and everything after the . in the fx column and replace the code column with the new combination without affecting the other rows.

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code, I think I have to use regex but I'm not sure how to combine it in this way.

df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm not sure how to select the second column also. Any help would be greatly appreciated.

Upvotes: 6

Views: 1040

Answers (5)

anky
anky

Reputation: 75150

There are other categories but I am only interested in cat1

You can use str.split with series.where to add the extention for cat1:

df['code'] = (df['code'].astype(str).add("."+df['fx'].str.split(".").str[-1])
             .where(df['category'].eq("cat1"),df['code']))

print(df)

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

Upvotes: 3

wwnde
wwnde

Reputation: 26686

Replace alphanumerics before the dot. Append the result to column code.

df['code'] +=df['fx'].str.replace('(^[\w]+(?=\.))','',regex=True)




    code         fx  category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

Upvotes: 1

tlentali
tlentali

Reputation: 3455

We can get the expected result using split like so :

>>> df['code'] = df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]
>>> df
    code    fx          category    
0   AXD.R   AXDG.R      cat1        
1   AXF.FE  AXDG_e.FE   cat1        
2   333.R   333.R       cat1    

To filter only on cat1, as @anky did very well, we can add a where statement:

>>> df['code'] = (df['code'] + '.' + df['fx'].str.split(pat=".", expand=True)[1]).where(df['category'].eq("cat1"), df['code'])

Upvotes: 2

user17242583
user17242583

Reputation:

You can use Series.str.extract:

df['code'] = df['code'].astype(str) + np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract('(\..+)')[0], '')

Output:

>>> df
     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

Upvotes: 3

mozway
mozway

Reputation: 262634

You can extract the part of fx and append it to code:

df['code'] += df['fx'].str.extract('(\..*$)')[0]

output:

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

to limit to cat1 only:

df.loc[df['category'].eq('cat1'), 'code'] += df['fx'].str.extract('(\..*$)')[0]

Upvotes: 3

Related Questions