Replace values in one column for specific instances of another column

Question

I am new to Pandas and not sure how to do the following:

I have a dataframe (df) with several columns. One column is called

OldCat = ['a-nn', 'bb-nm', 'ab-pp', 'ba-nn', 'cc-nm', 'ca-mn']

Now I want to create a new column that organizes/categories OldCat in a new way (NewCat). Specifically, I would like to look for strings in OldCat that contain either a(at the beginning), ba or ca j='^a|ba|ca' and then replace nan in NewCat with 'A' in the rows where OldCat contains j. In R this would be something like this:

 j='^a|ba|ca'
 df[(OldCat %like% j),NewCat := str_replace_all(df[(OldCat %like% j),NewCat], "nan", "A")]

Since R is relatively slow with my very large dataset, I decided to switch to pandas but I am not sure how to get the same result. Cheers

EdChum · Accepted Answer

You can use the vectorised str.extract to return matches with fillna to replace NaN with the string 'nan':

In [119]:
df['NewCat'] = df['OldCat'].str.extract('(^a|ba|ca)', expand=False).fillna('nan')
df

Out[119]:
  OldCat NewCat
0      a      a
1     bb    nan
2     ab      a
3     ba     ba
4     cc    nan
5     ca     ca

EDIT

IIUC then you can use str.contains with loc to set just the rows that contain the strings:

In [137]:
df.loc[df['OldCat'].str.contains(r'a|ba|ca'), 'NewCat'] = 'A'
df['NewCat'].fillna('nan', inplace=True)
df

Out[137]:
  OldCat NewCat
0   a-nn      A
1  bb-nm    nan
2  ab-pp      A
3  ba-nn      A
4  cc-nm    nan
5  ca-mn      A

Replace values in one column for specific instances of another column

Answers (1)

Related Questions