katsumi

Reputation: 154

Operate on columns based on other column contents in pandas

Coming from R, I cannot figure out how to perform vectorized operations on one DataFrame column using the values of other columns, e.g.:

import pandas as pd
df = pd.DataFrame({'s':['Big bear eats cat','cute cat sleeps'],'a':['bear','cat']})

Now I want to replace, row by row, the occurrence of the value in column a within column s with ANIMAL (other operations would work the same way), so that the result looks like this:

0    Big ANIMAL eats cat
1    cute ANIMAL sleeps

In R, with data.table and a vectorized function such as stringr's str_replace, I would just write something like:

df[,s:=str_replace(s,a,"ANIMAL")]

I saw that I might be able to use apply, but that seemed overly complex for such a simple case.
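For comparison, here is a minimal sketch of the apply approach mentioned above. With axis=1 each row is passed to the function as a Series, so the row's own a value can serve as the pattern for its s value. Note that apply with a Python function still loops row by row under the hood.

```python
import pandas as pd

df = pd.DataFrame({'s': ['Big bear eats cat', 'cute cat sleeps'],
                   'a': ['bear', 'cat']})

# axis=1 hands each row to the lambda, so row['a'] is the word to replace
# in that same row's row['s']; str.replace substitutes every occurrence.
df['s'] = df.apply(lambda row: row['s'].replace(row['a'], 'ANIMAL'), axis=1)

print(df['s'].tolist())
```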

Upvotes: 2

Views: 37

Answers (2)

katsumi

Reputation: 154

I found the following solution, which does the same thing I am used to in R, by vectorizing str.replace with NumPy:

import numpy as np

df['s'] = np.vectorize(str.replace)(df['s'], df['a'], "ANIMAL")

print(df)
      a                    s
0  bear  Big ANIMAL eats cat
1   cat   cute ANIMAL sleeps
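The NumPy documentation notes that np.vectorize is provided for convenience rather than performance; it is essentially a Python for loop. A plain list comprehension that pairs the two columns with zip therefore gives the same result without the NumPy dependency. A sketch using only pandas and the built-in str.replace:

```python
import pandas as pd

df = pd.DataFrame({'s': ['Big bear eats cat', 'cute cat sleeps'],
                   'a': ['bear', 'cat']})

# zip pairs each sentence with the word to replace from the same row;
# str.replace substitutes every occurrence of that word in the sentence.
df['s'] = [s.replace(a, 'ANIMAL') for s, a in zip(df['s'], df['a'])]

print(df)
```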

Upvotes: 1

jpp

Reputation: 164673

You can use a list comprehension:

df['s'] = [' '.join([i if i != a else 'ANIMAL' for i in s.split()])
           for a, s in zip(df['a'], df['s'])]

print(df)

      a                    s
0  bear  Big ANIMAL eats cat
1   cat   cute ANIMAL sleeps
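The two answers behave differently when the word also appears inside a longer word: str.replace substitutes any matching substring, while splitting on whitespace only matches whole tokens (and collapses repeated spaces when re-joining). If whole-word matching is wanted without altering whitespace, one option is a regular expression with \b word boundaries. This is a sketch assuming the values in a are plain words; the helper replace_word and the third row are hypothetical, added only to show the difference.

```python
import re

import pandas as pd

# The third row is a hypothetical example where 'cat' also occurs inside a longer word.
df = pd.DataFrame({'s': ['Big bear eats cat', 'cute cat sleeps', 'concatenate the cat'],
                   'a': ['bear', 'cat', 'cat']})

def replace_word(sentence, word, repl='ANIMAL'):
    # Hypothetical helper: \b anchors the match at word boundaries, so 'cat'
    # inside 'concatenate' is left alone; re.escape makes the word a literal pattern.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.sub(pattern, repl, sentence)

df['s'] = [replace_word(s, a) for s, a in zip(df['s'], df['a'])]
print(df['s'].tolist())
# → ['Big ANIMAL eats cat', 'cute ANIMAL sleeps', 'concatenate the ANIMAL']
```

By contrast, 'concatenate the cat'.replace('cat', 'ANIMAL') would also change the first word to 'conANIMALenate'.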

Upvotes: 1
