Reputation: 2189

Remove/replace column values based on another column using pandas

I have a data frame like this:

df
col1     col2      col3
 ab       1        prab
 cd       2        cdff
 ef       3        eef

I want to remove the col1 value from the col3 value in each row.

The final data frame should look like this:

df
col1     col2      col3
 ab       1        pr
 cd       2        ff
 ef       3        e

How can I do this with pandas in the most efficient way?

Upvotes: 0

Answers (3)

Reputation: 150805

It looks like a loop is unavoidable, since the substring to remove differs from row to row. In that case, a list comprehension might come in handy. Compare apply:

%%timeit
df.apply(lambda x: x['col3'].replace(x['col1'], ''), axis=1)

# 767 µs ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

while the list comprehension runs much faster:

%%timeit
[a.replace(b,'') for a,b in zip(df['col3'], df['col1'])]

# 24.4 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
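
For completeness, here is a minimal self-contained sketch of the list comprehension assigned back to the column. The frame is rebuilt from the sample data in the question, and it assumes every value in col1 and col3 is a string (no NaN):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["ab", "cd", "ef"],
    "col2": [1, 2, 3],
    "col3": ["prab", "cdff", "eef"],
})

# Pair each col3 string with the col1 string from the same row,
# then remove every occurrence of the col1 string from it.
df["col3"] = [a.replace(b, "") for a, b in zip(df["col3"], df["col1"])]

print(df)
```

If either column can contain missing values, the str.replace call would fail on them, so they would need to be filled or skipped first.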

Upvotes: 1

Reputation: 42946

Use .apply with str.replace row by row (axis=1):

df['col3'] = df.apply(lambda x: x['col3'].replace(x['col1'], ''), axis=1)

Output

  col1  col2 col3
0   ab     1   pr
1   cd     2   ff
2   ef     3    e

Upvotes: 2

Reputation: 2285

Suppose df is a matrix (a list of lists):

df = [["ab",1,"prab"],["cd",2,"cdff"],["ef",3,"eef"]]

You want to remove the key (col1) from the value (col3) in each row:

for row in df:
  row[2] = row[2].replace(row[0],"")

According to the str.replace documentation, each occurrence of col1 is replaced by an empty string: "".
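
To illustrate the "each occurrence" point, here is a small sketch. The row is made up for illustration (it is not from the question) and contains the key twice:

```python
# Hypothetical row where the key "ab" appears twice in the value.
row = ["ab", 1, "abprab"]

# Default behaviour: every occurrence of the key is removed.
all_removed = row[2].replace(row[0], "")

# The optional third argument limits the number of replacements,
# counted from the left; here only the first "ab" is removed.
first_removed = row[2].replace(row[0], "", 1)

print(all_removed)    # pr
print(first_removed)  # prab
```

Which behaviour is wanted depends on the data. The sample in the question only has one occurrence per row, so both give the same result there.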

Upvotes: 0