Bikas Katwal

Reputation: 2055

Pandas - Replace value from another column in certain conditions

I have two columns in my DataFrame. I would like to replace the value in the first column with the value from the second column whenever the text in the first column is a substring of the text in the second column.

Example:

Input: 

col1       col2
-----------------
text1      text1 and text2
some text  some other text
text 3     
text 4     this is text 4

Output:

col1                 col2
------------------------------
text1 and text2      text1 and text2
some text            some other text
text 3     
this is text 4       this is text 4

As you can see, I have replaced col1 in rows 1 and 4, because in those rows the text in col1 is a substring of col2.

How can I perform this operation in pandas?

Upvotes: 0

Views: 80

Answers (2)

Henry Ecker

Reputation: 35686

A NaN-safe pure-Python option via zip:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]

A NaN-safe pandas option via fillna + apply:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

df['col1'] = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

Another option, restricting the update to rows where col2 is not NaN, via a boolean mask (notna) + loc:

m = df['col2'].notna()
df.loc[m, 'col1'] = df[m].apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

df:

              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4
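
The same check can also be expressed as a boolean mask that is then passed to Series.mask. This is a minimal sketch rather than part of the original answer; the name is_sub is illustrative, and the sample data uses 'text 3' without the trailing space.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3', 'text 4'],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})

# True where col2 is a string that contains col1; NaN rows evaluate to False
is_sub = pd.Series(
    [isinstance(b, str) and a in b for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Replace col1 with col2 only where the mask is True
df['col1'] = df['col1'].mask(is_sub, df['col2'])
```

Keeping the mask as a separate Series makes it easy to inspect which rows will change before overwriting col1.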

Upvotes: 1

Shubham Periwal

Reputation: 2248

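Try df.apply with axis=1.

This iterates over each row and checks whether col1 is a substring of col2.
If it is, it returns col2; otherwise it returns col1.
Note that this assumes col2 contains only strings (an empty string for missing values, as in the full code below); a NaN in col2 would make the `in` check raise a TypeError.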

df['col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)

Full Code:

import pandas as pd

df = pd.DataFrame({'col1': ['text1', 'some text', 'text 3', 'text 4'], 'col2': ['text1 and text2', 'some other text', '', 'this is text 4']})

df['new_col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)

df

        col1    col2             new_col1
0   text1       text1 and text2  text1 and text2
1   some text   some other text  some text
2   text 3                       text 3
3   text 4      this is text 4   this is text 4
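
If col2 may contain NaN instead of empty strings, the row function needs a type guard. The sketch below assumes a reduced two-row sample and a hypothetical helper named pick (neither is from the original answer). It first shows that the unguarded lambda fails on the NaN row, then applies the guarded version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'text 3'],
    'col2': ['text1 and text2', np.nan],
})

def pick(row):
    # Hypothetical helper: return col2 only when it is a string containing col1
    col2 = row['col2']
    if isinstance(col2, str) and row['col1'] in col2:
        return col2
    return row['col1']

# The unguarded lambda raises TypeError on the NaN row ('text 3' in nan)
try:
    df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
    unguarded_failed = False
except TypeError:
    unguarded_failed = True

# The guarded helper leaves col1 unchanged where col2 is missing
df['new_col1'] = df.apply(pick, axis=1)
```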

Upvotes: 2
