Pandas - Replace value from another column in certain conditions

Question

I have two columns in my DataFrame. I would like to replace the value in the first column with the value from the second column whenever the text in the first column is a substring of the text in the second column.

Example:

Input: 

col1       col2
-----------------
text1      text1 and text2
some text  some other text
text 3     
text 4     this is text 4

Output:

col1                 col2
------------------------------
text1 and text2      text1 and text2
some text            some other text
text 3     
this is text 4       this is text 4

As you can see, I have replaced col1 in rows 1 and 4, because in those rows the text in col1 is a substring of col2.

How can I perform this operation in pandas?

Henry Ecker · Accepted Answer

A NaN-safe pure-Python option using zip:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

# Take col2 when it is a string that contains col1; otherwise keep col1.
df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]
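
The isinstance check is what makes the comprehension NaN safe: a missing value in col2 is a float NaN, and the `in` operator raises a TypeError when its right-hand side is a float. A minimal sketch of that behaviour, where the helper name `contains` is hypothetical and `float('nan')` stands in for `np.nan`:

```python
def contains(a, b):
    """Return True only when b is a string and a is a substring of it."""
    return isinstance(b, str) and a in b

nan = float('nan')  # stand-in for np.nan, which is also a Python float

error = None
try:
    'text 3 ' in nan  # unguarded substring test against a missing value
except TypeError as exc:
    error = exc  # the float cannot be searched with `in`

print(type(error).__name__)
print(contains('text 3 ', nan))
print(contains('text1', 'text1 and text2'))
```

Because `isinstance(b, str)` is evaluated first and short-circuits, the `a in b` test is never reached for a NaN row.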

A NaN-safe pandas option using fillna + apply:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})

# Fill NaN with '' so the substring test always compares strings.
df['col1'] = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

Another option using a boolean mask from isna together with loc, so that rows where col2 is NaN are left untouched:

# True for rows where col2 is present; only those rows are updated.
m = ~df['col2'].isna()
df.loc[m, 'col1'] = df[m].apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)

Resulting df:

              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4
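
The same result can probably also be reached by building the boolean condition first and then applying it with Series.mask. This is only a sketch of that variant, not part of the answer above; it assumes col1 contains no missing values, and the name `is_substring` is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3 ', 'text 4'],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})

# Row-wise substring test; a non-string col2 value (NaN) never matches.
is_substring = pd.Series(
    [isinstance(b, str) and a in b for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Where the condition is True take col2, elsewhere keep col1.
df['col1'] = df['col1'].mask(is_substring, df['col2'])

print(df)
```

Keeping the condition as a separate Series makes it easy to inspect which rows will change before overwriting col1.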
