I have two columns in my DataFrame. I would like to replace the value in the first column with the value in the second column if the text in the first column is a substring of the text in the second column.
Example:
Input:
col1        col2
-------------------------------
text1       text1 and text2
some text   some other text
text 3
text 4      this is text 4
Output:
col1              col2
-----------------------------------
text1 and text2   text1 and text2
some text         some other text
text 3
this is text 4    this is text 4
As you can see, I have replaced rows 1 and 4, because in each of those rows the text in column 1 is a substring of the text in column 2.
How can I perform this operation in pandas?
A NaN-safe Python option via zip:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})
# Take col2 only when it is a real string (NaN is a float) and contains col1;
# otherwise keep the original col1 value.
df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]
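The isinstance(b, str) check in the comprehension is what makes it NaN-safe: a missing value in col2 is the float NaN, and a membership test against a float raises TypeError. A small sketch that checks both behaviours on the "text 3" row from the question (the variable names here are only for illustration):

```python
import numpy as np

col1_value, col2_value = 'text 3', np.nan  # the row whose col2 is missing

# NaN is a float, and `in` needs a string (or other container) on its right,
# so an unguarded membership test on a missing value raises TypeError.
unguarded_failed = False
try:
    col1_value in col2_value
except TypeError:
    unguarded_failed = True

# With the guard, isinstance() is False for NaN, so `and` short-circuits
# before the `in` test runs and the original col1 value is kept.
guarded = (col2_value
           if isinstance(col2_value, str) and col1_value in col2_value
           else col1_value)

print(unguarded_failed)  # True
print(guarded)           # text 3
```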
A NaN-safe pandas option via fillna + apply:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
    'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
             3: 'this is text 4'}
})
# fillna('') turns NaN into an empty string, so the `in` test always compares
# two strings ('text 3 ' in '' is False, so row 2 keeps its col1 value).
df['col1'] = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)
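The same logic can be packaged as a reusable function. This is only a sketch: the name replace_if_substring and its src/dst parameters are hypothetical (not part of pandas), it fills only the second column rather than the whole frame, and it assumes the first column contains strings with no missing values.

```python
import numpy as np
import pandas as pd


def replace_if_substring(df: pd.DataFrame, src: str, dst: str) -> pd.Series:
    """Hypothetical helper: return df[src], with each value swapped for the
    df[dst] value in the same row whenever it is a substring of it.

    Assumes df[src] holds strings only; missing values in df[dst] are
    treated as empty strings, so they never contain a non-empty src value.
    """
    other = df[dst].fillna('')
    return pd.Series(
        [b if a in b else a for a, b in zip(df[src], other)],
        index=df.index,
    )


df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3', 'text 4'],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})
df['col1'] = replace_if_substring(df, 'col1', 'col2')
print(df['col1'].tolist())
# ['text1 and text2', 'some text', 'text 3', 'this is text 4']
```

Only col2 is filled inside the helper, so the NaN in the original frame is left untouched.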
Another option via boolean indexing with isna + loc (reusing the df defined above):
# Only run the substring test on rows where col2 is not missing.
m = ~df['col2'].isna()
df.loc[m, 'col1'] = df[m].apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1
)
df:
              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4
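The mask ~df['col2'].isna() can equivalently be written with Series.notna(). A quick check on the col2 values from the example that the two spellings give the same boolean Series:

```python
import numpy as np
import pandas as pd

col2 = pd.Series(['text1 and text2', 'some other text', np.nan, 'this is text 4'])

# Both select the rows where col2 has a value; notna() is the direct spelling.
mask_negated = ~col2.isna()
mask_notna = col2.notna()

print(mask_notna.tolist())              # [True, True, False, True]
print(mask_negated.equals(mask_notna))  # True
```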
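Try df.apply with axis=1. This iterates over the rows and checks whether col1 is a substring of col2: if it is, the row gets col2, otherwise it keeps col1. Note that this version is not NaN-safe: if col2 contains NaN, the in test raises a TypeError (the full code below sidesteps this by using an empty string rather than NaN for the missing value).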
df['col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
Full code:
import pandas as pd
df = pd.DataFrame({'col1': ['text1', 'some text', 'text 3', 'text 4'],
                   'col2': ['text1 and text2', 'some other text', '', 'this is text 4']})
# new_col1 holds the result; col1 is left unchanged for comparison.
df['new_col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
df
        col1             col2         new_col1
0      text1  text1 and text2  text1 and text2
1  some text  some other text        some text
2     text 3                            text 3
3     text 4   this is text 4   this is text 4