Reputation: 65
INPUT>df1
ColumnA ColumnB
A1 NaN
A1A2 NaN
A3 NaN
What I am trying to do is change ColumnB's value conditionally: check ColumnA for each string in turn and, when it matches, add that string as a remark to ColumnB. The previous value of ColumnB should be kept when the new string is added.
For the sample dataframe, the output I want is:
OUTPUT>df1
ColumnA ColumnB
A1 A1
A1A2 A1_A2
A3 NaN
I have tried the following code, but it does not work well. Could anyone give me some advice? Thanks.
df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A1'), df1['ColumnB']+"_A1",df1['ColumnB'])
df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A2'), df1['ColumnB']+"_A2",df1['ColumnB'])
Upvotes: 1
Views: 49
Reputation: 1432
You cannot add or append strings to np.nan. That means you would always need to check whether each position in your ColumnB is still np.nan or already a string before setting its new value. If all you want to do is work with text, you could initialize ColumnB with empty strings and append the selected string pieces from ColumnA, as follows:
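import pandas as pd

df = pd.DataFrame({'ColA': ['A1', 'A1A2', 'A2', 'A3']})
# Start with empty strings instead of NaN so that += can append text
df['ColB'] = ''
# Append 'A1' only in rows whose ColA contains 'A1'
df.loc[df.ColA.str.contains('A1'), 'ColB'] += 'A1'
print(df)
# Append 'A2' only in rows whose ColA contains 'A2'
df.loc[df.ColA.str.contains('A2'), 'ColB'] += 'A2'
print(df)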
The output is:
   ColA ColB
0    A1   A1
1  A1A2   A1
2    A2
3    A3

   ColA  ColB
0    A1    A1
1  A1A2  A1A2
2    A2    A2
3    A3
Note: this is a deliberately verbose version, meant as an example.
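The check for np.nan mentioned at the start of this answer can also be written out explicitly, which keeps the "_" separator the question asks for and turns rows that never matched back into NaN. This is a minimal sketch, assuming the sample df1 from the question; the key list and the names col_b, hit and appended are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question: ColumnB starts out as all NaN.
df1 = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"], "ColumnB": [np.nan] * 3})

# Replace NaN with "" so that string concatenation works.
col_b = df1["ColumnB"].fillna("")
for key in ["A1", "A2"]:
    hit = df1["ColumnA"].str.contains(key)
    # Add a "_" separator only where ColumnB already holds some text.
    appended = col_b.where(col_b.eq(""), col_b + "_") + key
    # Same idea as the np.where attempt, but operating on strings, not NaN.
    col_b = pd.Series(np.where(hit, appended, col_b), index=df1.index)

# Turn rows that never matched back into NaN, as in the desired output.
df1["ColumnB"] = col_b.replace("", np.nan)
print(df1)
```

Because each pass starts from the previous col_b, values written by earlier keys are kept, which is the behaviour the question describes.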
Upvotes: 1
Reputation: 29742
One way is to use pandas.Series.str.findall with str.join:
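import pandas as pd
df = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"]})
key = ["A1", "A2"]
# Find every key that occurs in ColumnA, then join the matches with "_"
df["ColumnB"] = df["ColumnA"].str.findall("|".join(key)).str.join("_")
print(df)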
Output:
  ColumnA ColumnB
0      A1      A1
1    A1A2   A1_A2
2      A3
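In this output the A3 row holds an empty string rather than the NaN shown in the question's desired output, because findall returns an empty list for it and joining an empty list gives "". A small variation handles that and also escapes the keys in case they ever contain regex metacharacters. This is a sketch assuming the same ColumnA data and key list; the use of re.escape and replace is an addition, not part of the original answer.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"]})
key = ["A1", "A2"]

# re.escape guards against keys containing regex metacharacters such as "." or "+".
pattern = "|".join(map(re.escape, key))
# findall gives [] for "A3"; joining yields "", which is mapped back to NaN.
df["ColumnB"] = df["ColumnA"].str.findall(pattern).str.join("_").replace("", np.nan)
print(df)
```

Note that findall reports matches in the order they appear in ColumnA, not the order of key. With overlapping keys such as "A1" and "A12", the alternation tries them left to right, so longer keys should be listed first.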
Upvotes: 3