lamiren

Reputation: 65

Pandas: conditionally concatenate original columns with a string

INPUT>df1

    ColumnA ColumnB
     A1       NaN
     A1A2     NaN
     A3       NaN       

I want to update ColumnB conditionally: for each substring I check for in ColumnA, a remark should be appended to ColumnB, and whatever ColumnB already contains must be kept when the new string is added.

For the sample dataframe, the desired result is:

OUTPUT>df1

    ColumnA ColumnB
     A1       A1
     A1A2     A1_A2
     A3       NaN       

I tried the following code, but it does not work. Could anyone give me some advice? Thanks.

df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A1'), df1['ColumnB']+"_A1",df1['ColumnB'])
df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A2'), df1['ColumnB']+"_A2",df1['ColumnB'])

Upvotes: 1

Views: 49

Answers (2)

deponovo

Reputation: 1432

You cannot add or append strings to np.nan. That means you would always need to check whether each position in ColumnB is still np.nan or already a string before setting its new value. If all you need is to work with text, you could initialize ColumnB with empty strings and append the selected string pieces from ColumnA:

import pandas as pd
import numpy as np

I = pd.DataFrame({'ColA': ['A1', 'A1A2', 'A2', 'A3']})
I['ColB'] = ''
I.loc[I.ColA.str.contains('A1'), 'ColB'] += 'A1'
print(I)

I.loc[I.ColA.str.contains('A2'), 'ColB'] += 'A2'
print(I)

The output is:

   ColA ColB
0    A1   A1
1  A1A2   A1
2    A2     
3    A3     
   ColA  ColB
0    A1    A1
1  A1A2  A1A2
2    A2    A2
3    A3      

Note: this version is deliberately verbose, to serve as an example.
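The snippet above concatenates without a separator and leaves empty strings where nothing matched, whereas the question asks for A1_A2 and NaN. A minimal sketch of one way to close that gap, assuming the question's column names and a hypothetical list called keys holding the substrings to look for:

```python
import pandas as pd

df1 = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"]})
df1["ColumnB"] = ""  # start from empty strings so concatenation works

keys = ["A1", "A2"]  # hypothetical name for the substrings to check
for key in keys:
    mask = df1["ColumnA"].str.contains(key, regex=False)
    current = df1.loc[mask, "ColumnB"]
    # Keep "" where nothing was appended yet, otherwise add a "_" separator.
    prefix = current.where(current == "", current + "_")
    df1.loc[mask, "ColumnB"] = prefix + key

# Rows that never matched go back to NaN, as in the desired output.
df1["ColumnB"] = df1["ColumnB"].mask(df1["ColumnB"] == "")
print(df1)
```

The separator is only added to rows that already hold a remark, so the first match does not get a leading underscore.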

Upvotes: 1

Chris

Reputation: 29742

One way using pandas.Series.str.findall with join:

key = ["A1", "A2"]
df["ColumnB"] = df["ColumnA"].str.findall("|".join(key)).str.join("_")
print(df)

Output:

  ColumnA ColumnB
0      A1      A1
1    A1A2   A1_A2
2      A3        
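Note that the row without a match ends up as an empty string rather than NaN. If NaN is needed there, as in the question's desired output, something along these lines might work. This is only a sketch: the re.escape call is an extra precaution I am assuming is wanted in case a key contains regex metacharacters, and the names keys, pattern, and joined are just illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"]})

keys = ["A1", "A2"]
# Escape each key so it is matched literally inside the alternation.
pattern = "|".join(re.escape(k) for k in keys)

# findall returns the matches in the order they occur in ColumnA.
joined = df["ColumnA"].str.findall(pattern).str.join("_")

# Rows with no match produce "", which is turned back into NaN here.
df["ColumnB"] = joined.mask(joined == "")
print(df)
```

Because findall follows the order of appearance in ColumnA, a value such as A2A1 would give A2_A1 rather than A1_A2.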

Upvotes: 3
