Pandas: conditionally concatenate original columns with a string

Question

INPUT>df1

    ColumnA ColumnB
     A1       NaN
     A1A2     NaN
     A3       NaN

I am trying to update ColumnB conditionally: for each pattern I check whether ColumnA contains it and, if so, append a remark to ColumnB. Whatever ColumnB already holds must be kept when a new string is added.

For the sample dataframe, the rules are:

If ColumnA contains "A1", append the string "A1" to ColumnB (without clearing its previous value).
If ColumnA contains "A2", append the string "A2" to ColumnB (without clearing its previous value).

OUTPUT>df1

    ColumnA ColumnB
     A1       A1
     A1A2     A1_A2
     A3       NaN

I have tried the following code, but it does not work. Could anyone give me some advice? Thanks.

df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A1'), df1['ColumnB']+"_A1",df1['ColumnB'])
df1['ColumnB'] = np.where(df1['ColumnA'].str.contains('A2'), df1['ColumnB']+"_A2",df1['ColumnB'])

deponovo · Accepted Answer

You cannot append a string to np.nan. That means that before every update you would need to check whether each entry of ColumnB is still np.nan or already a string in order to set its new value properly. If all you want to do is work with text, you can instead initialize ColumnB with empty strings and append the selected string pieces wherever ColumnA matches:

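import pandas as pd

I = pd.DataFrame({'ColA': ['A1', 'A1A2', 'A2', 'A3']})
# Start from empty strings instead of NaN so that += can append text.
I['ColB'] = ''
# Append 'A1' to ColB in the rows where ColA contains 'A1'.
I.loc[I.ColA.str.contains('A1'), 'ColB'] += 'A1'
print(I)

# Same for 'A2'; text already in ColB is kept.
I.loc[I.ColA.str.contains('A2'), 'ColB'] += 'A2'
print(I)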

The output is:

   ColA ColB
0    A1   A1
1  A1A2   A1
2    A2     
3    A3     
   ColA  ColB
0    A1    A1
1  A1A2  A1A2
2    A2    A2
3    A3

Note: this is deliberately verbose, with each step written out separately as an example.
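
If you need the exact output from the question (an underscore between remarks, and NaN where nothing matched), the same idea can be extended roughly as follows. This is only a sketch under a few assumptions: the `tokens` list and the `has_text` mask are names made up for illustration, `regex=False` is passed so that each token is matched literally, and the empty strings are turned back into NaN at the end with `Series.mask`.

```python
import pandas as pd

df1 = pd.DataFrame({"ColumnA": ["A1", "A1A2", "A3"]})
df1["ColumnB"] = ""  # empty strings, so that += can append text

tokens = ["A1", "A2"]  # hypothetical list of patterns to check
for token in tokens:
    hit = df1["ColumnA"].str.contains(token, regex=False)
    # Rows that already hold a remark get a "_" separator first.
    has_text = df1["ColumnB"] != ""
    df1.loc[hit & has_text, "ColumnB"] += "_" + token
    df1.loc[hit & ~has_text, "ColumnB"] += token

# Rows that matched nothing are still "", so turn them back into NaN.
df1["ColumnB"] = df1["ColumnB"].mask(df1["ColumnB"] == "")
print(df1)
```

Both masks are computed before either assignment runs, so a row is updated at most once per token.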
