Robert
Robert

Reputation: 159

Updating a column in a Pandas DataFrame based on a condition of another column

I'm interested in adding a text tag to a new column in a Pandas dataframe. The following example works but I get the copy warning and I don't fully understand if I should ignore it in this case.

The DataFrame simply has either a character or is an empty string:

In [1]: import pandas as pd

In [2]: df=pd.DataFrame({('A'):['x','','x',''], ('B'):['x','x','','']})

In [3]: df
Out[3]:
   A  B
0  x  x
1     x
2  x
3

Create a new column called 'msg'

In [4]: df['msg'] = ''

In [5]: df
Out[5]:
   A  B msg
0  x  x
1     x
2  x
3

Set the 'msg' column to 'red' if 'A' is not an empty string

In [6]: df['msg'][df['A'] != ''] = 'red;'

In [7]: df
Out[7]:
   A  B  msg
0  x  x  red;
1     x
2  x     red;
3

Concatenate 'blue' depending on the 'B' column values

In [8]: df['msg'][df['B'] != ''] += 'blue;'

In [9]: df
Out[9]:
   A  B       msg
0  x  x  red;blue;
1     x     blue;
2  x         red;
3

Alternatively, I found using numpy.where produced the desired result. What is the proper way to do this in Pandas?

import numpy as np

df['msg'] += np.where(df['A'] != '','green;', '')

Update: 4/15/2018

Upon further thought, it would be useful to retain the data from the original DataFrame in certain cases but still attach a label ('color' in this example). The answer from @COLDSPEED led me to the following (changing 'blue;' to 'blue:' and preserving column 'B' data to include in the tag in this case):

df['msg'] = (v.where(df.applymap(len) > 0, '') + 
             df.where(df[['B']].applymap(len)>0,'')).agg(''.join, axis=1)


   A  B         msg
0  x  x  red;blue:x
1     x      blue:x
2  x           red;
3

Upvotes: 4

Views: 1000

Answers (3)

BENY
BENY

Reputation: 323376

You can using dot and replace

(df!='').dot(df.columns).replace({'A':'red;','B':'blue;'},regex=True)
Out[379]: 
0    red;blue;
1        blue;
2         red;
3             
dtype: object

#df['msg']=(df!='').dot(df.columns).replace({'A':'red;','B':'blue;'},regex=True)

Upvotes: 4

piRSquared
piRSquared

Reputation: 294516

Using pandas.DataFrame.dot
Special note that I set the dtype of the array to object. Otherwise the dot won't work.

a = np.array(['red', 'blue;'], object)

df.assign(msg=df.astype(bool).dot(a))

   A  B        msg
0  x  x  red;blue;
1     x      blue;
2  x          red;
3                 

Upvotes: 4

cs95
cs95

Reputation: 403120

If you know your colors before-hand, you could use masking with DataFrame.where and str.join to get this done.

v = pd.DataFrame(
     np.repeat([['red;', 'blue;']], len(df), axis=0), 
     columns=df.columns, 
     index=df.index
) 
df['msg'] = v.where(df.applymap(len) > 0, '').agg(''.join, axis=1)

df
   A  B        msg
0  x  x  red;blue;
1     x      blue;
2  x          red;
3              

Upvotes: 4

Related Questions