Reputation: 159
I'm interested in adding a text tag to a new column in a Pandas dataframe. The following example works but I get the copy warning and I don't fully understand if I should ignore it in this case.
The DataFrame simply has either a character or is an empty string:
In [1]: import pandas as pd
In [2]: df=pd.DataFrame({('A'):['x','','x',''], ('B'):['x','x','','']})
In [3]: df
Out[3]:
A B
0 x x
1 x
2 x
3
Create a new column called 'msg'
In [4]: df['msg'] = ''
In [5]: df
Out[5]:
A B msg
0 x x
1 x
2 x
3
Set the 'msg' column to 'red' if 'A' is not an empty string
In [6]: df['msg'][df['A'] != ''] = 'red;'
In [7]: df
Out[7]:
A B msg
0 x x red;
1 x
2 x red;
3
Concatenate 'blue' depending on the 'B' column values
In [8]: df['msg'][df['B'] != ''] += 'blue;'
In [9]: df
Out[9]:
A B msg
0 x x red;blue;
1 x blue;
2 x red;
3
Alternatively, I found using numpy.where produced the desired result. What is the proper way to do this in Pandas?
import numpy as np
df['msg'] += np.where(df['A'] != '','green;', '')
Update: 4/15/2018
Upon further thought, it would be useful to retain the data from the original DataFrame in certain cases but still attach a label ('color' in this example). The answer from @COLDSPEED led me to the following (changing 'blue;' to 'blue:' and preserving column 'B' data to include in the tag in this case):
df['msg'] = (v.where(df.applymap(len) > 0, '') +
df.where(df[['B']].applymap(len)>0,'')).agg(''.join, axis=1)
A B msg
0 x x red;blue:x
1 x blue:x
2 x red;
3
Upvotes: 4
Views: 1000
Reputation: 323376
You can using dot
and replace
(df!='').dot(df.columns).replace({'A':'red;','B':'blue;'},regex=True)
Out[379]:
0 red;blue;
1 blue;
2 red;
3
dtype: object
#df['msg']=(df!='').dot(df.columns).replace({'A':'red;','B':'blue;'},regex=True)
Upvotes: 4
Reputation: 294516
Using pandas.DataFrame.dot
Special note that I set the dtype of the array to object
. Otherwise the dot
won't work.
a = np.array(['red', 'blue;'], object)
df.assign(msg=df.astype(bool).dot(a))
A B msg
0 x x red;blue;
1 x blue;
2 x red;
3
Upvotes: 4
Reputation: 403120
If you know your colors before-hand, you could use masking with DataFrame.where
and str.join
to get this done.
v = pd.DataFrame(
np.repeat([['red;', 'blue;']], len(df), axis=0),
columns=df.columns,
index=df.index
)
df['msg'] = v.where(df.applymap(len) > 0, '').agg(''.join, axis=1)
df
A B msg
0 x x red;blue;
1 x blue;
2 x red;
3
Upvotes: 4