I am trying to concatenate two columns in a pandas DataFrame. The problem is that when a None value exists in either Series, the result is NaN. Since the real data is very large and there is value in keeping the original None values for later reference, I would rather not change the original values in the columns. Is there a way to achieve this in pandas?
To create an example DataFrame:
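import pandas as pd
f = pd.DataFrame([['a', 'b', 'c', 'a', 'b', 'c'], ['1', '2', '3', '4', '5', '6']])
f = f.transpose()
f.columns = ['xx', 'yy']
# Use .loc instead of chained indexing such as f.xx[0] = None,
# which may not write back to f under pandas Copy-on-Write
f.loc[0, 'xx'] = None
f.loc[0, 'yy'] = None
f.loc[2, 'xx'] = None
f.loc[3, 'yy'] = None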
     xx    yy
0  None  None
1     b     2
2  None     3
3     a  None
4     b     5
5     c     6
I tried f['new_str'] = f.xx + f.yy and f['new_str'] = f['xx'] + f['yy']. Both set the concatenated value to NaN whenever either value is None. I think this is due to how pandas handles None: None and str cannot be combined with the + operator.
     xx    yy new_str
0  None  None     NaN
1     b     2      b2
2  None     3     NaN
3     a  None     NaN
4     b     5      b5
5     c     6      c6
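A minimal reproduction of this behavior, assuming object-dtype Series (the names s1 and s2 are illustrative, not from the question). Plain Python refuses to add a str and None, whereas pandas does not raise for object columns and instead leaves NaN in any position where either operand is missing.

```python
import pandas as pd

# Hypothetical two-row example; dtype=object keeps None as None
s1 = pd.Series(['a', None], dtype=object)
s2 = pd.Series(['b', 'c'], dtype=object)

# In plain Python, str + None raises TypeError
try:
    'a' + None
    python_raises = False
except TypeError:
    python_raises = True

# pandas does not raise here; the row containing None comes back as NaN
result = s1 + s2

print(python_raises)           # True
print(result.isna().tolist())  # [False, True]
```

So the NaN is not an error being swallowed in the question's code; it is how pandas represents "one of the inputs was missing" for this element-wise addition.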
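Here is the result I want, produced with an explicit loop:
# Start with an empty string in every row
f['new_str'] = ''
for idx, arow in f.iterrows():
    con = ''
    # None is falsy, so missing values are simply skipped
    if arow.xx:
        con += arow.xx
    if arow.yy:
        con += arow.yy
    f.loc[idx, 'new_str'] = con
f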
     xx    yy new_str
0  None  None
1     b     2      b2
2  None     3       3
3     a  None       a
4     b     5      b5
5     c     6      c6
My question: does pandas support a more elegant or simpler way to achieve this?
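Call fillna on each column to set the Nones to '', which is the identity element under string concatenation. fillna returns a new Series rather than modifying the column in place, so the original None values in xx and yy are preserved.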
f['new_str'] = f.xx.fillna('') + f.yy.fillna('')
This gives a new column formatted the way you wanted:
>>> f
     xx    yy new_str
0  None  None
1     b     2      b2
2  None     3       3
3     a  None       a
4     b     5      b5
5     c     6      c6
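A self-contained sketch of the answer, with one alternative for comparison. It rebuilds the example frame directly with dtype=object (an assumption, so that None stays None instead of being converted to NaN), applies the fillna approach, and checks that the original columns still hold None. The second expression uses Series.str.cat with na_rep='', which is a different technique from the answer: str.cat substitutes na_rep for missing values on either side before joining.

```python
import pandas as pd

# Rebuild the question's example; dtype=object keeps the None values as None
f = pd.DataFrame(
    {
        'xx': [None, 'b', None, 'a', 'b', 'c'],
        'yy': [None, '2', '3', None, '5', '6'],
    },
    dtype=object,
)

# Approach from the answer: fillna returns new Series, so f.xx and f.yy are untouched
f['new_str'] = f.xx.fillna('') + f.yy.fillna('')

# Alternative: str.cat replaces missing values on either side with na_rep
alt = f['xx'].str.cat(f['yy'], na_rep='')

print(f['new_str'].tolist())                        # ['', 'b2', '3', 'a', 'b5', 'c6']
print(alt.tolist() == f['new_str'].tolist())        # True
print(f.loc[0, 'xx'] is None, f.loc[3, 'yy'] is None)  # True True
```

Both give the same column here. str.cat also accepts a sep argument if a separator between the two values is wanted.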