Reputation: 1956
How can NaN values be omitted entirely from the new column, so that the joined string has no consecutive commas?
df['newcolumn'] = df.apply(''.join, axis=1)
One approach might be a conditional lambda:
df.apply(lambda x: ','.join(x.astype(str)) if(np.isnan(x.astype(str))) else '', axis = 1)
But this returns an error message:
TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')
Edit: Both of your answers work. What criteria should I use to decide which one to use? Performance considerations?
Upvotes: 0
Views: 30
Reputation: 29635
You can use dropna in your apply, such as:
df.apply(lambda x: ','.join(x.dropna()), axis = 1)
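For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt from the sample data shown in the other answer, and the variable name joined is only illustrative. As a side note, the TypeError in the question comes from calling np.isnan on strings: it is a numeric ufunc, whereas dropna works on object columns.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the data printed in this thread.
df = pd.DataFrame({
    "Mary": ["a", "a", "a", "a", "a", "b"],
    "John": ["t", "t", "u", "u", "u", "t"],
    "David": ["y", np.nan, "y", "n", np.nan, "y"],
})

# Drop the missing values of each row before joining,
# so no empty slot (and no doubled comma) is left behind.
joined = df.apply(lambda row: ",".join(row.dropna()), axis=1)

# Assign afterwards so the new column is not part of its own input.
df["newcolumn"] = joined
print(df)
```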
Using @Wen's input for df, a comparison on a small DataFrame shows this one is slightly faster:
%timeit df.apply(lambda x: ','.join(x.dropna()),1)
1000 loops, best of 3: 1.04 ms per loop
%timeit df.stack().groupby(level=0).apply(','.join)
1000 loops, best of 3: 1.6 ms per loop
But for a bigger DataFrame, @Wen's answer is much faster:
df_long = pd.concat([df]*1000)
%timeit df_long.apply(lambda x: ','.join(x.dropna()),1)
1 loop, best of 3: 850 ms per loop
%timeit df_long.stack().groupby(level=0).apply(','.join)
100 loops, best of 3: 13.1 ms per loop
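A rough way to reproduce the comparison outside IPython is sketched below with the standard timeit module. Several things here are assumptions and not part of the benchmark above: the helper names via_apply and via_stack, the factor of 200 copies, and ignore_index=True in pd.concat. The last one matters: without a unique index, groupby(level=0) groups the repeated labels 0..5 together, so the stack version returns only six rows on df_long and the two timings do not measure the same work. The explicit dropna() is a guard in case the installed pandas version keeps NaN in the stack() output.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frame reconstructed from the data printed in this thread.
df = pd.DataFrame({
    "Mary": ["a", "a", "a", "a", "a", "b"],
    "John": ["t", "t", "u", "u", "u", "t"],
    "David": ["y", np.nan, "y", "n", np.nan, "y"],
})

# Assumption: 200 copies, with ignore_index=True so every row has its own label.
df_long = pd.concat([df] * 200, ignore_index=True)


def via_apply(frame):
    """Row-wise join, skipping missing values."""
    return frame.apply(lambda row: ",".join(row.dropna()), axis=1)


def via_stack(frame):
    """Reshape to long form, drop NaN, then join per original row label."""
    return frame.stack().dropna().groupby(level=0).apply(",".join)


for name, func in [("apply", via_apply), ("stack", via_stack)]:
    # Average over three runs; absolute numbers depend on machine and pandas version.
    seconds = timeit.timeit(lambda: func(df_long), number=3) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per run")
```

One behavioral difference to keep in mind when choosing: a row that contains only NaN disappears from the stack result, while the apply version returns an empty string for it.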
Upvotes: 1
Reputation: 323316
You can use stack, since it removes NaN by default:
df.stack().groupby(level=0).apply(','.join)
Out[552]:
0 a,t,y
1 a,t
2 a,u,y
3 a,u,n
4 a,u
5 b,t,y
dtype: object
Data input
df
Out[553]:
  Mary John David
0    a    t     y
1    a    t   NaN
2    a    u     y
3    a    u     n
4    a    u   NaN
5    b    t     y
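To try this without copying the printout, a self-contained sketch follows. The DataFrame construction is a reconstruction of the data above, and the explicit .dropna() is an added safeguard, on the assumption that some pandas versions may keep NaN in the stack() output; where stack() already drops them, it changes nothing.

```python
import numpy as np
import pandas as pd

# Reconstruction of the input frame printed above.
df = pd.DataFrame({
    "Mary": ["a", "a", "a", "a", "a", "b"],
    "John": ["t", "t", "u", "u", "u", "t"],
    "David": ["y", np.nan, "y", "n", np.nan, "y"],
})

# stack() turns the frame into one long Series indexed by (row, column);
# dropna() makes the removal of missing values explicit.
stacked = df.stack().dropna()

# Group by the original row label (index level 0) and join each group's values.
result = stacked.groupby(level=0).apply(",".join)
print(result)
```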
Upvotes: 2