dirtyw0lf

Reputation: 1956

Adding dataframe columns together, separated by commas, considering NaNs

How can NaN values be omitted entirely from the new column, to avoid consecutive commas?

df['newcolumn'] = df.apply(''.join, axis=1)

One approach would probably be to use a conditional lambda:

df.apply(lambda x: ','.join(x.astype(str)) if(np.isnan(x.astype(str))) else '', axis = 1)

But this returns an error message:

TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')

Edit: Both of your answers work. What criteria should I use to decide which one to use? Performance considerations?

Upvotes: 0

Views: 30

Answers (2)

Ben.T

Reputation: 29635

You can use dropna inside your apply, for example:

df.apply(lambda x: ','.join(x.dropna()), axis = 1)
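To make that one-liner easy to try, here is a minimal self-contained sketch. It assumes the sample frame shown in the other answer (columns Mary, John, David); the variable names `row` and `joined` are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame, assumed to match the input shown in the other answer.
df = pd.DataFrame({
    "Mary": ["a", "a", "a", "a", "a", "b"],
    "John": ["t", "t", "u", "u", "u", "t"],
    "David": ["y", np.nan, "y", "n", np.nan, "y"],
})

# For each row, drop the NaN entries first, then join what is left with commas.
joined = df.apply(lambda row: ",".join(row.dropna()), axis=1)

print(joined.tolist())
# ['a,t,y', 'a,t', 'a,u,y', 'a,u,n', 'a,u', 'b,t,y']
```

Note that this only works as written when the remaining values are strings; for mixed types you would probably need something like `row.dropna().astype(str)` before joining.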

Using @Wen's input for df, this version is slightly faster on a small dataframe:

%timeit df.apply(lambda x: ','.join(x.dropna()),1)
1000 loops, best of 3: 1.04 ms per loop
%timeit df.stack().groupby(level=0).apply(','.join)
1000 loops, best of 3: 1.6 ms per loop

But on a bigger dataframe, @Wen's answer is much faster:

df_long = pd.concat([df]*1000)
%timeit df_long.apply(lambda x: ','.join(x.dropna()),1)
1 loop, best of 3: 850 ms per loop
%timeit df_long.stack().groupby(level=0).apply(','.join)
100 loops, best of 3: 13.1 ms per loop
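If you want to rerun the comparison outside IPython, a rough sketch with the standard `timeit` module could look like the following. Two things here are my own assumptions and not part of the timings above. First, `ignore_index=True` gives every row of the long frame a unique label, because with repeated labels `groupby(level=0)` merges all rows that share a label into one group. Second, the explicit `dropna()` after `stack()` is there so the result does not depend on how the installed pandas version handles NaN in `stack`. The helper names are hypothetical.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frame, assumed to match the input shown in the other answer.
df = pd.DataFrame({
    "Mary": ["a", "a", "a", "a", "a", "b"],
    "John": ["t", "t", "u", "u", "u", "t"],
    "David": ["y", np.nan, "y", "n", np.nan, "y"],
})

# Assumption: ignore_index=True so each of the 6000 rows keeps its own label.
df_long = pd.concat([df] * 1000, ignore_index=True)


def join_with_apply(frame):
    """Row-wise apply: drop NaN in each row, then join with commas."""
    return frame.apply(lambda row: ",".join(row.dropna()), axis=1)


def join_with_stack(frame):
    """Reshape to long form, drop NaN, then join the values of each original row."""
    return frame.stack().dropna().groupby(level=0).apply(",".join)


# Check that both approaches agree before timing them.
by_apply = join_with_apply(df_long)
by_stack = join_with_stack(df_long)
same = by_apply.tolist() == by_stack.tolist()

# One run each; raise `number` for steadier figures.
t_apply = timeit.timeit(lambda: join_with_apply(df_long), number=1)
t_stack = timeit.timeit(lambda: join_with_stack(df_long), number=1)

print(f"results identical: {same}")
print(f"apply + dropna: {t_apply:.3f} s")
print(f"stack + groupby: {t_stack:.3f} s")
```

Timings depend on the machine and the pandas version, and the ranking may not match the figures above once every row has its own label. One behavioural difference to keep in mind: a row that is entirely NaN disappears from the stack result, while the apply version returns an empty string for it.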

Upvotes: 1

BENY

Reputation: 323316

You can use stack, since it removes NaN by default:

df.stack().groupby(level=0).apply(','.join)
Out[552]: 
0    a,t,y
1      a,t
2    a,u,y
3    a,u,n
4      a,u
5    b,t,y
dtype: object

Data input:

df
Out[553]: 
  Mary John David
0    a    t     y
1    a    t   NaN
2    a    u     y
3    a    u     n
4    a    u   NaN
5    b    t     y

Upvotes: 2
