Reputation: 121
Hi I have Pandas data frame that looks like this:
0 1 2 3 4 5 6
150 NaN NaN NaN NaN March 1980 NaN NaN
151 NaN NaN NaN NaN June 1990 NaN NaN
152 NaN NaN NaN Sep 2015 NaN NaN NaN
153 NaN NaN NaN Jan 1972 NaN NaN NaN
154 NaN NaN NaN Mar 1974 NaN NaN NaN
I cannot use dropna(), cuz I will have an empty dataframe.
All columns have one data in one column, is there a way to transform it in one column DataFrame?
0
150 March 1980
151 June 1990
152 Sep 2015
153 Jan 1972
154 Mar 1974
Thanks.
Upvotes: 0
Views: 72
Reputation: 30605
Another solution is with the help of boolean mask and pd.notnull, which is a lot lot faster than sum
and sorted
i.e
sdf = pd.DataFrame(df.values[pd.notnull(df)],index=df.index)
Output :
0 150 March1980 151 June1990 152 Sep2015 153 Jan1972 154 Mar1974
ndf = pd.concat([df.reset_index(drop=True)]*1000)
%%timeit
ndf.apply(lambda x : sorted(x,key=pd.isnull),axis=1).dropna(1)
1 loop, best of 3: 1.29 s per loop
%%timeit
ndf.bfill(1).iloc[:,0]
1 loop, best of 3: 773 ms per loop
%%timeit
ndf.fillna('').sum(1)
10 loops, best of 3: 26.4 ms per loop
%%timeit
pd.DataFrame(ndf.values[pd.notnull(ndf)],index=ndf.index)
100 loops, best of 3: 3.11 ms per loop
Upvotes: 1
Reputation: 323276
Is this what you want ?
df.apply(lambda x : sorted(x,key=pd.isnull),axis=1).dropna(1)
Out[1052]:
0
150 March1980
151 June1990
152 Sep2015
153 Jan1972
154 Mar1974
Or
df.bfill(1).iloc[:,0]
Out[1056]:
150 March1980
151 June1990
152 Sep2015
153 Jan1972
154 Mar1974
Name: 0, dtype: object
Or
df.stack()
Out[1058]:
150 4 March1980
151 4 June1990
152 3 Sep2015
153 3 Jan1972
154 3 Mar1974
dtype: object
Upvotes: 3
Reputation: 38415
Try
df = df.fillna('').sum(1)
or
df = df.fillna('').apply(''.join, axis = 1)
You get
150 March 1980
151 June 1990
152 Sep 2015
153 Jan 1972
154 Mar 1974
dtype: object
Upvotes: 4