Guillermo Izquierdo
Guillermo Izquierdo

Reputation: 121

Merging a DataFrame with a lot of NaN

Hi I have Pandas data frame that looks like this:

      0            1    2         3            4     5     6
150  NaN          NaN  NaN       NaN   March 1980   NaN   NaN
151  NaN          NaN  NaN       NaN    June 1990   NaN   NaN
152  NaN          NaN  NaN  Sep 2015          NaN   NaN   NaN
153  NaN          NaN  NaN  Jan 1972          NaN   NaN   NaN
154  NaN          NaN  NaN  Mar 1974          NaN   NaN   NaN

I cannot use dropna(), cuz I will have an empty dataframe.

All columns have one data in one column, is there a way to transform it in one column DataFrame?

         0           
150  March 1980
151  June 1990
152  Sep 2015
153  Jan 1972
154  Mar 1974

Thanks.

Upvotes: 0

Views: 72

Answers (3)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

Another solution is with the help of boolean mask and pd.notnull, which is a lot lot faster than sum and sorted i.e

sdf = pd.DataFrame(df.values[pd.notnull(df)],index=df.index)

Output :

            0
150  March1980
151   June1990
152    Sep2015
153    Jan1972
154    Mar1974
ndf = pd.concat([df.reset_index(drop=True)]*1000)

%%timeit
ndf.apply(lambda x : sorted(x,key=pd.isnull),axis=1).dropna(1)
1 loop, best of 3: 1.29 s per loop

%%timeit
ndf.bfill(1).iloc[:,0]
1 loop, best of 3: 773 ms per loop

%%timeit
ndf.fillna('').sum(1)
10 loops, best of 3: 26.4 ms per loop

%%timeit
pd.DataFrame(ndf.values[pd.notnull(ndf)],index=ndf.index)
100 loops, best of 3: 3.11 ms per loop

Upvotes: 1

BENY
BENY

Reputation: 323276

Is this what you want ?

df.apply(lambda x : sorted(x,key=pd.isnull),axis=1).dropna(1)
Out[1052]: 
             0
150  March1980
151   June1990
152    Sep2015
153    Jan1972
154    Mar1974

Or

df.bfill(1).iloc[:,0]
Out[1056]: 
150    March1980
151     June1990
152      Sep2015
153      Jan1972
154      Mar1974
Name: 0, dtype: object

Or

df.stack()
Out[1058]: 
150  4    March1980
151  4     June1990
152  3      Sep2015
153  3      Jan1972
154  3      Mar1974
dtype: object

Upvotes: 3

Vaishali
Vaishali

Reputation: 38415

Try

df = df.fillna('').sum(1)

or

df = df.fillna('').apply(''.join, axis = 1)

You get

150    March 1980
151     June 1990
152      Sep 2015
153      Jan 1972
154      Mar 1974
dtype: object

Upvotes: 4

Related Questions