Abhishek Kulkarni
Abhishek Kulkarni

Reputation: 225

Split a pandas dataframe into multiple dataframes if all rows are nan

I have the following dataframe.

       a          b           c          d
0   4.65  30.572857  133.899994  23.705000
1   4.77  30.625713  134.690002  23.225000
2   4.73  30.138571  132.250000  23.040001
3   5.07  30.082857  130.000000  23.290001
4   4.98  30.282858  133.520004  23.389999
5    NaN        NaN         NaN        NaN
6   4.82  29.674286  127.349998  23.700001
7   4.83  30.092857  129.110001  24.254999
8   4.85  29.918571  127.349998  24.695000
9   4.70  29.418571  127.139999  24.424999
10  4.69  30.719999  127.610001  25.200001
11   NaN        NaN         NaN        NaN
12  4.43  29.724285  126.620003  24.764999
13   NaN        NaN         NaN        NaN
14  4.29  29.010000  120.309998  24.730000
15  4.11  29.420000  119.480003  25.035000

I want to split this df into multiple dfs when there is row with all NaN.

I explored the following links but could not figure out how to apply it to my problem.

Split pandas dataframe in two if it has more than 10 rows

Splitting dataframe into multiple dataframes

In my example, I would have 4 dataframes with 5,5,1 and 2 rows as the output. Please suggest the way forward.

Upvotes: 0

Views: 1593

Answers (3)

Ukrainian-serge
Ukrainian-serge

Reputation: 854

Just another flavor of doing the same thing:

nan_indices = df.index[df.isna().all(axis=1)]

df_list = [df.dropna() for df in np.split(df, nan_indices)]

df_list
[      a          b           c          d
 0  4.65  30.572857  133.899994  23.705000
 1  4.77  30.625713  134.690002  23.225000
 2  4.73  30.138571  132.250000  23.040001
 3  5.07  30.082857  130.000000  23.290001
 4  4.98  30.282858  133.520004  23.389999,
        a          b           c          d
 6   4.82  29.674286  127.349998  23.700001
 7   4.83  30.092857  129.110001  24.254999
 8   4.85  29.918571  127.349998  24.695000
 9   4.70  29.418571  127.139999  24.424999
 10  4.69  30.719999  127.610001  25.200001,
        a          b           c          d
 12  4.43  29.724285  126.620003  24.764999,
        a      b           c       d
 14  4.29  29.01  120.309998  24.730
 15  4.11  29.42  119.480003  25.035]

Upvotes: 0

Amuoeba
Amuoeba

Reputation: 786

Something like this should do the trick:

import pandas as pd
import numpy as np


data_frame = pd.DataFrame({"a":[1,np.nan,3,np.nan,4,np.nan,5],
                           "b":[1,np.nan,3,np.nan,4,np.nan,5],
                           "c":[1,np.nan,3,np.nan,4,np.nan,5],
                           "d":[1,np.nan,3,np.nan,4,np.nan,5],
                          "e":[1,np.nan,3,np.nan,4,np.nan,5],
                          "f":[1,np.nan,3,np.nan,4,np.nan,5]})
all_nan = data_frame.index[data_frame.isnull().all(1)]
df_list = []
prev = 0
for i in all_nan:
    df_list.append(data_frame[prev:i])
    prev = i+1

for i in df_list:
    print(i)

Upvotes: 1

Erfan
Erfan

Reputation: 42916

Using isna, all, cumsum and groupby.

First we check if all the values in a row are NaN, then use cumsum to create a group indicator and finally we save these dataframes in a list with groupby:

grps = df.isna().all(axis=1).cumsum()
dfs = [df.dropna() for _, df in df.groupby(grps)]
for df in dfs:
    print(df)

      a          b           c          d
0  4.65  30.572857  133.899994  23.705000
1  4.77  30.625713  134.690002  23.225000
2  4.73  30.138571  132.250000  23.040001
3  5.07  30.082857  130.000000  23.290001
4  4.98  30.282858  133.520004  23.389999
       a          b           c          d
6   4.82  29.674286  127.349998  23.700001
7   4.83  30.092857  129.110001  24.254999
8   4.85  29.918571  127.349998  24.695000
9   4.70  29.418571  127.139999  24.424999
10  4.69  30.719999  127.610001  25.200001
       a          b           c          d
12  4.43  29.724285  126.620003  24.764999
       a      b           c       d
14  4.29  29.01  120.309998  24.730
15  4.11  29.42  119.480003  25.035

Upvotes: 5

Related Questions