Ralk

Reputation: 461

Extract sub-DataFrames

I have a DataFrame like this in pandas:

NaN
1
NaN
452
1175
12
NaN
NaN
NaN
145
125
NaN
1259
2178
2514
1

I also have this second DataFrame:

1
2
3
4
5
6

I would like to split the first one into separate sub-DataFrames, like this:

DataFrame 1:
  1
DataFrame 2:
  452
  1175
  12
DataFrame 3:

DataFrame 4:

DataFrame 5:
  145
  125
DataFrame 6:
  1259
  2178
  2514
  1

How can I do that without a loop?

Upvotes: 1

Views: 387

Answers (2)

MaxU - stand with Ukraine

Reputation: 210842

UPDATE: thanks to @piRSquared for pointing out that the original solution (see the OLD answer below) will not work for DFs/Series with non-numeric indexes. Here is a more generic solution:

import numpy as np

# split at the integer positions (not the index labels) of the NaN rows
dfs = [x.dropna()
       for x in np.split(df, np.arange(len(df))[df['column'].isnull().values])]
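
For reference, here is a minimal self-contained sketch of the same positional idea on the question's data. The string index is an assumption added only to show that labels do not matter, and plain `iloc` slicing is swapped in for `np.split` so the sketch does not depend on how a given pandas version handles `np.split` on a DataFrame:

```python
import numpy as np
import pandas as pd

# Data from the question; the letter index is an assumption for illustration.
values = [np.nan, 1, np.nan, 452, 1175, 12, np.nan, np.nan, np.nan,
          145, 125, np.nan, 1259, 2178, 2514, 1]
df = pd.DataFrame({'column': values}, index=list('abcdefghijklmnop'))

# Integer positions (not labels) of the NaN rows.
nan_pos = np.flatnonzero(df['column'].isnull().values)

# Chunk boundaries: start of the frame, every NaN position, end of the frame.
bounds = np.concatenate(([0], nan_pos, [len(df)]))

# Slice between consecutive boundaries, then drop the NaN separator rows.
dfs = [df.iloc[start:stop].dropna()
       for start, stop in zip(bounds[:-1], bounds[1:])]
```

As with the one-liner, the first piece is empty because the data starts with a NaN row.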

OLD answer:

If I understand correctly, you can do something like this:

Source DF:

In [40]: df
Out[40]:
    column
0      NaN
1      1.0
2      NaN
3    452.0
4   1175.0
5     12.0
6      NaN
7      NaN
8      NaN
9    145.0
10   125.0
11     NaN
12  1259.0
13  2178.0
14  2514.0
15     1.0

Solution:

In [31]: dfs = [x.dropna()
                for x in np.split(df, df.index[df['column'].isnull()].values+1)]

In [32]: dfs[0]
Out[32]:
Empty DataFrame
Columns: [column]
Index: []

In [33]: dfs[1]
Out[33]:
   column
1     1.0

In [34]: dfs[2]
Out[34]:
   column
3   452.0
4  1175.0
5    12.0

In [35]: dfs[3]
Out[35]:
Empty DataFrame
Columns: [column]
Index: []

In [36]: dfs[4]
Out[36]:
Empty DataFrame
Columns: [column]
Index: []

In [38]: dfs[5]
Out[38]:
    column
9    145.0
10   125.0

In [39]: dfs[6]
Out[39]:
    column
12  1259.0
13  2178.0
14  2514.0
15     1.0

Upvotes: 2

piRSquared

Reputation: 294258

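import numpy as np

# positions of the NaN rows, plus len(df) as an end marker
w = np.append(np.where(np.isnan(df.iloc[:, 0].values))[0], len(df))
# each piece is the run of rows strictly between two consecutive markers
splits = {'DataFrame{}'.format(c): df.iloc[i+1:j]
          for c, (i, j) in enumerate(zip(w, w[1:]))}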

Print out splits to demonstrate:

for k, v in splits.items():
    print(k)
    print(v)
    print()

DataFrame0
     0
1  1.0

DataFrame1
        0
3   452.0
4  1175.0
5    12.0

DataFrame2
Empty DataFrame
Columns: [0]
Index: []

DataFrame3
Empty DataFrame
Columns: [0]
Index: []

DataFrame4
        0
9   145.0
10  125.0

DataFrame5
         0
12  1259.0
13  2178.0
14  2514.0
15     1.0
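
The keys above start at DataFrame0, while the question numbers the pieces 1 to 6 using its second DataFrame. A rough sketch of one way to take the key numbers from that second frame instead is shown here; the name `labels` is hypothetical, and the sketch assumes the second frame has exactly one row per piece and that the data starts with a NaN row, as in the question:

```python
import numpy as np
import pandas as pd

# First frame from the question (single unnamed column 0).
df = pd.DataFrame([np.nan, 1, np.nan, 452, 1175, 12, np.nan, np.nan, np.nan,
                   145, 125, np.nan, 1259, 2178, 2514, 1])
# Second frame from the question; `labels` is a hypothetical name.
labels = pd.DataFrame([1, 2, 3, 4, 5, 6])

# Positions of the NaN rows, plus len(df) as an end marker.
w = np.append(np.flatnonzero(df.iloc[:, 0].isna().values), len(df))

# Pair each label with the rows strictly between two consecutive markers.
splits = {'DataFrame{}'.format(k): df.iloc[i + 1:j]
          for k, (i, j) in zip(labels.iloc[:, 0], zip(w, w[1:]))}
```

Rows that come before the first NaN would not appear in any piece, so this only fits data that begins with a NaN separator.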

Upvotes: 1
