user5421875

Split pandas dataframe into multiple dataframes with equal numbers of rows

I have a dataframe df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895
26  0.165158    0.173424    0.896344
27  1.157766    0.525674    -1.279618
28  1.729730    -0.798158   0.644869
29  -0.107285   -1.290374   0.544023

I need to split it into multiple dataframes, each containing 10 rows of df, and write every small dataframe to a separate file. So I decided to create a multilevel dataframe, and as a first step I tried to assign an index to every 10 rows of df with this method:

df['split'] = df['split'].apply(lambda x: np.searchsorted(df.iloc[::10], x, side='right')[0])

which raises:

TypeError: 'function' object has no attribute '__getitem__'

So, do you have an idea of how to fix it? Where is my method wrong?

If you have another approach for splitting my dataframe into multiple dataframes, each containing 10 rows of df, that is also welcome. This approach was just the first one I thought of, and I'm not sure it's the best.

Upvotes: 6

Views: 16529

Answers (2)

adw

Reputation: 431

There are many ways to do what you want; your method looks over-complicated. A groupby using a scaled index as the grouping key would work:

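import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(100, 3), columns=list('ABC'))
# Integer-dividing the row positions by 10 labels each block of ten rows.
groups = df.groupby(np.arange(len(df.index))//10)
for (frameno, frame) in groups:
    frame.to_csv("%s.csv" % frameno)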
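To see why the scaled index works as a grouping key, here is a minimal sketch, assuming a 30-row frame shaped like the one in the question. The random data and the names `key` and `sizes` are illustrative only and not part of the answer above. Integer-dividing the row positions by 10 labels rows 0-9 as 0, rows 10-19 as 1, and rows 20-29 as 2:

```python
import numpy as np
import pandas as pd

# Illustrative 30-row frame standing in for the question's df (random values).
df = pd.DataFrame(np.random.rand(30, 3), columns=list("abc"))

# Row positions 0..29 integer-divided by 10 give one label per block of ten.
key = np.arange(len(df)) // 10

# Grouping on that array collects the rows that share a label.
sizes = df.groupby(key).size()
print(sizes.tolist())
```

Because the key is built from row positions rather than from the index values, this should behave the same way even if the dataframe's index is not a simple 0..n-1 range.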

Upvotes: 20

Alexander

Reputation: 109536

You can use a dictionary comprehension to save slices of the dataframe in groups of ten rows:

df_dict = {n: df.iloc[n:n+10, :]
           for n in range(0, len(df), 10)}

>>> list(df_dict)
[0, 10, 20]

>>> df_dict[10]
           a         b         c
10 -0.011909 -0.304162  0.422001
11  0.127570  0.956831  1.837523
12 -1.074771  0.379723 -1.889117
13 -1.449475 -0.799574 -0.878192
14 -1.029757  0.551023  2.519929
15 -1.001400  0.838614 -1.006977
16  0.677216 -0.403859  0.451338
17  0.221596 -0.323259  0.324158
18 -0.241935 -2.251687 -0.088494
19 -0.995426  0.665569 -2.228848
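Since the question also asks to write each piece to its own file, the dictionary of slices can be looped over in the same way. This is only a sketch under assumptions: the random 30-row frame, the temporary output directory `out_dir`, and the `rows_<start>.csv` file names are hypothetical choices, not something the answer specifies.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative 30-row frame standing in for the question's df (random values).
df = pd.DataFrame(np.random.rand(30, 3), columns=list("abc"))

# Same dictionary comprehension as above: keys are the starting row positions.
df_dict = {n: df.iloc[n:n + 10, :] for n in range(0, len(df), 10)}

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()

paths = []
for start, chunk in df_dict.items():
    # File name pattern is an assumption made for illustration.
    path = os.path.join(out_dir, "rows_%d.csv" % start)
    chunk.to_csv(path)
    paths.append(path)
```

Note that `iloc` slices that run past the end of the frame are truncated rather than raising, so if the number of rows were not a multiple of 10 the last file would simply hold the remaining rows.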

Upvotes: 4
