Reputation: 14062
I have a DataFrame:
df = pd.DataFrame(np.random.randn(11,3))
           0         1         2
0   0.102645 -1.530977  0.408735
1   1.081442  0.615082 -1.457931
2   1.852951  0.360998  0.178162
3   0.726028  2.072609 -1.167996
4  -0.454453  1.310887 -0.969910
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727
How can I split this into a given number of sub-DataFrames, let's say 2 for now?
Something like this:

          0         1         2
0  0.102645 -1.530977  0.408735
1  1.081442  0.615082 -1.457931
2  1.852951  0.360998  0.178162
3  0.726028  2.072609 -1.167996
4 -0.454453  1.310887 -0.969910

           0         1         2
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727
Ideally I would like to use np.array_split(df, 2), but it throws an error because df is not an array.
Is there a built-in function to do this? I don't particularly want to use df.loc[a:b], because it's difficult to calculate the start and end positions for a given number of sub-DataFrames.
Upvotes: 0
Views: 183
Reputation: 2555
Try the following. It returns a list of at most n sub-DataFrames which, when concatenated in order, reproduce the original DataFrame.
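import math

def split(df, n):
    # Rows per chunk, rounded up so that at most n chunks are produced.
    size = math.ceil(len(df) / n)
    # Slice by position; the last chunk may be shorter than the rest.
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]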
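One caveat with the ceiling-based chunk size: it can produce fewer than n pieces (9 rows with n = 4 gives three chunks of 3 rows). If exactly n pieces are required, a minimal sketch is to compute the start/stop positions directly, which is the part the question describes as awkward. The helper name split_bounds is hypothetical, not a pandas or NumPy function, and it assumes the rows should be spread as evenly as possible with any extra rows going to the first pieces.

```python
def split_bounds(length, n):
    """Return n (start, stop) pairs covering range(length) as evenly as possible.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # base rows per piece, plus `extra` leftover rows to distribute.
    base, extra = divmod(length, n)
    bounds = []
    start = 0
    for i in range(n):
        # The first `extra` pieces each take one additional row.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

print(split_bounds(11, 2))  # [(0, 6), (6, 11)]
```

With a DataFrame, the pieces would then be [df.iloc[a:b] for a, b in split_bounds(len(df), n)]. Because iloc slices by position with an exclusive stop, the pairs can be used as-is, and some pieces will be empty when n exceeds the number of rows.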
Upvotes: 1