Reputation: 55
I am new to Python and am looking for a clean way to split a DataFrame into several DataFrames at a given set of row indices.
The DataFrame Module1 has more than a million rows. It needs to be split at the index numbers below, counting from 0.
Int64Index([55893, 122056, 180227, 234314], dtype='int64')
That is, the first split DataFrame should cover rows 0 to 55892, the next one rows 55893 to 122055, and so on.
This is my code. The problem is the last DataFrame, from 234314 to the end: I am not sure how to handle it inside the loop.
start = 0
Module = []
for ele in indexing:
    Module.append(Module1[start:ele])
    start = ele
Module.append(Module1[start:])
print(Module)
However, I would like a much cleaner solution than this.
Upvotes: 0
Views: 40
Reputation: 344
You could use iloc in a loop, since iloc slices the DataFrame by row position into sub-DataFrames of whatever length you need. The loop should behave something like this (shown here with a fixed step size):
step = 55893
df_1 = Module1.iloc[:step, :]
df_2 = Module1.iloc[step:(step*2), :]
df_3 = Module1.iloc[(step*2):(step*3), :]
...
df_n = Module1.iloc[(step*(n-1)):(step*n), :]
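The pattern above uses a fixed step, whereas the question has uneven cut points. A minimal sketch of the same iloc idea for arbitrary cut points might look like the following. The helper name `split_at` and the small `demo` frame are hypothetical stand-ins for illustration; with the real data you would pass `Module1` and the Int64Index from the question.

```python
import pandas as pd


def split_at(df, cut_points):
    """Split df by row position at each cut point and return a list of DataFrames.

    Hypothetical helper: assumes cut_points is sorted and within range.
    """
    # Add 0 and len(df) so the first and last pieces need no special case.
    bounds = [0, *cut_points, len(df)]
    # Pair each boundary with the next one: (0, c1), (c1, c2), ..., (cn, len(df)).
    return [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]


# Small stand-in for Module1 (made-up data).
demo = pd.DataFrame({"value": range(10)})
parts = split_at(demo, [3, 7])
print([len(p) for p in parts])
```

Because the end of the frame is added as a final boundary, the last piece (from the final cut point to the end) falls out of the same comprehension and does not need a separate append after the loop.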
P.S.: check out numpy's split for an alternative.
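As a rough sketch of the numpy route: `np.split` accepts a list of indices at which to cut, so one cautious option is to split an array of row positions and then select each group with iloc, rather than passing the DataFrame to `np.split` directly (whether that works can depend on the pandas version). The `demo_df` frame and `cuts` list below are made-up stand-ins for `Module1` and the index from the question.

```python
import numpy as np
import pandas as pd

# Small stand-in for Module1 (made-up data).
demo_df = pd.DataFrame({"value": range(10)})
cuts = [3, 7]  # stand-in for the Int64Index of cut points

# Split the row positions 0..len-1 at the cut points;
# np.split also returns the trailing group after the last cut.
position_groups = np.split(np.arange(len(demo_df)), cuts)

# Select each group of positions from the DataFrame.
pieces = [demo_df.iloc[pos] for pos in position_groups]
print([len(p) for p in pieces])
```

Note that selecting by an array of positions copies the rows, so for a frame with over a million rows plain slice-based iloc may be the lighter choice.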
Upvotes: 1