SaveEarth

Reputation: 55

Splitting a Dataframe as per a set of row indices

I am new to Python and am looking for a clean way to split a dataframe into several dataframes at a given set of row indices.

The dataframe Module1 has more than a million rows. It needs to be split at the index numbers below, with rows counted from 0.

Int64Index([55893, 122056, 180227, 234314], dtype='int64')

That is, the first split dataframe should hold rows 0 to 55892, the next one rows 55893 to 122055, and so on.

This is my code. The problem lies with the last dataframe, from 234314 to the end: I am not sure how to handle it inside the loop.

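  start = 0
  Module = []
  for ele in indexing:
      # rows from the previous split point up to, but not including, ele
      Module.append(Module1[start:ele])
      start = ele
  # remaining rows, from the last split point to the end
  Module.append(Module1[start:])
  print(Module)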

But I would like a much cleaner solution than this code.

Upvotes: 0

Views: 40

Answers (1)

Israel Figueirido

Reputation: 344

You could use iloc in a loop: iloc selects rows by integer position, so each slice gives you a sub-dataframe of the length you want. With a fixed step, the slices inside the loop would look something like this:

step = 55893

df_1 = Module1.iloc[:step, :]
df_2 = Module1.iloc[step:(step*2), :]
df_3 = Module1.iloc[(step*2):(step*3), :]
...
df_n = Module1.iloc[(step*(n-1)):(step*n), :]
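
The slices above use one fixed step, while the split points in the question are unevenly spaced. A minimal sketch of the same iloc idea driven by a list of boundaries might look like the following. The small Module1 and indexing here are made-up stand-ins for the real data, and bounds and modules are names introduced only for this example.

```python
import pandas as pd

# Made-up stand-ins for the real Module1 and indexing (assumption)
Module1 = pd.DataFrame({"value": range(10)})
indexing = pd.Index([3, 5, 8])

# Add 0 at the front and the row count at the back so that the
# first and last chunks need no special handling
bounds = [0, *indexing, len(Module1)]

# Pair each boundary with the next one and slice by position
modules = [Module1.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

for chunk in modules:
    print(len(chunk))
```

Because the end of the dataframe is just one more boundary, the trailing chunk falls out of the same comprehension as all the others.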

P.S.: check out numpy's split for an alternative.
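
As a rough sketch of the numpy route: np.split cuts an array at a list of positions, so one option is to split the array of row positions and hand each piece to iloc. Splitting the positions rather than the dataframe itself is a choice made here to stay on plain numpy arrays. The data below is made up, and position_chunks and modules are names used only for this example.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the real Module1 and indexing (assumption)
Module1 = pd.DataFrame({"value": range(10)})
indexing = pd.Index([3, 5, 8])

# np.split cuts the array of row positions at each split point and
# returns one array of positions per chunk, including the trailing one
position_chunks = np.split(np.arange(len(Module1)), list(indexing))

# Select each group of positions from the dataframe
modules = [Module1.iloc[pos] for pos in position_chunks]

for chunk in modules:
    print(len(chunk))
```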

Upvotes: 1
