Kay

Reputation: 23

How to divide dataframe into 2 equal parts (first half rows and second half rows) - in Python

I have a dataframe and need to split it into 2 equal dataframes.

The 1st dataframe would contain the top half of the rows and the 2nd would contain the remaining rows.

Please help me achieve this using Python.

It should work for both an even and an odd number of rows (with an odd number of rows, I would need to drop the last row to make the two halves equal).

Upvotes: 1

Views: 6479

Answers (2)

mukund ghode

Reputation: 262

With a simple example, you can try the following:

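import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]
# data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20]]  # even-length case

half = len(data) // 2          # floor division gives the size of each half
data1 = data[:half]            # first half
data2 = data[half:2 * half]    # second half; for an odd length this skips the last row

# Cast only 'Age' to float: dtype=float on the whole frame can fail on the
# string 'Name' column in recent pandas versions.
df1 = pd.DataFrame(data1, columns=['Name', 'Age']).astype({'Age': float})
df2 = pd.DataFrame(data2, columns=['Name', 'Age']).astype({'Age': float})
print("1st half:", df1, sep="\n")
print("2nd Half:", df2, sep="\n")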

Output:

D:\Python>python temp.py

1st half:
   Name   Age
0  Alex  10.0
1   Bob  12.0
2nd Half:
     Name   Age
0  Clarke  13.0
1     Tom  20.0
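
If the data is already in a DataFrame rather than a list, the same slicing idea can be applied with `iloc`. The following is a minimal sketch, not part of the original answer; the helper name `split_half` is hypothetical, and resetting the index of the second half is an assumption made so that it starts at 0 like the output above.

```python
import pandas as pd

def split_half(df: pd.DataFrame):
    """Return the top and bottom halves of df, dropping the last row if the row count is odd."""
    half = len(df) // 2                      # size of each half
    top = df.iloc[:half]                     # rows 0 .. half-1
    # Stop at 2*half so the extra row of an odd-length frame is left out.
    bottom = df.iloc[half:2 * half].reset_index(drop=True)
    return top, bottom

df = pd.DataFrame(
    [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]],
    columns=['Name', 'Age'],
)
df1, df2 = split_half(df)
print(df1)
print(df2)
```

Drop the `reset_index(drop=True)` call if the second half should keep its original row labels.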

Upvotes: 1

Mayank Porwal

Reputation: 34086

Consider df:

In [122]: df
Out[122]: 
    id  days  sold  days_lag
0    1     1     1         0
1    1     3     0         2
2    1     3     1         2
3    1     8     1         5
4    1     8     1         5
5    1     8     0         5
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
12   3     4     5         6

Use numpy.array_split():

In [127]: import numpy as np

In [128]: def split_df(df):
     ...:     if len(df) % 2 != 0:  # Handling `df` with `odd` number of rows
     ...:         df = df.iloc[:-1, :]
     ...:         df1, df2 = np.array_split(df, 2)
     ...:     return df1, df2
     ...: 

In [130]: df1, df2 = split_df(df)

In [131]: df1
Out[131]: 
   id  days  sold  days_lag
0   1     1     1         0
1   1     3     0         2
2   1     3     1         2
3   1     8     1         5
4   1     8     1         5
5   1     8     0         5

In [133]: df2
Out[133]: 
    id  days  sold  days_lag
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
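
The odd-row check in `split_df` matters because `numpy.array_split` does not require an even division: given an odd number of items it makes the first part one item longer. A small sketch on a plain array of 13 items (standing in for the 13 row positions of `df`; the variable names are illustrative only) shows the difference:

```python
import numpy as np

rows = np.arange(13)                      # 13 items, like the 13 rows of df

# Without trimming, the parts are unequal.
uneven = np.array_split(rows, 2)
print([len(part) for part in uneven])     # [7, 6]

# Dropping the last item first gives two equal parts.
trimmed = rows[:-1] if len(rows) % 2 != 0 else rows
even = np.array_split(trimmed, 2)
print([len(part) for part in even])       # [6, 6]
```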

Upvotes: 3
