Pandas. Splitting Data frame content continuously and evenly across multiple columns

Question

As a simple example, suppose I have the following data frame in pandas

import pandas as pd

the_content = {"Name": ["John", "Kathy", "Kurtis", "Sharon"],
               "Hobbies": ["Fishing", "Sewing", "Skiing", "Biking"]}

When I then create a data frame:

panda_table = pd.DataFrame(the_content)

And then convert it into an html table:

html_panda_table = panda_table.to_html()

The result consists of four data rows and two columns. Something like

    Name     Hobbies
0   John     Fishing
1   Kathy    Sewing
2   Kurtis   Skiing
3   Sharon   Biking

However, I would like to spread the same content evenly across repeating groups of columns. The purpose becomes more obvious when there is a lot of data. In this case, I would want to set a value, say 4 records per row. With the data above there would then be only one data row and 8 columns.

   Name    Hobbies     Name    Hobbies     Name     Hobbies     Name     Hobbies
0  John    Fishing  1  Kathy   Sewing   2  Kurtis   Skiing   3  Sharon   Biking

And if there were more than 4 data points, such as 6 data points, or 50 data points, then the rows would stack.

How can I set the number of columns and then have the data distribute evenly?

Carmoreno · Accepted Answer

You can iterate over the rows of your original dataframe to build one-row dataframes and store them in a list. Then pd.concat with axis=1 joins them side by side to give the expected result.

Edit: Following your comment, I have updated the code to start a new row in the dataframe based on the max_size_row variable. The idea is to slice df_list into chunks of max_size_row items, build each full row with pd.concat(axis=1), and store it in df_list_row. If the last chunk is short, it is padded with NaN so every row has the same width. Once all rows are in df_list_row, we stack them with pd.concat (axis=0 is the default). There may be better approaches, but I think this is a reasonable one.

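import numpy as np
import pandas as pd

df_list = []      # one single-row dataframe per original row
df_list_row = []  # one wide dataframe per output row
start = 0
max_size_row = end = 4  # number of original rows placed side by side

for row in panda_table.to_numpy():
    df_list.append(pd.DataFrame(data=[row], columns=panda_table.columns))

# NaN filler used to pad the last output row to full width
filler = pd.DataFrame(data=[[np.nan] * len(panda_table.columns)],
                      columns=panda_table.columns)

while start < len(df_list):
    df_row = df_list[start:end]
    if len(df_row) < max_size_row:
        df_row += [filler] * (max_size_row - len(df_row))
    df_list_row.append(pd.concat(df_row, axis=1))
    start += max_size_row
    end += max_size_row

# Stack the wide rows vertically and renumber the index
df = pd.concat(df_list_row).reset_index(drop=True)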

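Output with max_size_row = 2 (the outputs below use the sample data plus a fifth row, Carlos / Writing, to show the NaN padding):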

Name	Hobbies	Name	Hobbies
John	Fishing	Kathy	Sewing
Kurtis	Skiing	Sharon	Biking
Carlos	Writing	NaN	NaN

Output with max_size_row = 3:

Name	Hobbies	Name	Hobbies	Name	Hobbies
John	Fishing	Kathy	Sewing	Kurtis	Skiing
Sharon	Biking	Carlos	Writing	NaN	NaN

Output with max_size_row = 4:

Name	Hobbies	Name	Hobbies	Name	Hobbies	Name	Hobbies
John	Fishing	Kathy	Sewing	Kurtis	Skiing	Sharon	Biking
Carlos	Writing	NaN	NaN	NaN	NaN	NaN	NaN
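
For larger tables, a more compact variant of the same idea might be to reshape the underlying array instead of concatenating many one-row dataframes. The following is only a sketch, not the code from the answer above: the helper name wrap_records and its per_row parameter are made up for illustration, and it assumes that padding the last row with NaN is acceptable.

```python
import numpy as np
import pandas as pd


def wrap_records(df, per_row):
    """Lay out the rows of df left to right, per_row records per output row.

    Hypothetical helper for illustration; not part of pandas.
    """
    n_rows, n_cols = df.shape
    n_out = -(-n_rows // per_row)  # ceiling division: number of output rows
    # Object array pre-filled with NaN, so a short last row is already padded.
    values = np.full((n_out * per_row, n_cols), np.nan, dtype=object)
    values[:n_rows] = df.to_numpy()
    # Row-major reshape puts per_row consecutive records side by side.
    wide = values.reshape(n_out, per_row * n_cols)
    return pd.DataFrame(wide, columns=list(df.columns) * per_row)


# Sample data with five records, matching the outputs shown above.
panda_table = pd.DataFrame({
    "Name": ["John", "Kathy", "Kurtis", "Sharon", "Carlos"],
    "Hobbies": ["Fishing", "Sewing", "Skiing", "Biking", "Writing"],
})

wide_table = wrap_records(panda_table, 4)
html_panda_table = wide_table.to_html(index=False)
```

Because the result has repeated column names, selecting a column by label (for example wide_table["Name"]) returns all matching columns at once, so this layout is probably best kept for display steps such as to_html rather than further analysis.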
