geekygeek

Reputation: 741

Pandas. Splitting Data frame content continuously and evenly across multiple columns

As a simple example, suppose I have the following data frame in pandas

import pandas as pd

the_content = {"Name": ["John", "Kathy", "Kurtis", "Sharon"],
               "Hobbies": ["Fishing", "Sewing", "Skiing", "Biking"]}

When I then create a data frame:

panda_table = pd.DataFrame(the_content)

And then convert it into an html table:

html_panda_table = panda_table.to_html()

The result consists of four data rows and two columns. Something like

    Name     Hobbies
0   John     Fishing
1   Kathy    Sewing
2   Kurtis   Skiing
3   Sharon   Biking

However, I would like to spread the same content evenly across repeated groups of columns. The purpose becomes more obvious when there is a lot of data. In this case, I would want to set a value, say 4 Name/Hobbies groups per row. Then there would be only one data row and 8 columns.

   Name    Hobbies     Name    Hobbies     Name    Hobbies     Name    Hobbies
0  John    Fishing  1  Kathy   Sewing   2  Kurtis  Skiing   3  Sharon  Biking

And if there were more than 4 data points, such as 6 data points, or 50 data points, then the rows would stack.

How can I set the number of columns and then have the data distribute evenly?

Upvotes: 1

Views: 90

Answers (1)

Carmoreno

Reputation: 1319

You can iterate over the rows of your original dataframe to build tiny single-row dataframes and save them in a list. Then, using pd.concat with axis=1, you can concatenate them side by side to get the expected result.
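As a minimal sketch of that first step, assuming the panda_table from the question (rebuilt here so the snippet runs on its own), the single-row version could look like this. The name single_row is only illustrative.

```python
import pandas as pd

# Rebuild the question's data frame so the snippet is self-contained.
the_content = {"Name": ["John", "Kathy", "Kurtis", "Sharon"],
               "Hobbies": ["Fishing", "Sewing", "Skiing", "Biking"]}
panda_table = pd.DataFrame(the_content)

# One single-row data frame per original row; each piece has index 0.
df_list = [pd.DataFrame(data=[row], columns=panda_table.columns)
           for row in panda_table.to_numpy()]

# Every piece shares index 0, so concatenating along axis=1
# places them side by side in a single row.
single_row = pd.concat(df_list, axis=1)
print(single_row.shape)  # (1, 8)
```

Note that the result has repeated column names (Name, Hobbies, Name, Hobbies, ...), so selecting by label returns several columns at once; positional access with iloc avoids that ambiguity.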

Edit: Following your comment, I have updated the code to start a new row in the dataframe based on the max_size_row variable. The idea is to slice df_list into chunks of max_size_row items, build each full row with pd.concat and axis=1, and store it in df_list_row. Once all rows are in df_list_row, we stack them with pd.concat using axis=0 (the default value). If the last chunk is shorter than max_size_row, it is padded with NaN so that every row has the same width. There may be better approaches, but I think this is a good answer.

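import numpy as np
import pandas as pd

df_list = []      # one single-row data frame per original row
df_list_row = []  # one wide data frame per output row
start = 0
max_size_row = end = 4  # number of Name/Hobbies groups per output row

# Split the original table into single-row data frames.
for row in panda_table.to_numpy():
    df_list.append(pd.DataFrame(data=[row], columns=panda_table.columns))

# Take max_size_row pieces at a time and join them side by side.
while start < len(df_list):
    df_row = df_list[start:end]
    if len(df_row) < max_size_row:
        # Pad the last chunk with NaN pieces so every row has the same width.
        filler = pd.DataFrame(data=[[np.nan] * len(panda_table.columns)],
                              columns=panda_table.columns)
        df_row += [filler] * (max_size_row - len(df_row))
    df_list_row.append(pd.concat(df_row, axis=1))
    start += max_size_row
    end += max_size_row

# Stack the wide rows vertically (axis=0 is the default).
df = pd.concat(df_list_row).reset_index(drop=True)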

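The outputs below use the question's data with a fifth row (Carlos, Writing) added, so that the NaN padding is visible.

Output with max_size_row = 2:

Name    Hobbies  Name    Hobbies
John    Fishing  Kathy   Sewing
Kurtis  Skiing   Sharon  Biking
Carlos  Writing  NaN     NaN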

Output with max_size_row = 3:

Name    Hobbies  Name    Hobbies  Name    Hobbies
John    Fishing  Kathy   Sewing   Kurtis  Skiing
Sharon  Biking   Carlos  Writing  NaN     NaN

Output with max_size_row = 4:

Name    Hobbies  Name    Hobbies  Name    Hobbies  Name    Hobbies
John    Fishing  Kathy   Sewing   Kurtis  Skiing   Sharon  Biking
Carlos  Writing  NaN     NaN      NaN     NaN      NaN     NaN
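
For larger tables, one possible alternative to the loop is to pad the underlying array and reshape it with NumPy. This is only a sketch of a different technique, not the method used above: wrap_rows and groups_per_row are hypothetical names introduced for illustration, and the five-row data (including Carlos) is assumed from the outputs shown.

```python
import numpy as np
import pandas as pd

def wrap_rows(df, groups_per_row):
    """Hypothetical helper: lay df's rows side by side, groups_per_row at a time."""
    values = df.to_numpy(dtype=object)
    n_rows, n_cols = values.shape
    # Pad with NaN rows so the row count is a multiple of groups_per_row.
    pad = (-n_rows) % groups_per_row
    if pad:
        filler = np.full((pad, n_cols), np.nan, dtype=object)
        values = np.vstack([values, filler])
    # A row-major reshape puts consecutive original rows next to each other.
    wide = values.reshape(-1, groups_per_row * n_cols)
    return pd.DataFrame(wide, columns=list(df.columns) * groups_per_row)

# Usage with the five-row data assumed from the outputs above.
people = pd.DataFrame({
    "Name": ["John", "Kathy", "Kurtis", "Sharon", "Carlos"],
    "Hobbies": ["Fishing", "Sewing", "Skiing", "Biking", "Writing"],
})
wide = wrap_rows(people, 2)
print(wide.shape)  # (3, 4)
```

The reshape avoids building one small data frame per row, which may matter when there are many rows. The resulting frame should be usable with to_html() in the same way as in the question.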

Upvotes: 1
