Reputation: 1217
I have extracted this data frame from an excel spreadsheet using pandas library, after getting the needed columns and, I have table formatted like this,
REF PLAYERS
0 103368 Andrés Posada Sanmiguel
1 300552 Diego Posada Sanmiguel
2 103304 Roberto Motta Stanziola
3 NaN NaN
4 REF PLAYERS
5 1047012 ANABELLA EISMANN DE AMAYA
6 104701 FERNANDO ENRIQUE AMAYA CASTRO
7 103451 AUGUSTO ANTONIO ALVARADO AZCARRAGA
8 103484 Kevin Adrian Villarreal Kam
9 REF PLAYERS
10 NaN NaN
11 NaN NaN
12 NaN NaN
13 NaN NaN
14 REF PLAYERS
15 NaN NaN
16 NaN NaN
17 NaN NaN
18 NaN NaN
19 REF PLAYERS
I want to create multiple dataframes converting each row [['REF', 'PLAYERS']] to a new dataframe columns. suggestions are welcomed I also need to preserve the blank spaces. A pandas newbie.
Upvotes: 0
Views: 455
Reputation: 365
For this to work, you must first read the dataframe from the file differently: set the argument header=None
in your pd.read_excel()
function. Because now your columns are called "REF" and "PLAYERS", but we would like to group by them.
Then the first column name probably would be "0", and the first line will be as follows, where the df
is the name of your dataframe:
# Set unique index for each group
df["group_id"] = (df[0] == "REF").cumsum()
Solution:
# Set unique index for each group
df["group_id"] = (df["name_of_first_column"] == "REF").cumsum()
# Iterate over groups
dataframes = []
for name, group in df.groupby("group_id"):
df_ = group
# promote 1st row to column name
df_.columns = df_.iloc[0]
# and drop it
df_ = df_.iloc[1:]
# drop index column
df_ = df_[["REF", "PLAYERS"]]
# append to the list of dataframes
dataframes.append(df_)
All your multiple dataframes are now stored in an array dataframes
.
Upvotes: 1
Reputation: 13821
You can split your dataframe
, into equal lengths (in your case 4 rows for each df
), using np.split
.
Since you want 4 rows per dataframe, you can split it into 5 different df
:
import numpy as np
dfs = [df.loc[idx] for idx in np.split(df.index,5)]
And then create your individual dataframes:
df1 = dfs[1]
df1
REF PLAYERS
4 REF PLAYERS
5 1047012 ANABELLA EISMANN DE AMAYA
6 104701 FERNANDO ENRIQUE AMAYA CASTRO
7 103451 AUGUSTO ANTONIO ALVARADO AZCARRAGA
df2 = dfs[2]
df2
REF PLAYERS
8 103484 Kevin Adrian Villarreal Kam
9 REF PLAYERS
10 NaN NaN
11 NaN NaN
Upvotes: 1