Reputation: 128
I have a dataframe of the following form:
Year 1 Grade | Year 2 Grade | Year 3 Grade | Year 4 Grade | Year 1 Students | Year 2 Students | Year 3 Students | Year 4 Students |
---|---|---|---|---|---|---|---|
60 | 70 | 80 | 100 | 20 | 32 | 18 | 25 |
I would like to somehow transpose this table to the following format:
Year | Grade | Students |
---|---|---|
1 | 60 | 20 |
2 | 70 | 32 |
3 | 80 | 18 |
4 | 100 | 25 |
I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.
Upvotes: 4
Views: 281
Reputation: 260420
You need a manual reshaping using a split
of the Index into a MultiIndex:
out = (df
.set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
.iloc[0] # select row as Series
.unstack() # unstack Grade/Students
.droplevel(0) # remove literal "Year"
.rename_axis('Year') # set index name
.reset_index() # index to column
)
output:
Year Grade Students
0 1 60 20
1 2 70 32
2 3 80 18
3 4 100 25
Or using pivot_longer
from janitor
:
# pip install pyjanitor
import janitor
out = (df.pivot_longer(
names_to = ('ignore', 'Year', '.value'),
names_sep = ' ')
.drop(columns='ignore')
)
out
Year Grade Students
0 1 60 20
1 2 70 32
2 3 80 18
3 4 100 25
The .value
determines which parts of the sub labels in the columns are retained; the labels are split apart by names_sep
, which can be a string or a regex. Another option is by using a regex, with names_pattern
to split and reshape the columns:
df.pivot_longer(names_to = ('Year', '.value'),
names_pattern = r'.+(\d)\s(.+)')
Year Grade Students
0 1 60 20
1 2 70 32
2 3 80 18
3 4 100 25
Upvotes: 2
Reputation: 315
Here's one way to do it. Feel free to ask questions about how it works.
import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
"Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})
grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)
Upvotes: 1
Reputation: 3845
I came up with these. grades here are your first row.
df = pd.DataFrame(grades) # your dataframe without columns
grades = np.array(df.iloc[0]).reshape(4,2) # extract the first row, turn it into an array and reshape it to 4*2
new_df = pd.DataFrame(grades).reset_index()
new_df.columns = ['Year', 'Grade', 'Students'] # rename the columns
Upvotes: 0