Reputation:
I'm trying to figure out how to get data from several rows into different columns of a new dataframe. The new dataframe consists of unique ID's of the ord dataframe.
df_1 = pd.DataFrame({'course_id': [1, 1, 2, 2, 3, 3, 4, 4, 4],
'exam_id': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B'],
'exam_type': ['WRITTEN', 'WRITTEN', 'HOMEEXAM', 'HOMEEXAM', 'ORAL',
'WRITTEN', 'WRITTEN', 'HOMEEXAM', 'HOMEEXAM'],
'semester': ['2019_1', '2019_2', '2019_1', '2019_2', '2019_1', '2019_2',
'2019_1', '2019_2', '2019_1']}
)
df_2 = pd.DataFrame({'course_id': [1, 1, 2, 3, 4, 4],
'exam_id': ['A', 'B', 'A', 'B', 'A', 'B'],
'2019_1': ['WRITTEN', 'NaN', 'HOMEEXAM', 'ORAL', 'WRITTEN', 'HOMEEXAM'],
'2019_2': ['NaN', 'WRITTEN', 'HOMEEXAM', 'WRITTEN', 'HOMEEXAM', 'NaN']}
)
Any ideas on how to get from df_1
to the desired df_2
?
Upvotes: 1
Views: 48
Reputation: 35646
Try DataFrame.pivot
:
import pandas as pd
df_1 = pd.DataFrame({'course_id': [1, 1, 2, 2,
3, 3, 4, 4, 4],
'exam_id': ['A', 'B', 'A',
'A', 'B', 'B',
'A', 'A', 'B'],
'exam_type': ['WRITTEN', 'WRIITEN', 'HOMEEXAM',
'HOMEEXAM', 'ORAL', 'WRITTEN',
'WRITTEN', 'HOMEEXAM', 'HOMEEXAM'],
'semester': ['2019_1', '2019_2', '2019_1',
'2019_2', '2019_1', '2019_2', '2019_1',
'2019_2', '2019_1']})
df_2 = df_1.pivot(index=['course_id', 'exam_id'],
columns='semester',
values='exam_type') \
.reset_index() \
.rename_axis(None, axis=1)
# For Display
print(df_2.to_string(index=False))
Output:
course_id exam_id 2019_1 2019_2
1 A WRITTEN NaN
1 B NaN WRIITEN
2 A HOMEEXAM HOMEEXAM
3 B ORAL WRITTEN
4 A WRITTEN HOMEEXAM
4 B HOMEEXAM NaN
Upvotes: 0
Reputation: 195438
x = (
df_1.pivot_table(
index=["course_id", "exam_id"],
columns="semester",
values="exam_type",
aggfunc="first",
)
.reset_index()
.rename_axis("", axis=1)
)
print(x)
Prints:
course_id exam_id 2019_1 2019_2
0 1 A WRITTEN NaN
1 1 B NaN WRITTEN
2 2 A HOMEEXAM HOMEEXAM
3 3 B ORAL WRITTEN
4 4 A WRITTEN HOMEEXAM
5 4 B HOMEEXAM NaN
Upvotes: 2