user3306199
user3306199

Reputation:

How to get data from several rows into different columns of a new dataframe

I'm trying to figure out how to get data from several rows into different columns of a new dataframe. The new dataframe consists of unique ID's of the ord dataframe.

df_1 = pd.DataFrame({'course_id': [1, 1, 2, 2, 3, 3, 4, 4, 4],
                     'exam_id': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B'],
                     'exam_type': ['WRITTEN', 'WRITTEN', 'HOMEEXAM', 'HOMEEXAM', 'ORAL', 
                                   'WRITTEN', 'WRITTEN', 'HOMEEXAM', 'HOMEEXAM'],
                     'semester': ['2019_1', '2019_2', '2019_1', '2019_2', '2019_1', '2019_2', 
                                  '2019_1', '2019_2', '2019_1']}
                   )

df_2 = pd.DataFrame({'course_id': [1, 1, 2, 3, 4, 4],
                     'exam_id': ['A', 'B', 'A', 'B', 'A', 'B'],
                     '2019_1': ['WRITTEN', 'NaN', 'HOMEEXAM', 'ORAL', 'WRITTEN', 'HOMEEXAM'],
                     '2019_2': ['NaN', 'WRITTEN', 'HOMEEXAM', 'WRITTEN', 'HOMEEXAM', 'NaN']}
                   )

enter image description here

Any ideas on how to get from df_1 to the desired df_2?

Upvotes: 1

Views: 48

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35646

Try DataFrame.pivot:

import pandas as pd

df_1 = pd.DataFrame({'course_id': [1, 1, 2, 2,
                                   3, 3, 4, 4, 4],
                     'exam_id': ['A', 'B', 'A',
                                 'A', 'B', 'B',
                                 'A', 'A', 'B'],
                     'exam_type': ['WRITTEN', 'WRIITEN', 'HOMEEXAM',
                                   'HOMEEXAM', 'ORAL', 'WRITTEN',
                                   'WRITTEN', 'HOMEEXAM', 'HOMEEXAM'],
                     'semester': ['2019_1', '2019_2', '2019_1',
                                  '2019_2', '2019_1', '2019_2', '2019_1',
                                  '2019_2', '2019_1']})

df_2 = df_1.pivot(index=['course_id', 'exam_id'],
                  columns='semester',
                  values='exam_type') \
    .reset_index() \
    .rename_axis(None, axis=1)

# For Display
print(df_2.to_string(index=False))

Output:

course_id exam_id   2019_1   2019_2
        1       A  WRITTEN      NaN
        1       B      NaN  WRIITEN
        2       A HOMEEXAM HOMEEXAM
        3       B     ORAL  WRITTEN
        4       A  WRITTEN HOMEEXAM
        4       B HOMEEXAM      NaN

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195438

x = (
    df_1.pivot_table(
        index=["course_id", "exam_id"],
        columns="semester",
        values="exam_type",
        aggfunc="first",
    )
    .reset_index()
    .rename_axis("", axis=1)
)
print(x)

Prints:

   course_id exam_id    2019_1    2019_2
0          1       A   WRITTEN       NaN
1          1       B       NaN   WRITTEN
2          2       A  HOMEEXAM  HOMEEXAM
3          3       B      ORAL   WRITTEN
4          4       A   WRITTEN  HOMEEXAM
5          4       B  HOMEEXAM       NaN

Upvotes: 2

Related Questions