Reputation: 199
I have a dataframe that looks like this:
Curricula Course1 Course2 Course3 ... CourseN
0 q1 c1 c2 NaN NaN
1 q2 c14 c21 c1 NaN
2 q3 c2 c14 NaN NaN
...
M qm c7 c9 c21
The number of courses differs from one 'Curricula' entry to the next.
What I need is a dictionary from this dataframe looking like this:
{'q1': 'c1', 'q1': 'c2', 'q2': 'c14', 'q2': 'c21', 'q2': 'c1' ... }
Where the row names are my keys and for each row, the dictionary is filled with all the 'Curricula': 'Course' information that is given, excluding 'NaN' values.
What I have tried so far is setting the index to the 'Curricula' column, transposing the dataframe, and using the to_dict('records') method, but this resulted in the following output:
in:
df.set_index('Curricula')
df_transposed = df.transpose()
Dic = df_transposed.to_dict('records')
out:
[{0: 'q1', 1: 'q2', 2: 'q3', ... }, {0: 'c1', 1: 'c14', 2: 'c2' ...} ... {0: NaN, 1: 'c1', 2: NaN}]
So here the integer column labels are used as keys instead of the 'Curricula' column values I wanted, and additionally the NaN values are not excluded.
Does anyone have an idea how to fix that?
Best regards, Jan
Upvotes: 1
Views: 1693
Reputation: 51165
Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'Curricula': {0: 'q1', 1: 'q2', 2: 'q3'},
                   'Course1': {0: 'c1', 1: 'c14', 2: 'c2'},
                   'Course2': {0: 'c2', 1: 'c21', 2: 'c14'},
                   'Course3': {0: np.nan, 1: 'c1', 2: np.nan}})
print(df)
Curricula Course1 Course2 Course3
0 q1 c1 c2 NaN
1 q2 c14 c21 c1
2 q3 c2 c14 NaN
You can't have duplicate keys in a dictionary; however, you can use agg along with set_index and stack to create a list for each unique key:
df.set_index('Curricula').stack().groupby(level=0).agg(list).to_dict()
{'q1': ['c1', 'c2'], 'q2': ['c14', 'c21', 'c1'], 'q3': ['c2', 'c14']}
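If it helps, here is a self-contained sketch of the same idea that drops the missing values explicitly instead of relying on how stack treats NaN. The names indexed, courses and pairs are only illustrative and are not part of the question. It also assigns the result of set_index, since that method returns a new DataFrame rather than modifying df, which is probably why the attempt in the question ended up with integer keys.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Curricula': ['q1', 'q2', 'q3'],
    'Course1': ['c1', 'c14', 'c2'],
    'Course2': ['c2', 'c21', 'c14'],
    'Course3': [np.nan, 'c1', np.nan],
})

# set_index returns a new DataFrame, so the result has to be assigned (or chained).
indexed = df.set_index('Curricula')

# For each row, keep only the non-missing courses, keyed by the row's Curricula label.
courses = {key: row.dropna().tolist() for key, row in indexed.iterrows()}
print(courses)
# {'q1': ['c1', 'c2'], 'q2': ['c14', 'c21', 'c1'], 'q3': ['c2', 'c14']}

# Flat (curriculum, course) pairs, in case one entry per pair is what is needed.
pairs = [(key, course) for key, values in courses.items() for course in values]
```

A list of tuples like pairs is about as close as you can get to the repeated-key layout from the question, because a dict keeps only the last value assigned to any given key.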
Upvotes: 1