Reputation: 115
I have a dictionary where the keys are patient ids and the value is the same for every key: [1, 2, 3], meaning each patient will visit the clinic 3 times. How can I convert it to a dataframe with one row per (patient, visit) pair, so that the keys become the index and the values become a column?
Dictionary:
patients = ['Patient01', 'patient02', 'patient03']
visits = [1,2,3]
dictionary = {k:visits for k in patients}
output:
{'Patient01': [1, 2, 3],
'patient02': [1, 2, 3],
'patient03': [1, 2, 3]}
I tried
pd.DataFrame.from_dict(dictionary, orient = 'index')
but the output is
0 1 2
patient02 1 2 3
patient03 1 2 3
patient01 1 2 3
and what I want is like this:
visit_num
patient01 1
patient01 2
patient01 3
patient02 1
patient02 2
patient02 3
patient03 1
patient03 2
patient03 3
Upvotes: 5
Views: 8078
Reputation: 323366
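You can try this with numpy:
# repeat each key once per item in its own list, not len(dictionary) times
pd.DataFrame({'visit_num': np.hstack(list(dictionary.values()))}, index=np.repeat(list(dictionary.keys()), [len(v) for v in dictionary.values()]))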
Out[76]:
visit_num
Patient01 1
Patient01 2
Patient01 3
patient02 1
patient02 2
patient02 3
patient03 1
patient03 2
patient03 3
Upvotes: 2
Reputation: 51395
Use DataFrame.stack() on the dataframe you created:
df = pd.DataFrame.from_dict(dictionary, orient = 'index')
new_df = df.stack().reset_index(level=1, drop=True).to_frame(name='visit_num')
>>> new_df
visit_num
Patient01 1
Patient01 2
Patient01 3
patient02 1
patient02 2
patient02 3
patient03 1
patient03 2
patient03 3
Explanation: df.stack() does most of the work here. It takes your original df
0 1 2
Patient01 1 2 3
patient02 1 2 3
patient03 1 2 3
and turns it into the following multi-indexed pandas.Series:
Patient01 0 1
1 2
2 3
patient02 0 1
1 2
2 3
patient03 0 1
1 2
2 3
The rest of the line drops the inner index level (.reset_index(level=1, drop=True)) and converts the Series into a one-column dataframe (.to_frame(name='visit_num')).
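For anyone who finds the one-liner hard to follow, here is a minimal sketch of the same chain split into separate steps. It reuses the dictionary from the question; the intermediate names (stacked, flat) are only illustrative.

```python
import pandas as pd

# Same dictionary as in the question
patients = ['Patient01', 'patient02', 'patient03']
visits = [1, 2, 3]
dictionary = {k: visits for k in patients}

# Step 1: one row per patient, one column per list position (0, 1, 2)
df = pd.DataFrame.from_dict(dictionary, orient='index')

# Step 2: stack() moves the column labels into an inner index level,
# giving a Series indexed by (patient, position)
stacked = df.stack()

# Step 3: drop the inner level so only the patient id remains as the index
flat = stacked.reset_index(level=1, drop=True)

# Step 4: turn the Series into a one-column DataFrame
new_df = flat.to_frame(name='visit_num')
print(new_df)
```

Printing stacked after step 2 is a quick way to see the two-level index described above.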
Upvotes: 7
Reputation: 294508
Straight from a comprehension
pd.Series(
*zip(*((v, k) for k, c in dictionary.items() for v in c))
).to_frame('visit_num')
visit_num
Patient01 1
Patient01 2
Patient01 3
patient02 1
patient02 2
patient02 3
patient03 1
patient03 2
patient03 3
Upvotes: 4
Reputation: 36043
data = [[patient, visit_num]
for patient, visits in dictionary.items()
for visit_num in visits]
df = pd.DataFrame(data, columns=['patient', 'visit_num']).set_index('patient')
Upvotes: 3
Reputation: 153510
Use melt:
df = pd.DataFrame.from_dict(dictionary, orient = 'index')
df.reset_index()\
.melt('index',value_name='visit_num')\
.drop('variable', axis=1)\
.sort_values('index')  # optional: sort so each patient's rows are grouped together
Output:
index visit_num
1 Patient01 1
4 Patient01 2
7 Patient01 3
2 patient02 1
5 patient02 2
8 patient02 3
0 patient03 1
3 patient03 2
6 patient03 3
Upvotes: 4
Reputation: 164783
You can use itertools.product to simplify your problem, followed by pd.DataFrame.set_index.
import pandas as pd
from itertools import product
patients = ['Patient01', 'patient02', 'patient03']
visits = [1, 2, 3]
df = pd.DataFrame(list(product(patients, visits)), columns=['patients', 'visit_num'])\
.set_index('patients')
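As a small sketch of what product contributes here: it pairs every patient with every visit, patient first, which is already the row order the question asks for. Note that this relies on every patient having the same visits list, as in the question; it does not read the dictionary at all.

```python
from itertools import product

patients = ['Patient01', 'patient02', 'patient03']
visits = [1, 2, 3]

# Every (patient, visit) combination, grouped by patient
pairs = list(product(patients, visits))

print(len(pairs))   # 9
print(pairs[:4])    # [('Patient01', 1), ('Patient01', 2), ('Patient01', 3), ('patient02', 1)]
```

If the visit lists differed between patients, one of the dictionary-based answers would be the safer choice.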
Upvotes: 2