Reputation: 171
I'm trying to build a dictionary from a pandas DataFrame, using one column as the key and the remaining columns as the values. The problem is that the same key appears in multiple rows. I've read through many similar SO posts but couldn't find an answer. This is what I have:
df1:
pid feature_id feature_value
78 20 1.0
78 1130 3.0
...
91 1148 1.0
92 1153 4.0
92 1154 1.0
...
115 1162 1.0
115 1175 5.0
......
This is what I tried:
df2 = df1.set_index('pid').agg(tuple, 1).to_dict()
But this doesn't seem to account for the same key appearing in multiple rows.
What I want is something like this:
{78: [(20, 1.0), (1130, 3.0)]..., 115: [(1162, 1.0), (1175, 5.0)], ...}
Please advise.
Upvotes: 1
Views: 1522
Reputation: 7912
Take the example dataframe:
import pandas as pd
df = pd.DataFrame({'col0':[1,2,3,1,2],'col1':[10,20,30,40,50],'col2':[7,8,9,8,7]})
You can do:
df = df.assign(pairs = df.apply(lambda row: [row['col1'],row['col2']],axis=1))
res = df.groupby('col0')['pairs'].apply(list).to_dict()
Original df:
col0 col1 col2
0 1 10 7
1 2 20 8
2 3 30 9
3 1 40 8
4 2 50 7
res:
{1: [[10, 7], [40, 8]], 2: [[20, 8], [50, 7]], 3: [[30, 9]]}
The same applies to your df: just replace col0, col1 and col2 with pid, feature_id and feature_value respectively.
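If you need tuples rather than inner lists, as in the output shown in the question, one possible variant is to zip the two value columns inside each group. This is only a sketch: the sample frame below is built from a few of the rows visible in the question, and the name result is just illustrative. It also sidesteps the issue with the set_index attempt, where to_dict() keeps only one entry per repeated index label.

```python
import pandas as pd

# Small sample shaped like the question's df1 (a few of the rows shown there)
df1 = pd.DataFrame({
    "pid": [78, 78, 92, 92, 115, 115],
    "feature_id": [20, 1130, 1153, 1154, 1162, 1175],
    "feature_value": [1.0, 3.0, 4.0, 1.0, 1.0, 5.0],
})

# For each pid group, pair feature_id with feature_value row by row.
# int(pid) just turns the group key into a plain Python int.
result = {
    int(pid): list(zip(group["feature_id"], group["feature_value"]))
    for pid, group in df1.groupby("pid")
}
print(result)
```

Because each column is zipped separately, feature_id stays an integer instead of being upcast to float alongside feature_value.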
Upvotes: 1
Reputation: 1144
def df_to_dict(df):
    # map each key (first column) to a list of (second, third) tuples
    d = {}
    # itertuples keeps each column's own dtype; iterrows would upcast
    # a row mixing int and float columns to all floats
    for row in df.itertuples(index=False, name=None):
        # setdefault creates the empty list the first time a key is seen,
        # then the tuple (row[1], row[2]) is appended to it
        d.setdefault(row[0], []).append((row[1], row[2]))
    return d
print(df_to_dict(df))
Upvotes: 0