Reputation: 39
I am looking for a solution to the following problem. I have an Excel file that I read with pandas. Column A contains an identifier, and column B also contains identifiers, but the identifiers in column B are linked to the identifier in column A. Example:
+----------+----------+
| Column A | Column B |
+----------+----------+
| ID1      | ID5      |
| ID1      | ID6      |
| ID1      | ID7      |
| ID2      | ID8      |
| ID2      | ID9      |
| ID3      | ID8      |
| ID3      | ID9      |
| ID3      | ID10     |
| ID3      | ID11     |
+----------+----------+
So I want ID1 to be linked to ID5, ID6 and ID7, ID2 to ID8 and ID9, and so on.
In Java I would use a linked list. What should I use in Python?
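The structure described here maps each identifier to several other identifiers, which in Python is usually expressed as a dict whose values are lists. A minimal plain-Python sketch, assuming the rows are already available as (Column A, Column B) pairs; the `rows` list below is hypothetical sample data copied from the table:

```python
# Hypothetical sample data: the (Column A, Column B) pairs from the table above.
rows = [
    ("ID1", "ID5"), ("ID1", "ID6"), ("ID1", "ID7"),
    ("ID2", "ID8"), ("ID2", "ID9"),
    ("ID3", "ID8"), ("ID3", "ID9"), ("ID3", "ID10"), ("ID3", "ID11"),
]

# Map each Column A identifier to the list of Column B identifiers linked to it.
links = {}
for a, b in rows:
    # setdefault inserts an empty list the first time a key is seen,
    # then returns the existing list so we can append to it.
    links.setdefault(a, []).append(b)

print(links)
# {'ID1': ['ID5', 'ID6', 'ID7'], 'ID2': ['ID8', 'ID9'], 'ID3': ['ID8', 'ID9', 'ID10', 'ID11']}
```

Dicts keep insertion order (Python 3.7+), so each list keeps the row order from the sheet.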
Thanks!
Upvotes: 1
Views: 608
Reputation:
Since you are already using pandas to read the file, you can build the dictionary directly with pandas methods:
df.groupby('Column A')['Column B'].agg(lambda x: list(x)).to_dict()
Output:
{'ID1': ['ID5', 'ID6', 'ID7'],
'ID2': ['ID8', 'ID9'],
'ID3': ['ID8', 'ID9', 'ID10', 'ID11']}
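A self-contained version of the groupby approach, as a sketch: it assumes the sheet has been loaded into a DataFrame with the columns 'Column A' and 'Column B' (built in memory here instead of calling pd.read_excel). Passing the built-in list to agg should behave the same as the lambda above:

```python
import pandas as pd

# In-memory stand-in for the Excel sheet (assumption; real code would use pd.read_excel).
df = pd.DataFrame({
    "Column A": ["ID1", "ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID3"],
    "Column B": ["ID5", "ID6", "ID7", "ID8", "ID9", "ID8", "ID9", "ID10", "ID11"],
})

# Group the Column B values by their Column A key and collect each group into a list,
# then turn the resulting Series (key -> list) into a plain dict.
links = df.groupby("Column A")["Column B"].agg(list).to_dict()
print(links)
```

groupby sorts the keys by default, so the dict comes out ordered ID1, ID2, ID3 regardless of row order.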
Upvotes: 2
Reputation: 43
You could use a defaultdict that holds a set for each key:
from collections import defaultdict

# Map each Column A identifier to the set of Column B identifiers linked to it
ident_pair_dict = defaultdict(set)
for _, row in df.iterrows():  # df is the pandas DataFrame you read
    ident_pair_dict[row['Column A']].add(row['Column B'])
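A variant of the same idea, sketched under the assumption that the DataFrame has the columns 'Column A' and 'Column B' (built in memory here for illustration). It zips the two columns instead of calling iterrows(), which constructs a Series for every row and is comparatively slow on large sheets:

```python
from collections import defaultdict

import pandas as pd

# In-memory stand-in for the DataFrame read from Excel (assumption).
df = pd.DataFrame({
    "Column A": ["ID1", "ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID3"],
    "Column B": ["ID5", "ID6", "ID7", "ID8", "ID9", "ID8", "ID9", "ID10", "ID11"],
})

# A missing key is created with an empty set on first access.
ident_pair_dict = defaultdict(set)
for a, b in zip(df["Column A"], df["Column B"]):
    ident_pair_dict[a].add(b)

print(dict(ident_pair_dict))
```

Sets drop duplicate pairs but do not keep the row order; if order matters, use defaultdict(list) and .append(b) instead.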
Upvotes: 2