Reputation: 1995
I have a DataFrame in which the values of Atr1 are all distinct, along with some other attributes. I want to generate a dictionary from it, using each value of Atr1 as a key and the corresponding value of Atr2 as the value.
Here is the Dataframe:
+------+------+------+------+
| Atr1 | Atr2 | Atr3 | Atr4 |
+------+------+------+------+
| 'C' | 'B' | 21 | 'H' |
+------+------+------+------+
| 'D' | 'C' | 21 | 'J' |
+------+------+------+------+
| 'E' | 'B' | 21 | 'K' |
+------+------+------+------+
| 'A' | 'D' | 24 | 'I' |
+------+------+------+------+
I want to get a Dictionary like this:
Dict -> {'C': 'B', 'D': 'C', 'E': 'B', 'A': 'D'}
How could I do it?
Upvotes: 1
Views: 5557
Reputation: 210972
Pandas solution:
df.select('Atr1', 'Atr2').toPandas().set_index('Atr1')['Atr2'].to_dict()
NOTE: @mtoto's solution is much more elegant, faster and needs less resources...
Upvotes: 0
Reputation: 24198
You can just use a simple collectAsMap():
df.select("Atr1", "Atr2").rdd.collectAsMap()
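To illustrate what this returns: as far as I know, PySpark's collectAsMap() collects the two-column rows to the driver and builds a plain dict from them. The pure-Python sketch below needs no Spark session; the `rows` list is a hypothetical stand-in for the collected rows from the question's data.

```python
# Hypothetical stand-in for the (Atr1, Atr2) rows collected to the driver.
rows = [("C", "B"), ("D", "C"), ("E", "B"), ("A", "D")]

# Building a dict from 2-tuples maps the first element to the second.
result = dict(rows)
print(result)  # {'C': 'B', 'D': 'C', 'E': 'B', 'A': 'D'}

# If a key were repeated, the later pair would overwrite the earlier one.
dupes = dict([("C", "B"), ("C", "Z")])
print(dupes)  # {'C': 'Z'}
```

Since everything ends up in the driver's memory, this is best suited to results that are known to be small.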
Upvotes: 9
Reputation: 27889
You can use something like this:
attr1 = df.select('Atr1').rdd.flatMap(lambda x: x).collect()
attr2 = df.select('Atr2').rdd.flatMap(lambda x: x).collect()
result = dict(zip(attr1, attr2))
Upvotes: 1
Reputation: 56
What about using df.to_dict()?
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
import pandas as pd
df = pd.DataFrame({'A1': ['C', 'D', 'E', 'A'], 'A2': ['B', 'C', 'B', 'D']})
A1 A2
0 C B
1 D C
2 E B
3 A D
df = df.set_index('A1')
result = df.to_dict()['A2']
results in
result = {'C': 'B', 'A': 'D', 'D': 'C', 'E': 'B'}
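For a pandas DataFrame, the same mapping can probably be had without converting the whole frame first. A minimal sketch, assuming pandas is installed and reusing the column names A1 and A2 from this answer (the variable names `mapping` and `alt` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A1": ["C", "D", "E", "A"], "A2": ["B", "C", "B", "D"]})

# Select the A2 Series indexed by A1, then convert only that Series.
mapping = df.set_index("A1")["A2"].to_dict()
print(mapping)  # {'C': 'B', 'D': 'C', 'E': 'B', 'A': 'D'}

# Equivalent alternative: pair the two columns up directly.
alt = dict(zip(df["A1"], df["A2"]))
```

Both variants leave `df` itself unchanged, since set_index returns a new frame.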
Upvotes: 0