Reputation: 53
Suppose I have a data frame (sorted by column1) df:
column1 column2
1 'Yes'
1 'No'
2 'Yes'
3 'Yes'
3 'Yes'
3 'Yes'
For every value in column 1, I want to know its possible values in column2, and store these key-value pair in a structure.
For example, given the above df, I want a structure like
{1: ['yes','No']; 2: ['Yes'], 3:['Yes']}
I did something like
for key in df[column1].unique.tolist():
values = df.loc[df[column1] == key]][column2].unique().tolist()
dict.add((key, values))
However, as my original df is really big, this method takes too long. As the df is sorted by column1, I wonder if there is a more efficient method to achieve this.
Thank you!
Upvotes: 1
Views: 197
Reputation: 6091
remove redundant data with duplicates
, then groupby
by one column, take unique values, and save to dict:
final_dict = df.drop_duplicates().groupby('column1')['column2'].unique().to_dict()
Upvotes: 3
Reputation: 3739
First drop duplicates then apply to_dict()
df.drop_duplicates(subset=['col1','col2'],inplace=True)
df.to_dict()
Upvotes: 1
Reputation: 446
I would look at the df.to_dict() method.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
You should be able to get something like what you want. The example on the documentation site is as such:
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
col1 col2
row1 1 0.50
row2 2 0.75
df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
Looks like you updated since I posted, check this out with the 'list' arguement.
‘list’ : dict like {column -> [values]}
Upvotes: 1