Reputation: 129
distinct_values = df.col_name.unique().compute()
But what if I don't know the column names in advance?
Upvotes: 2
Views: 134
Reputation: 2643
You can try this to collect the unique values across the whole DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 5]})
>>> d = dict()
>>> d['any_column_name'] = pd.unique(df.values.ravel('K'))
>>> d
{'any_column_name': array([1, 2, 3, 5])}
Or for just one column:
>>> d = dict()
>>> d['a'] = df['a'].unique()
>>> d
{'a': array([1, 2, 3])}
Or individually for every column:
>>> d = dict()
>>> for col in df.columns:
...     d[col] = df[col].unique()
...
>>> d
{'a': array([1, 2, 3]), 'b': array([2, 3, 5])}
Upvotes: 1
Reputation: 13401
I think you need:
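import pandas as pd

df = pd.DataFrame({"colA": ['a', 'b', 'b', 'd', 'e'], "colB": [1, 2, 1, 2, 1]})
unique_dict = {}
# df.columns gives you the column names of the DataFrame
for col in df.columns:
    # .tolist() converts NumPy scalars to plain Python values
    unique_dict[col] = df[col].unique().tolist()
print(unique_dict)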
Output:
{'colA': ['a', 'b', 'd', 'e'], 'colB': [1, 2]}
Upvotes: 1