Reputation: 2496
I have a Pandas dataframe for which I would like to return the number of unique values in each column, except that some columns should be excluded.
This is how I usually get the unique values in a single column, but I am not sure how to repeat it over several columns:
pd.unique(df.column_name.ravel())
My mind goes to something like this, but it obviously is not valid.
col_names = list(df.columns.values)
dont_include = ['foo', 'bar']
cols_to_include = [x for x in col_names if x not in dont_include]
for i in cols_to_include:
    col_unique_count = len(pd.unique(df.i.ravel())
What is the best solution?
Upvotes: 1
Views: 1170
Reputation: 393983
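Your code can be simplified to this (note that str.contains matches substrings, so it drops every column whose name contains 'foo'):
cols_to_include = df.columns[~df.columns.str.contains('foo')]
for col in cols_to_include:
    col_unique_count = df[col].nunique()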
You can call nunique to get the count of unique values for a given Series.
Or:
cols_to_include = df.columns[~df.columns.str.contains('foo')]
df[cols_to_include].apply(pd.Series.nunique)
Here apply will call nunique on each column.
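As a rough sketch of both variants side by side, here is a small made-up frame (the column names and values are assumptions for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'foo': [1, 2, 3, 4],
    'baz': ['a', 'a', 'b', 'b'],
    'pie': [1.0, 1.0, 1.0, 2.0],
})

# Drop every column whose name contains the substring 'foo'
cols_to_include = df.columns[~df.columns.str.contains('foo')]

# Loop variant: collect the per-column counts in a dict
loop_counts = {col: df[col].nunique() for col in cols_to_include}

# apply variant: returns a Series indexed by column name
apply_counts = df[cols_to_include].apply(pd.Series.nunique)

print(loop_counts)
print(apply_counts)
```

Both give the same counts. Keep in mind that nunique skips NaN by default; pass dropna=False if missing values should count as a distinct value.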
EDIT
Use isin to test for membership and ~ to negate the boolean mask:
In [47]:
df = pd.DataFrame(columns = ['foo','baz','bar','pie'])
df
Out[47]:
Empty DataFrame
Columns: [foo, baz, bar, pie]
Index: []
In [48]:
dont_include = ['foo', 'bar']
cols = df.columns[~df.columns.isin(dont_include)]
cols
Out[48]:
Index(['baz', 'pie'], dtype='object')
You can then use my code as before to iterate over this sub-selection of your df's columns.
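Putting the pieces together, a minimal sketch might look like this (the data values are invented for illustration; only the column names and dont_include come from the example above):

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'foo': [1, 2, 3],
    'baz': ['x', 'y', 'x'],
    'bar': [0, 0, 0],
    'pie': [1.5, 2.5, 3.5],
})

dont_include = ['foo', 'bar']

# Keep only the columns whose names are not in dont_include (exact match)
cols = df.columns[~df.columns.isin(dont_include)]

# DataFrame.nunique counts the distinct values in each remaining column
unique_counts = df[cols].nunique()
print(unique_counts)
```

Unlike str.contains, isin compares whole column names, so a column such as 'foobar' would be kept. If you do not need the intermediate cols index, df.drop(columns=dont_include).nunique() should give the same result.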
Upvotes: 3