Reputation: 1563
I'm new to Python and trying to get my head around how to manipulate pandas DataFrames. I'm using the winemag-data-130k-v2.csv dataset. The fields of interest are 'country', 'province', 'winery' and 'variety'.
The first thing I'd like to do is determine the number of wineries per province.
I can get as far as
reviews_df.groupby(['country','province']).size()
But this gives me the number of rows in each province (so 3 if a winery produces 3 varieties), whereas I want something like count(distinct winery) in SQL.
Suggestions?
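To make the difference concrete, here is a minimal sketch of what I am after. It assumes the distinct count can be obtained by selecting the 'winery' column after the groupby and calling `nunique()`. The small DataFrame is made-up sample data standing in for the real CSV, and the names `rows_per_province` and `wineries_per_province` are just illustrative.

```python
import pandas as pd

# Made-up sample rows standing in for winemag-data-130k-v2.csv (assumption).
reviews_df = pd.DataFrame({
    "country":  ["US", "US", "US", "US", "France", "France"],
    "province": ["California", "California", "California", "Oregon",
                 "Bordeaux", "Bordeaux"],
    "winery":   ["A", "A", "B", "C", "D", "D"],
    "variety":  ["Merlot", "Syrah", "Merlot", "Pinot Noir",
                 "Merlot", "Malbec"],
})

# What I have now: one count per row, so winery "A" is counted twice.
rows_per_province = reviews_df.groupby(["country", "province"]).size()

# What I want: distinct wineries per province, like COUNT(DISTINCT winery).
# nunique() ignores missing values by default.
wineries_per_province = (
    reviews_df.groupby(["country", "province"])["winery"].nunique()
)

print(rows_per_province)
print(wineries_per_province)
```

With this sample, California has three rows but only two distinct wineries, which is the number I am trying to get.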
Upvotes: 0
Views: 26