Reputation: 3121
I want to create a pandas dataframe with two columns, the first being the unique values of one of my columns and the second being the count of unique values.
I have seen many posts (such here) as that describe how to get the counts, but the issue I'm running into is when I try to create a dataframe the column values become my index.
Sample data: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})
. I want to end up with a dataframe like:
Color Count
0 Red 2
1 Blue 1
I have tried the following, but in all cases the index ends up as Color and the Count is the only column in the dataframe.
Attempt 1:
df2 = pd.DataFrame(data=df['Color'].value_counts())
# And resetting the index just gets rid of Color, which I want to keep
df2 = df2.reset_index(drop=True)
Attempt 2:
df3 = df['Color'].value_counts()
df3 = pd.DataFrame(data=df3, index=range(df3.shape[0]))
Attempt 3:
df4 = df.groupby('Color')
df4 = pd.DataFrame(df4['Color'].count())
Upvotes: 11
Views: 15744
Reputation: 2127
df = df.groupby('Color').count().reset_index()
df.columns = ['Color','Count']
Upvotes: 0
Reputation: 25662
Another way to do this, using value_counts
:
In [10]: df = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'], 'State': ['MA', 'PA', 'PA']})
In [11]: df.Color.value_counts().reset_index().rename(
columns={'index': 'Color', 0: 'count'})
Out[11]:
Color count
0 Red 2
1 Blue 1
Upvotes: 15
Reputation: 164663
One readable solution is to use to_frame
and rename_axis
methods:
res = df['Color'].value_counts()\
.to_frame('count').rename_axis('Color')\
.reset_index()
print(res)
Color count
0 Red 2
1 Blue 1
Upvotes: 3
Reputation: 28683
Essentially equivalent to setting the column names, but using the rename method instead:
df.groupby('Color').count().reset_index().rename(columns={'State': 'Count'})
Upvotes: 2