Reputation: 43
I'm new to pandas and I want to get the number of instances for each person and put it into another DataFrame as a column. I removed the NaN values from the DataFrame before grouping by the user column.
I've tried this, but it doesn't seem to work:
DF["NumInstances"] = userGrp["user"].value_counts()
I've looked around online but can't find a solution. Please help.
Edit: Sample Data and Expected Outcome
[{"user" : "4",
"Instance": "21"},
{"user" : "4",
"Instance": "6"},
{"user" : "5",
"Instance" : "546453"}]
Expected outcome:
DataFrame =
[{"user":"4",
"NumInstances" : "2"},
{"user":"5",
"NumInstances" : "1"}]
So basically it counts how many instances occur for each user across the data entries.
Upvotes: 2
Views: 353
Reputation: 34056
Based on your sample input, you can do this:
In [2535]: df = pd.DataFrame([{"user" : "4",
...: "Instance": "21"},
...: {"user" : "4",
...: "Instance": "6"},
...: {"user" : "5",
...: "Instance" : "546453"}])
In [2539]: df.groupby('user', as_index=False).count()
Out[2539]:
  user  Instance
0    4         2
1    5         1
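Note that the count column above is still called "Instance". If you want it named "NumInstances" as in your expected outcome, you could rename it afterwards. A minimal sketch, assuming the same sample data as above (the variable name `counts` is just for illustration):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame([
    {"user": "4", "Instance": "21"},
    {"user": "4", "Instance": "6"},
    {"user": "5", "Instance": "546453"},
])

# count() tallies the non-null values of each remaining column per user;
# rename() then gives the count column the name used in the question
counts = (
    df.groupby("user", as_index=False)
      .count()
      .rename(columns={"Instance": "NumInstances"})
)
print(counts)
```

Since count() skips NaN values, this only matches the number of rows per user if the NaN values were already removed, as you said they were.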
Upvotes: 2
Reputation: 986
I used the following solution, which creates a new DataFrame containing two columns named "user" and "NumInstances":
df_counts = df.groupby(['user']).size().reset_index(name='NumInstances')
Hope it helps.
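For completeness, here is the same line run against the sample data from the question. This is only a sketch: the `records` variable and the `to_dict` call are my additions, to show the result in the same list-of-dicts shape as the expected outcome. The counts come back as integers rather than strings.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame([
    {"user": "4", "Instance": "21"},
    {"user": "4", "Instance": "6"},
    {"user": "5", "Instance": "546453"},
])

# size() counts the rows in each group (NaN included);
# reset_index(name=...) turns the result into a two-column DataFrame
df_counts = df.groupby(["user"]).size().reset_index(name="NumInstances")

# Convert to a list of dicts to compare with the expected outcome
records = df_counts.to_dict(orient="records")
print(records)
```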
Upvotes: 0
Reputation: 1112
If DF is the name of your DataFrame and "user" is the name of the column you want to group by, then try:
count = DF.groupby("user").count()
print(count)
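If you would rather have the count as a new column on the original rows (which is what the assignment in the question seems to be aiming for), one option is transform. The attempt in the question most likely fails because value_counts() returns a Series indexed by user, which does not line up with the row index of DF. A rough sketch, assuming the sample data from the question and that NaN values are already removed:

```python
import pandas as pd

# Sample data from the question
DF = pd.DataFrame([
    {"user": "4", "Instance": "21"},
    {"user": "4", "Instance": "6"},
    {"user": "5", "Instance": "546453"},
])

# transform("count") returns one value per original row, so the result
# has the same index as DF and can be assigned directly as a column
DF["NumInstances"] = DF.groupby("user")["Instance"].transform("count")
print(DF)
```

Each row then carries the count for its user, so user "4" shows 2 on both of its rows.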
Upvotes: 0