Reputation: 13
I have a CSV file with 3 columns: users, text, and labels. Each user has multiple texts and labels. I want to find the label that occurs most frequently for each user, in order to determine that user's category.
I have tried:
for i in df['user'].unique():
    print(df['class'].value_counts())
which returns the same values, shown below, for every user:
4 3062
1 1250
0 393
3 281
2 13
Name: class, dtype: int64
I also tried:
for h in df['user'].unique():
    g = Counter(df['class'])
    print(g)
and got:
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Counter({4: 3062, 1: 1250, 0: 393, 3: 281, 2: 13})
Here is the sample data. Please help.
Upvotes: 1
Views: 658
Reputation: 164783
For counting values by group, you can use groupby with pd.value_counts:
import pandas as pd

# Small example: two users, each with six labelled rows
df = pd.DataFrame([[1, 1], [1, 2], [1, 3], [1, 1], [1, 1], [1, 2],
                   [2, 1], [2, 3], [2, 2], [2, 2], [2, 3], [2, 3]],
                  columns=['user', 'class'])
# Count how often each class occurs within each user
res = df.groupby('user')['class'].apply(pd.value_counts).reset_index()
res.columns = ['user', 'class', 'count']
print(res)
   user  class  count
0     1      1      3
1     1      2      2
2     1      3      1
3     2      3      3
4     2      2      2
5     2      1      1
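If only the single most frequent label per user is needed, the per-group counts can be reduced to one value each. The following is a minimal sketch on the same example data, assuming the column names 'user' and 'class'; the variable name top is purely illustrative. It counts the labels within each user and keeps the label with the highest count. If two labels tie for a user, only one of them is returned.

```python
import pandas as pd

# Same example data as above: two users, six labelled rows each
df = pd.DataFrame(
    [[1, 1], [1, 2], [1, 3], [1, 1], [1, 1], [1, 2],
     [2, 1], [2, 3], [2, 2], [2, 2], [2, 3], [2, 3]],
    columns=['user', 'class'],
)

# For each user, count the labels and keep the label with the largest count.
# idxmax returns the index entry (here: the label) of the maximum count.
top = df.groupby('user')['class'].agg(lambda s: s.value_counts().idxmax())
print(top)
```

On this data, user 1 is assigned class 1 and user 2 is assigned class 3. The original loops printed the same counts every time because they iterated over the users without ever filtering df by the current user; grouping by 'user' avoids that.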
Upvotes: 1