Reputation: 1460
I have two columns as shown below and I'm trying to return the highest count of the second column, but it's just returning the highest count on rating without considering the gender.
DATA :
print (df)
AGE GENDER rating
0 10 M PG
1 10 M R
2 10 M R
3 4 F PG13
4 4 F PG13
CODE :
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])
OUTPUT :
print (s['F'])
('PG')
print (s['M'])
('PG', 'R')
Upvotes: 1
Views: 971
Reputation: 862921
I think you need groupby + value_counts + head to get the counts per gender and rating:
df1 = (df.groupby('gender')['rating']
         .apply(lambda x: x.value_counts().head(1))
         .rename_axis(('gender','rating'))
         .reset_index(name='val'))
print (df1)
gender rating val
0 F PG13 2
1 M R 2
If you want only the top ratings, select the first value of the index per group:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])
print (s)
gender
F PG13
M R
Name: rating, dtype: object
print (s['M'])
R
print (s['F'])
PG13
Or, for only the top counts, select the first value of the Series per group:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().iat[0])
print (s)
gender
F 2
M 2
Name: rating, dtype: int64
print (s['M'])
2
print (s['F'])
2
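For reference, here is a self-contained sketch of the two per-group lookups. It assumes the sample data from the question, whose column is named GENDER in upper case, so the groupby key is adjusted accordingly. The variable names top_rating and top_count are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (column names as printed there).
df = pd.DataFrame({
    "AGE": [10, 10, 10, 4, 4],
    "GENDER": ["M", "M", "M", "F", "F"],
    "rating": ["PG", "R", "R", "PG13", "PG13"],
})

grouped = df.groupby("GENDER")["rating"]

# value_counts() sorts by count in descending order, so position 0
# holds the most frequent rating (index) and its count (value).
top_rating = grouped.agg(lambda x: x.value_counts().index[0])
top_count = grouped.agg(lambda x: x.value_counts().iat[0])

print(top_rating["M"], top_count["M"])
print(top_rating["F"], top_count["F"])
```

Both results are indexed by gender, so a single label such as 'M' or 'F' is enough to look up a value.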
EDIT:
s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])
def gen_mpaa(gender):
    return s[gender]
print (gen_mpaa('M'))
print (gen_mpaa('F'))
EDIT:
Solution if the genre id values are strings:
print (type(df.loc[0, 'genre id']))
<class 'str'>
df = df.set_index('gender')['genre id'].str.split(',', expand=True).stack()
print (df)
gender
M 0 11
1 22
2 33
0 22
1 44
2 55
0 33
1 44
2 55
F 0 11
1 22
0 22
1 55
0 55
1 44
dtype: object
d = df.groupby(level=0).apply(lambda x: x.value_counts().index[0]).to_dict()
print (d)
{'M': '55', 'F': '55'}
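Note that in the printed Series several genre ids share the highest count within each gender, so index[0] returns just one of the tied values. A minimal standard-library sketch that lists every tied value is below. The input rows are an assumption, reconstructed from the stacked output above, and top_with_ties is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Assumed input, reconstructed from the stacked Series printed above:
# one comma-separated 'genre id' string per row.
rows = [
    ("M", "11,22,33"), ("M", "22,44,55"), ("M", "33,44,55"),
    ("F", "11,22"), ("F", "22,55"), ("F", "55,44"),
]

# Tally every genre id per gender.
counts = defaultdict(Counter)
for gender, genre_ids in rows:
    counts[gender].update(genre_ids.split(","))

def top_with_ties(counter):
    """Return all values that share the highest count, sorted."""
    best = max(counter.values())
    return sorted(value for value, n in counter.items() if n == best)

tied = {gender: top_with_ties(c) for gender, c in counts.items()}
print(tied)
```

If only one value per gender is needed, it is worth deciding explicitly how ties should be broken rather than relying on the order value_counts happens to return.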
EDIT1:
print (df)
AGE GENDER rating
0 10 M PG
1 10 M R
2 10 M R
3 4 F PG13
4 4 F PG13
s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])
print (s)
a   b
4   F    PG13
10  M       R
    M      PG
Name: c, dtype: object
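Because this result is indexed by both AGE and GENDER, a single label such as 'M' does not identify a row by itself. The sketch below is one possible way to look values up by gender when grouping on both columns. It assumes the question's sample data and keeps only the single most frequent rating per group, and the name top is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "AGE": [10, 10, 10, 4, 4],
    "GENDER": ["M", "M", "M", "F", "F"],
    "rating": ["PG", "R", "R", "PG13", "PG13"],
})

# Most frequent rating for every (AGE, GENDER) pair.
top = (df.groupby(["AGE", "GENDER"])["rating"]
         .agg(lambda x: x.value_counts().index[0]))

# Select by the GENDER level only; the result is indexed by AGE.
print(top.xs("M", level="GENDER"))

# Or address one group with the full (AGE, GENDER) key.
print(top.loc[(4, "F")])
```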
Upvotes: 1
Reputation: 44525
Here is a standard library solution for this file:
%%file "test.txt"
gender rating
M PG
M R
F NR
M R
F PG13
F PG13
Given
import collections as ct
def read_file(fname):
    with open(fname, "r") as f:
        header = next(f)
        for line in f:
            gender, rating = line.strip().split()
            yield gender, rating
Code
filename = "test.txt"
dd = ct.defaultdict(ct.Counter)
for k, v in sorted(read_file(filename), key=lambda x: x[0]):
    dd[k][v] += 1
{k: v.most_common(1) for k, v in dd.items()}
# {'F': [('PG13', 2)], 'M': [('R', 2)]}
Details
Each line of the file is parsed and added to a defaultdict. The keys are genders, and the values are Counter objects that tally each rating per gender. Counter.most_common() is called to retrieve the top occurrences.
Since the data is grouped by gender, you can explore more information. For example, unique ratings of each gender:
{k: set(v.elements()) for k, v in dd.items()}
# {'F': {'NR', 'PG13'}, 'M': {'PG', 'R'}}
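To try the defaultdict-of-Counter approach without creating test.txt, the same text can be read from an in-memory buffer. This is only a sketch: io.StringIO stands in for the file, and read_lines and top are illustrative names that are not part of the code above.

```python
import collections as ct
import io

# Same content as test.txt, held in memory instead of on disk.
text = """gender rating
M PG
M R
F NR
M R
F PG13
F PG13
"""

def read_lines(handle):
    """Yield (gender, rating) pairs, skipping the header and blank lines."""
    next(handle)  # skip the header line
    for line in handle:
        if line.strip():
            gender, rating = line.split()
            yield gender, rating

# One Counter of ratings per gender.
dd = ct.defaultdict(ct.Counter)
for gender, rating in read_lines(io.StringIO(text)):
    dd[gender][rating] += 1

# most_common(1) returns a one-element list; [0] unpacks the (rating, count) pair.
top = {gender: counter.most_common(1)[0] for gender, counter in dd.items()}
print(top)
```

Sorting the rows before counting is not required here, since each pair is routed to its Counter by key.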
Upvotes: 2