Reputation: 39
I have a list of tuples in this format:
[("25.00", u"A"), ("44.00", u"X"),("17.00", u"E"),("34.00", u"Y")]
I want to count the number of times each letter appears. I have already created a sorted list with all the letters, and now I want to count them.
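For the sample list above, counting how often each letter appears can be sketched with collections.Counter. This assumes the letter is always the second element of each tuple; `pairs` is simply the sample data from the question.

```python
from collections import Counter

# Sample data copied from the question: (value, group letter) tuples
pairs = [("25.00", u"A"), ("44.00", u"X"), ("17.00", u"E"), ("34.00", u"Y")]

# Count the second element (the letter) of each tuple
letter_counts = Counter(letter for _, letter in pairs)

# Print the counts in sorted letter order
for letter in sorted(letter_counts):
    print(letter, letter_counts[letter])
```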
First of all, I have a problem with the u before the second item of each tuple. I don't know how to remove it; I guess it has something to do with encoding.
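The u is most likely not part of the data: it is the prefix that marks a Unicode string literal, and it is how Python 2 displays unicode objects inside a list. In Python 3 every str is Unicode and the prefix has no effect, so there is nothing to delete. A quick check, assuming Python 3:

```python
# In Python 3 the u prefix is optional syntax: u"A" and "A" are the same str value
letter = u"A"

print(type(letter))   # <class 'str'>
print(letter == "A")  # True
print(repr(letter))   # 'A'
```

In Python 2, printing the string itself also shows just A; the u only appears in the repr, for example when the whole list is printed.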
Here is my code
# coding=utf-8
from collections import Counter
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
groupes = []
students = []
group_of_each_letter = []
number_of_students_per_group = []
final_list = []
def print_a_list(list):
    for items in list:
        print(items)
for i in df.index:
    groupes.append(df['GROUPE'][i])
    students.append(df[u'ÉTUDIANT'][i])
groupes = groupes[1:]
students = students[1:]
group_of_each_letter = list(set(groupes))
group_of_each_letter = sorted(group_of_each_letter)
z = zip(students, groupes)
z = list(set(z))
final_list = list(zip(*z))
for j in group_of_each_letter:
    number_of_students_per_group.append(final_list.count(j))
print_a_list(number_of_students_per_group)
group_of_each_letter is the sorted list of group letters, without duplicates.
The problem is that the for loop at the end produces the right number of values, but every value in the list is 0.
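The zeros most likely come from `final_list = list(zip(*z))`: zip(*z) transposes the (student, group) pairs, so final_list holds exactly two tuples (all the students, then all the groups), and final_list.count(j) compares each letter against those tuples, which never matches. A minimal sketch with made-up student numbers (not taken from the real file) that counts inside the groups tuple instead:

```python
# Made-up (student, group) pairs standing in for zip(students, groupes)
z = [(29, "B"), (88, "L"), (65, "N"), (27, "O"), (29, "O"), (27, "O")]

# Remove duplicate (student, group) pairs, as in the original code
z = list(set(z))

# Transposing gives [(students...), (groups...)]: two tuples, not letters
final_list = list(zip(*z))
print(len(final_list))        # 2
print(final_list.count("O"))  # 0, which is the bug

# Fix: count each letter inside the second tuple (the groups)
group_of_each_letter = sorted(set(group for _, group in z))
groups = final_list[1]
number_of_students_per_group = [groups.count(j) for j in group_of_each_letter]
print(list(zip(group_of_each_letter, number_of_students_per_group)))
# [('B', 1), ('L', 1), ('N', 1), ('O', 2)]
```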
The screenshot below shows a sample of the Excel file. The column "ETUDIANT" means "student number", but I can't edit the file, so I have to deal with it. GROUPE obviously means GROUP. The goal is to count the number of students per group. I think I'm on the right track, even if there are easier ways to do it.
Thanks in advance for your help, even though I know my question is a bit ambiguous.
Upvotes: 0
Views: 219
Reputation: 81
Building off of kerwei's answer:
Use groupby() and then nunique().
This gives you the number of unique student IDs in each group.
import pandas as pd
df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
# Drop the empty row, which is actually the subheader
df.drop(0, axis=0, inplace=True)
# Now we get a count of unique students by group
student_group = df.groupby('GROUPE')[u'ÉTUDIANT'].nunique()
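To see what nunique() adds over a plain row count, here is a sketch on a small in-memory DataFrame instead of test.xlsx. The values are made up, and the column names are assumed to match the question's GROUPE and ÉTUDIANT headers.

```python
import pandas as pd

# Made-up data: student 27 appears twice in group O
df = pd.DataFrame({
    "GROUPE": ["B", "L", "O", "O", "O"],
    u"ÉTUDIANT": [29, 88, 27, 29, 27],
})

# Number of rows per group (duplicate students counted every time)
rows_per_group = df.groupby("GROUPE").size()

# Number of distinct students per group (each student counted once)
student_group = df.groupby("GROUPE")[u"ÉTUDIANT"].nunique()

print(rows_per_group.to_dict())
print(student_group.to_dict())
```

Here group O has three rows but only two distinct students, which is the difference nunique() accounts for.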
Upvotes: 2
Reputation: 1842
I think a groupby().count() should be sufficient. It counts the number of occurrences of each GROUPE letter in the dataframe.
import pandas as pd
df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
# Drop the empty row, which is actually the subheader
df.drop(0, axis=0, inplace=True)
# Now we get a count of students by group
sub_student_group = df.groupby(['GROUPE','ETUDIANT']).count().reset_index()
>>> sub_student_group
    GROUPE  ETUDIANT
0        B        29
1        L        88
2        N        65
3        O        27
4        O        29
5        O        34
6        O        35
7        O        54
8        O        65
9        O        88
10       O        99
11       O       114
12       O       122
13       O       143
14       O       147
15       U       122
student_group = sub_student_group.groupby('GROUPE').count()
>>> student_group
        ETUDIANT
GROUPE
B              1
L              1
N              1
O             12
U              1
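The two-step approach above (group by GROUPE and ETUDIANT, then count the rows per GROUPE) counts each distinct student once per group. As a sketch on made-up data, assuming the unaccented ETUDIANT header used in this answer, it can be checked against drop_duplicates() followed by groupby().size(), which gives the same numbers in one chain:

```python
import pandas as pd

# Made-up data; student 27 is listed twice in group O
df = pd.DataFrame({
    "GROUPE": ["B", "L", "O", "O", "O", "U"],
    "ETUDIANT": [29, 88, 27, 29, 27, 122],
})

# Two-step approach from the answer: distinct (group, student) pairs, then count per group
sub_student_group = df.groupby(["GROUPE", "ETUDIANT"]).count().reset_index()
student_group = sub_student_group.groupby("GROUPE").count()

# One-chain alternative: drop duplicate (group, student) rows, then count rows per group
alt = df.drop_duplicates().groupby("GROUPE").size()

print(student_group["ETUDIANT"].to_dict())
print(alt.to_dict())
```

Note that drop_duplicates() here compares all columns, so it only matches the two-step result when the frame contains just GROUPE and ETUDIANT; with extra columns, pass subset=["GROUPE", "ETUDIANT"].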
Upvotes: 1