Toms lns

Reputation: 39

Delete the first item of every tuple in a list in Python

I have a list of tuples in this format:

[("25.00", u"A"), ("44.00", u"X"),("17.00", u"E"),("34.00", u"Y")]

I want to count how many times each letter appears. I have already created a sorted list of all the letters, and now I want to count them.

First of all, I have a problem with the u before the second item of each tuple. I don't know how to remove it; I guess it has something to do with encoding.

Here is my code:

# coding=utf-8
from collections import Counter 
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
groupes = [] 
students = [] 
group_of_each_letter = [] 
number_of_students_per_group = []
final_list = []

def print_a_list(list):
    for items in list:
        print(items)


for i in df.index:
    groupes.append(df['GROUPE'][i]) 
    students.append(df[u'ÉTUDIANT'][i]) 

groupes = groupes[1:] 
students = students[1:] 

group_of_each_letter = list(set(groupes)) 
group_of_each_letter = sorted(group_of_each_letter) 

z = zip(students, groupes) 
z = list(set(z)) 

final_list = list(zip(*z)) 

for j in group_of_each_letter:
    number_of_students_per_group.append(final_list.count(j))

print_a_list(number_of_students_per_group)

group_of_each_letter is a list of the group letters without duplicates.

The problem is that the for loop at the end produces the right number of values, but every value in the list is 0.

The screenshot below is a sample of the Excel file. The column "ETUDIANT" means "student number", but I can't edit the file, so I have to work with it as it is. GROUPE means group. The goal is to count the number of students per group. I think I'm on the right track, even if there are easier ways to do this.

[Screenshot: sample of the Excel file]

Thanks in advance for your help, even though I know my question is a bit ambiguous.

Upvotes: 0

Views: 219

Answers (2)

Sam

Reputation: 81

Building off of kerwei's answer:

Use groupby() and then nunique()

This will give you the number of unique Student IDs in each Group.

import pandas as pd

df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
# Drop the empty row, which is actually the subheader
df.drop(0, axis=0, inplace=True)
# Now we get a count of unique students by group
student_group = df.groupby('GROUPE')[u'ÉTUDIANT'].nunique()
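
To try this without the Excel file, here is a minimal sketch on a small in-memory DataFrame. The rows are made up for illustration and only the two column names come from the question; a student number that appears twice in the same group is counted once.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the values are invented for illustration.
df = pd.DataFrame({
    "GROUPE": ["O", "O", "O", "B", "U", "O"],
    "ÉTUDIANT": [27, 29, 27, 29, 122, 34],
})

# Count the distinct student numbers within each group.
# Student 27 appears twice in group O but is counted once.
student_group = df.groupby("GROUPE")["ÉTUDIANT"].nunique()
print(student_group)
```

If duplicates should be counted as separate rows instead, size() in place of nunique() would give the raw row count per group.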

Upvotes: 2

kerwei

Reputation: 1842

I think a groupby.count() should be sufficient. It'll count the number of occurrences of your GROUPE letter in the dataframe.

import pandas as pd

df = pd.read_excel('test.xlsx', sheet_name='Essais', skiprows=1)
# Drop the empty row, which is actually the subheader
df.drop(0, axis=0, inplace=True)
# Now we get a count of students by group
sub_student_group = df.groupby(['GROUPE','ETUDIANT']).count().reset_index()

>>> sub_student_group
   GROUPE  ETUDIANT
0       B        29
1       L        88
2       N        65
3       O        27
4       O        29
5       O        34
6       O        35
7       O        54
8       O        65
9       O        88
10      O        99
11      O       114
12      O       122
13      O       143
14      O       147
15      U       122

student_group = sub_student_group.groupby('GROUPE').count()

>>> student_group
        ETUDIANT
GROUPE
B              1
L              1
N              1
O             12
U              1
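
If the data is already a list of tuples as in the question, a rough sketch with collections.Counter may be enough. The sample pairs below are illustrative, not taken from the real file. The sketch also suggests why the original loop printed zeros: list(zip(*z)) yields a list of two tuples, so calling .count() with a single letter never finds a match. The u prefix only marks a unicode literal and does not need to be removed.

```python
from collections import Counter

# Sample pairs in the question's (value, letter) format; the data is illustrative.
pairs = [("25.00", u"A"), ("44.00", u"X"), ("17.00", u"E"), ("34.00", u"A")]

# The u prefix is not part of the value: u"A" compares equal to "A".
letters = [letter for _, letter in pairs]
counts = Counter(letters)

# Print each letter with its count, in sorted order.
for letter in sorted(counts):
    print(letter, counts[letter])

# Why the original loop gave 0: the transposed list holds two tuples,
# so .count("A") compares "A" against whole tuples and never matches.
transposed = list(zip(*pairs))
zero = transposed.count("A")

# Counting inside the tuple of letters (the second element) does work.
two = transposed[1].count("A")
```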

Upvotes: 1
