Reputation: 31
I am working on an assignment where I have made a dict with political parties as keys and the genders of each party's members as values.
The dict is named genderlist. The code for my dict is as follows:
soup = BeautifulSoup(open(loadKandidatenlijst()).read(), features="xml")
genderlist = {}
for affiliation in soup.findAll('Affiliation'):
    genders = []
    party = affiliation.RegisteredName.text
    genderlist[party] = 0
    for name in affiliation.findAll('Candidate'):
        gender = name.Gender.text
        genders.append(gender)
    genderlist[party] = genders
genderlist['Partij van de Arbeid (P.v.d.A.)'][:6], len(genderlist), len(genderlist['CDA'])
My output is: (['male', 'female', 'male', 'female', 'male', 'female'], 24, 50)
So when I look up a party name, I get the genders of all members of that party.
Now I need to make a dataframe like this:
It should count the genders separately and include the female percentage in the dataframe.
So far I have tried this:
pd.DataFrame(genderlist.items(),columns=['male', 'female'])
How can I make a dataframe like the expected one, where only the first 30 candidates of each party are counted and the result has separate male and female columns plus a percentage?
Can you please help me out? What can I do with my code from here?
Thank you in advance.
Upvotes: 0
Views: 366
Reputation: 140
You can use the list.count(element) function along with a Python dictionary comprehension to first create a dictionary, gender_counts, which has the data you need, and then use pd.DataFrame.from_dict to convert that into a dataframe.
#each list has gender of members of that party
party_A
['female', 'female', 'male', 'female', 'male', 'male', 'female', 'female',
'female', 'female']
gender_dict = {'Party_A': party_A, 'Party_B': party_B,
'Party_C': party_C, 'Party_D': party_D}
gender_counts = {k: [v.count('male'), v.count('female')] for k, v in gender_dict.items()}
gender_counts
{'Party_A': [3, 7],
'Party_B': [5, 9],
'Party_C': [13, 7],
'Party_D': [9, 6]}
df = pd.DataFrame.from_dict(gender_counts, orient='index', columns=['male', 'female'])
df
         male  female
Party_A     3       7
Party_B     5       9
Party_C    13       7
Party_D     9       6
df['Women_Percentage'] = df.female/(df.male+df.female)
df.round(2)
         male  female  Women_Percentage
Party_A     3       7              0.70
Party_B     5       9              0.64
Party_C    13       7              0.35
Party_D     9       6              0.40
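The question also asks to count only the first 30 candidates of each party. One way to do that, as a minimal sketch, is to slice each list before counting. The sample genderlist below is made up for illustration, and TOP_N and female_percentage are names chosen here, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's genderlist dict
genderlist = {
    'Party_A': ['male', 'female'] * 20,         # 40 candidates, alternating
    'Party_B': ['female'] * 10 + ['male'] * 5,  # only 15 candidates
}

TOP_N = 30  # assumed cutoff: only the first 30 candidates per party

# Slice each list before counting; slicing a shorter list just returns all of it
gender_counts = {
    party: [genders[:TOP_N].count('male'), genders[:TOP_N].count('female')]
    for party, genders in genderlist.items()
}

df = pd.DataFrame.from_dict(gender_counts, orient='index', columns=['male', 'female'])

# Percentage of women among the counted candidates, on a 0-100 scale
df['female_percentage'] = (100 * df['female'] / (df['male'] + df['female'])).round(1)
print(df)
```

A party with no counted candidates would give 0/0 here, which pandas reports as NaN rather than raising an error.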
Upvotes: 0
Reputation: 1614
Let df be your current output (I changed the column names):
df = pd.DataFrame(genderlist.items(), columns=['party_name', 'gender_list'])
gender_list is now a column of lists in this format:
['male', 'female', 'male', 'female', 'male', 'female']
Now you can count the elements of each list using Counter, which returns a dictionary-like object, and then use apply(pd.Series) to split the column of counters into separate columns.
from collections import Counter
df['gender_list'].apply(Counter).apply(pd.Series)
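To get from those counts to a table with party names and a percentage, something like the following sketch might work. The sample genderlist is made up for illustration, and the names counts, result and female_percentage are chosen here. A party with no candidates of one gender gets NaN in that column after apply(pd.Series), so the sketch fills those with 0.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data standing in for the asker's genderlist dict
genderlist = {
    'Party_A': ['male', 'female', 'female'],
    'Party_B': ['male', 'male'],  # no female candidates
}

df = pd.DataFrame(list(genderlist.items()), columns=['party_name', 'gender_list'])

# One Counter per row, then one column per distinct gender value
counts = df['gender_list'].apply(Counter).apply(pd.Series)

# Make sure both columns exist, and turn missing counts (NaN) into 0
counts = counts.reindex(columns=['male', 'female']).fillna(0).astype(int)

# Put the party names back next to the counts
result = pd.concat([df['party_name'], counts], axis=1)
result['female_percentage'] = 100 * result['female'] / (result['male'] + result['female'])
print(result)
```

To count only the first 30 candidates per party, the lists could be sliced first, for example with df['gender_list'].str[:30].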
Upvotes: 0