Reputation: 117
Working in python, in a Jupyter notebook. I am given this dataframe
congress chamber state party
80 house TX D
80 house TX D
80 house NJ D
80 house TX R
80 senate KY R
of every congressperson since the 80th congressional term, with a bunch of information. I've narrowed it down to what's needed for this question. I want to alter the dataframe so that I have a single row for every unique combination of congressional term, chamber, state, and party affiliation, Then a new column with the number of rows that are of the associated party divided by the number of rows where everything else besides that is the same. For example, this
congress chamber state party perc
80 house TX D 0.66
80 house NJ D 1
80 house TX R 0.33
80 senate KY R 1
is what I'd want my result to look like. The perc column is the percentage of, for example, democrats elected to congress in TX in the 80th congressional election.
I've tried a few different methods I've found on here, but most of them divide the number of rows by the number of rows in the entire dataframe, rather than by just the rows that meet the 3 given criteria. Here's the latest thing I've tried:
term=80
newdf = pd.crosstab(index=df['party'], columns=df['state']).stack()/len(df[df['congress']==term])
I define term because I'll only care about one term at a time for each dataframe.
A method I tried using groupby involved the following:
newdf = df.groupby(['congress', 'chamber','state']).agg({'party': 'count'})
state_pcts = newdf.groupby('party').apply(lambda x:
100 * x / float(x.sum()))
And it does group by term, chamber, state, but it returns a number that doesn't mean anything to me, when I check what the actual results should be.
Upvotes: 0
Views: 80
Reputation: 21709
Basically, you can do the following using value_counts
for each group:
def func(f):
return f['party'].value_counts(normalize=True)
df = (df
.groupby(['congress','chamber','state'])
.apply(func)
.reset_index()
.rename(columns={'party':'perc','level_3':'party'}))
print(df)
congress chamber state party perc
0 80 house NJ D 1.000000
1 80 house TX D 0.666667
2 80 house TX R 0.333333
3 80 senate KY R 1.000000
Upvotes: 1