Reputation: 37
I am trying to import two files and compare value counts in df1 (data by state) and a number in a row for that state in df2.
In other words, in one Excel file I have something that looks like this:
State Food
Arizona Bananas
Arizona Pears
Arizona Pickles
Connecticut Potatoes
Connecticut Apples
Etc.
So from there I am interested in how many times the state appears, the value count.
In another file I have a column of the 50 states and a number.
What I am trying to do is basically create a dataframe that displays, by state, the number of times said state appears in df1 (so here Arizona would be 3) divided by the number in the column corresponding to Arizona in the second data frame. Does that make sense?
The second dataframe contains a total population for each state, so the output of 3/n above would be fruit per capita.
Upvotes: 2
Views: 129
Reputation: 409
The following will work
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'state': ['Arizona', 'Arizona', 'Arizona',
'Connecticut', 'Connecticut'],
'food': ['Bananas', 'Pears', 'Pickles', 'Potatoes', 'Apples']})
df2 = pd.DataFrame({'state':['Arizona', 'Connecticut'],
'population': [7300000, 3565000 ]})
df1 = df1.groupby('state').count().merge(df2.set_index('state'),
how = 'left', left_index = True, right_index = True)
df1['result'] = df1['food']/df1['population']
df1
food population result
state
Arizona 3 7300000 4.109589e-07
Connecticut 2 3565000 5.610098e-07
Upvotes: 2