Reputation: 105
I have this dataframe:
iata airport city state country lat \
0 00M Thigpen Bay Springs MS USA 31.953765
1 00R Livingston Municipal Livingston TX USA 30.685861
2 00V Meadow Lake Colorado Springs CO USA 38.945749
3 01G Perry-Warsaw Perry NY USA 42.741347
4 01J Hilliard Airpark Hilliard FL USA 30.688012
I am trying to get the number of airports per state. For example if I have the function:
f(dataframe, state):
result reuslt
Where state would be a state abbreviation, such as 'MA'. I am trying to group the dataframe by the input variable, such as state ('MA') to then get the number of airports per state.
When I use:
df.groupby(state)['airport'].value_counts()
or
df.groupby(state)['airport'].value_counts()/df['airport'].count()
df.groupby(['state'] == state)['airport'].value_counts()/df['airport'].count()
The last two are regarding the conditional probability a selected airport will be in that state.
It throws a Key Error: 'MA', which I think is due to the input variable not being recognized as a column, but a value in the column.
Is there a way to get the number of airports per state?
Upvotes: 2
Views: 1320
Reputation: 105
This seemed to work the way I intended with all your help. a[state]
represents an input in the form of a state abbreviation ('MA'). This returns the probability of a randomly selected airport belonging to that state.
a = df.groupby('state').iata.nunique()
s = a.sum()
result = a[state]/s
return result
Upvotes: 0
Reputation: 904
I would use Pandas's nunique
to get the number of airports per state. The code is easier to read and remember.
To illustrate my point, I modified the dataset as follows, such that Florida has three more fictional airports:
iata airport city state country lat
0 00M Thigpen Bay Springs MS USA 31.953765
1 00R Livingston Municipal Livingston TX USA 30.685861
2 00V Meadow Lake Springs CO USA 38.945749
3 01G Perry-Warsaw Perry NY USA 42.741347
4 01J Hilliard Airpark Hilliard FL USA 30.688012
5 f234 Weirdviller Chilliard FL USA 30.788012
6 23r2 Johnson Billiard FL USA 30.888012
Then, we write:
df.groupby('state').iata.nunique()
to get the following results:
state
CO 1
MS 1
TX 1
FL 3
NY 1
Name: iata, dtype: int64
Hope this helps.
Upvotes: 2
Reputation: 375445
You can rewrite this as an explicit groupby apply:
In [11]: df.groupby("state")["airport"].apply(lambda x: x.value_counts() / len(x))
Out[11]:
state
CO Meadow Lake 1.0
FL Hilliard Airpark 1.0
MS Thigpen 1.0
NY Perry-Warsaw 1.0
TX Livingston Municipal 1.0
Name: airport, dtype: float64
or store the groupby and reuse it (probably this is faster):
In [21]: g = df.groupby("state")["airport"]
In [22]: g.value_counts() / g.size()
Out[22]:
state airport
CO Meadow Lake 1.0
FL Hilliard Airpark 1.0
MS Thigpen 1.0
NY Perry-Warsaw 1.0
TX Livingston Municipal 1.0
Name: airport, dtype: float64
Upvotes: 0
Reputation: 3689
Assuming each record is an airport throughout, you can just count the records for each state / country combination:
df.groupby(['country','state']).size()
Upvotes: 1