Babeeshka
Babeeshka

Reputation: 105

Pandas groupby using function variable

I have this dataframe:

  iata               airport              city state country        lat  \
0  00M              Thigpen        Bay Springs    MS     USA  31.953765   
1  00R  Livingston Municipal        Livingston    TX     USA  30.685861   
2  00V           Meadow Lake  Colorado Springs    CO     USA  38.945749   
3  01G          Perry-Warsaw             Perry    NY     USA  42.741347   
4  01J      Hilliard Airpark          Hilliard    FL     USA  30.688012 

I am trying to get the number of airports per state. For example if I have the function:

f(dataframe, state):
    result reuslt

Where state would be a state abbreviation, such as 'MA'. I am trying to group the dataframe by the input variable, such as state ('MA') to then get the number of airports per state.

When I use:

df.groupby(state)['airport'].value_counts()

or

df.groupby(state)['airport'].value_counts()/df['airport'].count()


df.groupby(['state'] == state)['airport'].value_counts()/df['airport'].count()

The last two are regarding the conditional probability a selected airport will be in that state.

It throws a Key Error: 'MA', which I think is due to the input variable not being recognized as a column, but a value in the column.

Is there a way to get the number of airports per state?

Upvotes: 2

Views: 1320

Answers (4)

Babeeshka
Babeeshka

Reputation: 105

This seemed to work the way I intended with all your help. a[state] represents an input in the form of a state abbreviation ('MA'). This returns the probability of a randomly selected airport belonging to that state.

a = df.groupby('state').iata.nunique()
s = a.sum()
result = a[state]/s
return result

Upvotes: 0

KuboAndTwoStrings
KuboAndTwoStrings

Reputation: 904

I would use Pandas's nunique to get the number of airports per state. The code is easier to read and remember.

To illustrate my point, I modified the dataset as follows, such that Florida has three more fictional airports:

iata airport city state country lat 0 00M Thigpen Bay Springs MS USA 31.953765 1 00R Livingston Municipal Livingston TX USA 30.685861 2 00V Meadow Lake Springs CO USA 38.945749 3 01G Perry-Warsaw Perry NY USA 42.741347 4 01J Hilliard Airpark Hilliard FL USA 30.688012 5 f234 Weirdviller Chilliard FL USA 30.788012 6 23r2 Johnson Billiard FL USA 30.888012

Then, we write:

df.groupby('state').iata.nunique() to get the following results:

state CO 1 MS 1 TX 1 FL 3 NY 1 Name: iata, dtype: int64

Hope this helps.

Upvotes: 2

Andy Hayden
Andy Hayden

Reputation: 375445

You can rewrite this as an explicit groupby apply:

In [11]: df.groupby("state")["airport"].apply(lambda x: x.value_counts() / len(x))
Out[11]:
state
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

or store the groupby and reuse it (probably this is faster):

In [21]: g = df.groupby("state")["airport"]

In [22]: g.value_counts() / g.size()
Out[22]:
state  airport
CO     Meadow Lake             1.0
FL     Hilliard Airpark        1.0
MS     Thigpen                 1.0
NY     Perry-Warsaw            1.0
TX     Livingston Municipal    1.0
Name: airport, dtype: float64

Upvotes: 0

ako
ako

Reputation: 3689

Assuming each record is an airport throughout, you can just count the records for each state / country combination:

df.groupby(['country','state']).size()

Upvotes: 1

Related Questions