khu
khu

Reputation: 161

Balancing operation data in DataFrame

I've got a DataFrame of aircraft operations. The operations are only logged for IFR flight plans (Instrument Flight Rules) when the weather is bad and aircraft need to use instruments rather than rely solely on visual for approach/landing. The problem is if they fly in with bad weather (IFR) but fly out with good weather (VFR, Visual Flight Rules) the VFR operation is not logged and I have an unbalanced operation. Basically, it looks like the aircraft never left.

I need to balance these operations to where the arrivals = departures. I’m struggling to find a succinct way to do this and was hoping there was an easier method through Pandas.

Sample of data. This is a subset of the larger nationwide operation data and parsed down for operations occuring at the airport KHRI:

Aircraft_ID    Departure_Airport    Arrival_Airport    Departure_Date    Arrival_Date
AMF1965        KLGD                 KHRI               31-Dec-2011       31-Dec-2011
AMF1965        KHRI                 KPDX               31-Dec-2011       31-Dec-2011
N901JE         KHRI                 KS21               01-Jan-2012       01-Jan-2012
N901JE         KS21                 KHRI               02-Jan-2012       02-Jan-2012
AMF1926        KPDX                 KHRI               03-Jan-2012       03-Jan-2012
AMF1964        KPDX                 KHRI               04-Jan-2012       04-Jan-2012
AMF1965        KHRI                 KPDX               04-Jan-2012       04-Jan-2012
AMF1964        KPDX                 KHRI               05-Jan-2012       05-Jan-2012
AMF1964        KHRI                 KLGD               05-Jan-2012       05-Jan-2012



# ADD TO BALANCE 2 DEPARTURES FROM KHRI AND ONLY 1 ARRIVAL FOR AMF1965
AMF1965        BAL                  KHRI               04-Jan-2012       04-Jan-2012  
# ADD TO BALANCE 1 DEPARTURES FROM KHRI AND 2 ARRIVALS FOR AMF1964
AMF1964        KHRI                 BAL                04-Jan-2012       04-Jan-2012

Edit: The added data adds an arrival to match an unbalanced departure. For the first add, one arrival/departure pair was matched on the 31-Dec-2012, leaving a lone departure on 04-Jan-2012. I just assumed the matching arrival was on the same day. The 'BAL' is just a placeholder since we don't know where they came from if balancing an arrival and we don't know where they went when balancing a departure.

I basically need to have the number of arrivals = departures for the airport in this subset and add dummy operations where necessary if an aircraft has unbalanced arrivals and departures. I am more-or-less able to do that now through custom functions, but I can't figure out a good way to determine if the dummy data needs to be an arrival or departure and for what airport. I just blanket create a dummy entry for the difference with no real data.

Upvotes: 2

Views: 271

Answers (1)

Peter Leimbigler
Peter Leimbigler

Reputation: 11105

This doesn't fully answer your question, but may steer you toward a solution.

The following code finds aircraft-airport combinations where the number of departures does not match the number of arrivals.

import pandas as pd
df = pd.read_clipboard()   # I copied the example DF from your question

# For each aircraft, count the number of *departures* from each airport
deps = df.groupby('Aircraft_ID')['Departure_Airport'].value_counts()
# For each aircraft, count the number of *arrivals* at each airport
arrs = df.groupby('Aircraft_ID')['Arrival_Airport'].value_counts()

# Concatenate these Series on their aircraft-airport MultiIndexes.
# pd.concat defaults to outer join, which means that
# if an index value appears in only one Series, the value
# from the other Series is NaN.
counts = pd.concat([deps, arrs], axis=1)
counts.rename(columns={'Departure_Airport': 'dep',
                       'Arrival_Airport': 'arr'}, inplace=True)
counts
              dep  arr
AMF1926 KHRI  NaN  1.0
        KPDX  1.0  NaN
AMF1964 KHRI  1.0  2.0
        KLGD  NaN  1.0
        KPDX  2.0  NaN
AMF1965 KHRI  2.0  1.0
        KLGD  1.0  NaN
        KPDX  NaN  2.0
N901JE  KHRI  1.0  1.0
        KS21  1.0  1.0
filled = counts.fillna(0, inplace=True).astype(int)

# Get all rows where the number of departures does not match
# the number of arrivals
filled[filled['dep'] != filled['arr']]
              dep  arr
AMF1926 KHRI    0    1
        KPDX    1    0
AMF1964 KHRI    1    2
        KLGD    0    1
        KPDX    2    0
AMF1965 KHRI    2    1
        KLGD    1    0
        KPDX    0    2

The fillna approach is too aggressive: it includes rows with zero departures or zero arrivals. For example, AMF1926 only made one flight (KPDX to KHRI), but it shows up in the above output because the number of departures from KPDX (one) does not match the number of arrivals at KPDX (zero).

Fix: drop rows with missing values instead of filling NaN with zero.

dropped = counts.dropna().astype(int)
dropped
              dep  arr
AMF1964 KHRI    1    2
AMF1965 KHRI    2    1
N901JE  KHRI    1    1
        KS21    1    1

unbalanced = dropped[dropped['dep'] != dropped['arr']]
unbalanced
              dep  arr
AMF1964 KHRI    1    2
AMF1965 KHRI    2    1

These are the flight-airport combinations that require balancing.

It'll take a bit more work to generate placeholder rows from this, but hopefully that's a start.

Upvotes: 1

Related Questions