Reputation: 161
I've got a DataFrame of aircraft operations. The operations are only logged for IFR flight plans (Instrument Flight Rules) when the weather is bad and aircraft need to use instruments rather than rely solely on visual for approach/landing. The problem is if they fly in with bad weather (IFR) but fly out with good weather (VFR, Visual Flight Rules) the VFR operation is not logged and I have an unbalanced operation. Basically, it looks like the aircraft never left.
I need to balance these operations to where the arrivals = departures. I’m struggling to find a succinct way to do this and was hoping there was an easier method through Pandas.
Sample of data. This is a subset of the larger nationwide operation data and parsed down for operations occuring at the airport KHRI:
Aircraft_ID Departure_Airport Arrival_Airport Departure_Date Arrival_Date
AMF1965 KLGD KHRI 31-Dec-2011 31-Dec-2011
AMF1965 KHRI KPDX 31-Dec-2011 31-Dec-2011
N901JE KHRI KS21 01-Jan-2012 01-Jan-2012
N901JE KS21 KHRI 02-Jan-2012 02-Jan-2012
AMF1926 KPDX KHRI 03-Jan-2012 03-Jan-2012
AMF1964 KPDX KHRI 04-Jan-2012 04-Jan-2012
AMF1965 KHRI KPDX 04-Jan-2012 04-Jan-2012
AMF1964 KPDX KHRI 05-Jan-2012 05-Jan-2012
AMF1964 KHRI KLGD 05-Jan-2012 05-Jan-2012
# ADD TO BALANCE 2 DEPARTURES FROM KHRI AND ONLY 1 ARRIVAL FOR AMF1965
AMF1965 BAL KHRI 04-Jan-2012 04-Jan-2012
# ADD TO BALANCE 1 DEPARTURES FROM KHRI AND 2 ARRIVALS FOR AMF1964
AMF1964 KHRI BAL 04-Jan-2012 04-Jan-2012
Edit: The added data adds an arrival to match an unbalanced departure. For the first add, one arrival/departure pair was matched on the 31-Dec-2012, leaving a lone departure on 04-Jan-2012. I just assumed the matching arrival was on the same day. The 'BAL' is just a placeholder since we don't know where they came from if balancing an arrival and we don't know where they went when balancing a departure.
I basically need to have the number of arrivals = departures for the airport in this subset and add dummy operations where necessary if an aircraft has unbalanced arrivals and departures. I am more-or-less able to do that now through custom functions, but I can't figure out a good way to determine if the dummy data needs to be an arrival or departure and for what airport. I just blanket create a dummy entry for the difference with no real data.
Upvotes: 2
Views: 271
Reputation: 11105
This doesn't fully answer your question, but may steer you toward a solution.
The following code finds aircraft-airport combinations where the number of departures does not match the number of arrivals.
import pandas as pd
df = pd.read_clipboard() # I copied the example DF from your question
# For each aircraft, count the number of *departures* from each airport
deps = df.groupby('Aircraft_ID')['Departure_Airport'].value_counts()
# For each aircraft, count the number of *arrivals* at each airport
arrs = df.groupby('Aircraft_ID')['Arrival_Airport'].value_counts()
# Concatenate these Series on their aircraft-airport MultiIndexes.
# pd.concat defaults to outer join, which means that
# if an index value appears in only one Series, the value
# from the other Series is NaN.
counts = pd.concat([deps, arrs], axis=1)
counts.rename(columns={'Departure_Airport': 'dep',
'Arrival_Airport': 'arr'}, inplace=True)
counts
dep arr
AMF1926 KHRI NaN 1.0
KPDX 1.0 NaN
AMF1964 KHRI 1.0 2.0
KLGD NaN 1.0
KPDX 2.0 NaN
AMF1965 KHRI 2.0 1.0
KLGD 1.0 NaN
KPDX NaN 2.0
N901JE KHRI 1.0 1.0
KS21 1.0 1.0
filled = counts.fillna(0, inplace=True).astype(int)
# Get all rows where the number of departures does not match
# the number of arrivals
filled[filled['dep'] != filled['arr']]
dep arr
AMF1926 KHRI 0 1
KPDX 1 0
AMF1964 KHRI 1 2
KLGD 0 1
KPDX 2 0
AMF1965 KHRI 2 1
KLGD 1 0
KPDX 0 2
The fillna
approach is too aggressive: it includes rows with zero departures or zero arrivals. For example, AMF1926 only made one flight (KPDX to KHRI), but it shows up in the above output because the number of departures from KPDX (one) does not match the number of arrivals at KPDX (zero).
Fix: drop rows with missing values instead of filling NaN with zero.
dropped = counts.dropna().astype(int)
dropped
dep arr
AMF1964 KHRI 1 2
AMF1965 KHRI 2 1
N901JE KHRI 1 1
KS21 1 1
unbalanced = dropped[dropped['dep'] != dropped['arr']]
unbalanced
dep arr
AMF1964 KHRI 1 2
AMF1965 KHRI 2 1
These are the flight-airport combinations that require balancing.
It'll take a bit more work to generate placeholder rows from this, but hopefully that's a start.
Upvotes: 1