Reputation: 909
I have a dataframe like the following:
   loc status   ID
0   LA    NaN  NaN
1  CHC    NaN  NaN
2  NYC    ARR   32
3  CHC    DEP   45
4  SEA    NaN  NaN
I am trying to fill the missing values in the ID column based on the status column. If the status is "ARR", I want to fill backwards, and if the status is "DEP", I want to fill forwards, so that my final dataframe looks like this:
   loc status  ID
0   LA    NaN  32
1  CHC    NaN  32
2  NYC    ARR  32
3  CHC    DEP  45
4  SEA    NaN  45
I have been trying to do this with two for loops over both columns, but I was wondering whether there is a more efficient way to do it in pandas.
Upvotes: 1
Views: 2332
Reputation: 5213
You can approach this by splitting the rows of your dataframe df according to whether they should be forward filled or back filled.
First, create two copies of the status column, one with everything forward filled and the other with everything back filled:
fill_forward = df.status.ffill()
fill_backward = df.status.bfill()
Next, get the indices of the rows that forward filling filled with 'DEP' and the indices of the rows that back filling filled with 'ARR' (i.e. your two conditions):
forward_index = df.index[(df.status != fill_forward) & (fill_forward == 'DEP')]
backward_index = df.index[(df.status != fill_backward) & (fill_backward == 'ARR')]
Extend these indices so that they also include the row directly preceding (for forward filling) or directly following (for back filling), since that neighbouring row holds the value to propagate. This relies on df having the default integer index:
forward_rows = sorted({ind for f in forward_index for ind in (f, f - 1)})
backward_rows = sorted({ind for b in backward_index for ind in (b, b + 1)})
Finally, fill each set of rows with the appropriate method and assign the updated values back to the original df. Note that doing the forward fill first gives preference to forward filling where the indices overlap. Assigning through df.loc avoids chained assignment, which may silently fail to update df:
df.loc[forward_rows, 'ID'] = df.loc[forward_rows, 'ID'].ffill()
df.loc[backward_rows, 'ID'] = df.loc[backward_rows, 'ID'].bfill()
print(df)
   loc status    ID
0   LA    NaN  32.0
1  CHC    NaN  32.0
2  NYC    ARR  32.0
3  CHC    DEP  45.0
4  SEA    NaN  45.0
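For convenience, the steps above can be collected into one runnable sketch. The function name fill_by_status is hypothetical (it does not appear in the answer), and the sketch assumes the default integer index and that a missing ID always coincides with a missing status, as in the sample data.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})


def fill_by_status(frame):
    """Hypothetical helper: back fill ID before 'ARR', forward fill after 'DEP'."""
    out = frame.copy()  # leave the caller's frame untouched
    fill_forward = out["status"].ffill()
    fill_backward = out["status"].bfill()
    missing = out["status"].isna()

    # Rows whose nearest earlier status is 'DEP' / nearest later status is 'ARR'.
    forward_index = out.index[missing & (fill_forward == "DEP")]
    backward_index = out.index[missing & (fill_backward == "ARR")]

    # Include the neighbouring row that carries the value to propagate.
    forward_rows = sorted({i for f in forward_index for i in (f, f - 1)})
    backward_rows = sorted({i for b in backward_index for i in (b, b + 1)})

    # Forward fill first, so it wins where the two row sets overlap.
    out.loc[forward_rows, "ID"] = out.loc[forward_rows, "ID"].ffill()
    out.loc[backward_rows, "ID"] = out.loc[backward_rows, "ID"].bfill()
    return out


result = fill_by_status(df)
print(result)
```

Working on a copy keeps the original df unchanged, which makes it easier to compare the result against the input.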
Upvotes: 0
Reputation: 7022
This should work:
dt.ID.bfill().ffill()
It back fills first, so each NA value takes the next non-NA value, and then forward fills whatever is still missing with the preceding non-NA value.
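As a quick check, here is a small sketch of that chain on the sample data, plus a made-up two-row frame (other, not from the question) showing that the chain does not look at the status column at all.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
dt = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

filled = dt["ID"].bfill().ffill()
print(filled.tolist())  # [32.0, 32.0, 32.0, 45.0, 45.0]

# Hypothetical counter-case: the missing row follows an 'ARR', so under the
# rules in the question it should stay missing, yet it is filled anyway.
other = pd.DataFrame({
    "status": ["ARR", np.nan],
    "ID": [32, np.nan],
})
other_filled = other["ID"].bfill().ffill()
print(other_filled.tolist())  # [32.0, 32.0]
```

On the sample data the result happens to match the desired output, but only because of how the rows are arranged.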
Edit:
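Maybe this is what you're looking for (based on the comments):
dt.ID.ffill().where(
    dt.ID.notnull() | (dt.status.ffill() == 'DEP'),
    dt.ID.bfill().where(dt.ID.notnull() | (dt.status.bfill() == 'ARR')),
)
The status column is forward and back filled as well, so that each missing row is compared with the nearest status in that direction rather than only with the adjacent row. It's not very readable, but it should give the general idea.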
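A more readable way to express the same idea is to build the two candidate fills separately and then combine them. This is only a sketch: the intermediate names (status_before, status_after, from_dep, from_arr) are made up for illustration, and it assumes a missing ID always coincides with a missing status.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
dt = pd.DataFrame({
    "loc": ["LA", "CHC", "NYC", "CHC", "SEA"],
    "status": [np.nan, np.nan, "ARR", "DEP", np.nan],
    "ID": [np.nan, np.nan, 32, 45, np.nan],
})

# Nearest known status looking backwards and forwards from each row.
status_before = dt["status"].ffill()
status_after = dt["status"].bfill()

# Candidate values: forward filled only after 'DEP', back filled only before 'ARR'.
from_dep = dt["ID"].ffill().where(status_before == "DEP")
from_arr = dt["ID"].bfill().where(status_after == "ARR")

# Existing IDs are kept; forward fill takes precedence over back fill.
dt["ID"] = dt["ID"].fillna(from_dep).fillna(from_arr)
print(dt)
```

Swapping the order of the two fillna calls would give back filling the preference when a row qualifies for both.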
Upvotes: 2