Reputation: 1336
How to use pandas to fill missing data on specific categories of data within the dataframe?
I have used fillna(method='ffill') after selecting a category but this creates NaN values on the other categories. Is there a better way?
I use this, i.e. I select the category 'Buurt' and apply the ffill (to insert missing zipcodes). But the rows with another category ('Wijk') will become NaN.
df['Zipcode'] = df.loc[(df['RegionType'] == 'Buurt'), 'Zipcode'].fillna(method='ffill')
df before and after:
City Zipcode RegionType
Codering
BU06100305 Sliedrecht 3362 Buurt
BU06100306 Sliedrecht 3361 Buurt
BU06100307 Sliedrecht 3361 Buurt
WK061004 Sliedrecht . Wijk
BU06100401 Sliedrecht NaN Buurt
BU06100402 Sliedrecht NaN Buurt
BU06100403 Sliedrecht 3364 Buurt
BU06100404 Sliedrecht 3364 Buurt
BU06100405 Sliedrecht 3364 Buurt
BU06100406 Sliedrecht 3364 Buurt
City Zipcode RegionType
Codering
BU06100305 Sliedrecht 3362 Buurt
BU06100306 Sliedrecht 3361 Buurt
BU06100307 Sliedrecht 3361 Buurt
WK061004 Sliedrecht NaN Wijk
BU06100401 Sliedrecht 3361 Buurt
BU06100402 Sliedrecht 3361 Buurt
BU06100403 Sliedrecht 3364 Buurt
BU06100404 Sliedrecht 3364 Buurt
BU06100405 Sliedrecht 3364 Buurt
BU06100406 Sliedrecht 3364 Buurt
I could change the NaNs back to a dot (.) but feel there must be a better way. Especially because the rows for 'Buurt' were actually filled with a dot (.) and I changed these to NaN to do the fillna(method='ffill') in the first place...
serieBuurtNoZipcode = (df['RegioType'] == 'Buurt') & (df['Zipcode'] == '.')
df.loc[(serieBuurtNoZipcode), 'Zipcode'] = np.nan
How can I do this in a smarter way? Many thanks in advance!
Upvotes: 2
Views: 892
Reputation: 863731
I think you need assign to filtered column Zipcode
too:
mask = (df['RegionType'] == 'Buurt')
df.loc[mask, 'Zipcode'] = df.loc[mask, 'Zipcode'].ffill()
print (df)
City Zipcode RegionType
Codering
BU06100305 Sliedrecht 3362 Buurt
BU06100306 Sliedrecht 3361 Buurt
BU06100307 Sliedrecht 3361 Buurt
WK061004 Sliedrecht . Wijk
BU06100401 Sliedrecht 3361 Buurt
BU06100402 Sliedrecht 3361 Buurt
BU06100403 Sliedrecht 3364 Buurt
BU06100404 Sliedrecht 3364 Buurt
BU06100405 Sliedrecht 3364 Buurt
BU06100406 Sliedrecht 3364 Buurt
But if want replace all categories by ffill
use groupby
:
df['Zipcode'] = df.groupby('RegionType')['Zipcode'].ffill()
More general:
df['Zipcode'] = df.groupby('RegionType')['Zipcode'].apply(lambda x: x.ffill().bfill())
Upvotes: 2