Reputation: 21
I have a pandas dataframe where half of the rows (where buffer_id = 'off') have missing values (NaNs and -9999) in three of the columns. I'm trying to fill those missing values using the corresponding values from the other half (where buffer_id = 'on') based on a matching site ID.
Here is a reproducible example:
df = pd.DataFrame({'site_id': ['A', 'B', 'C', 'A', 'B', 'C'],
'buffer_id': ['on', 'on', 'on', 'off', 'off', 'off'],
'easting': [111, 222, 333, 'NaN', 'NaN', 'NaN'],
'northing': [444, 555, 666, 'NaN', 'NaN', 'NaN'],
'year': [1990, 1995, 2000, -9999, -9999, -9999],
'ndvi': [12, 22, 32, 42, 52, 62]})
So the missing values in 'easting', 'northing', and 'year' should be filled with the values from the rows with corresponding site ID.
How would you go about doing that?
Thank you.
Upvotes: 2
Views: 1285
Reputation: 127
Using pandas
, I would suggest building a table of unique attributes for each site, and left joining it back to your orginal dataframe, as follows:
# Import packages
import pandas as pd
import numpy as np
# Your data frame
df = pd.DataFrame({'site_id': ['A', 'B', 'C', 'A', 'B', 'C'],
'buffer_id': ['on', 'on', 'on', 'off', 'off', 'off'],
'easting': [111, 222, 333, 'NaN', 'NaN', 'NaN'],
'northing': [444, 555, 666, 'NaN', 'NaN', 'NaN'],
'year': [1990, 1995, 2000, -9999, -9999, -9999],
'ndvi': [12, 22, 32, 42, 52, 62]})
# Clean NAs in year column
df['year'] = df['year'].replace(-9999, np.nan)
# Make table of unique attributes for each site (easting, northing, and year)
site_attributes = df[['site_id', 'easting', 'northing', 'year']].dropna().drop_duplicates()
# Join unique attributes to your original table
new_df = df[['site_id', 'buffer_id', 'ndvi']].merge(site_attributes, 'left')
Upvotes: 1
Reputation: 1
This should do the trick, vantastic!
# create Python dictionary holding key value pairs
# key is the well_id, value is a List holding the dataframe index of the identical well IDs
g = df.groupby('well_id').groups
#iterate through the lists in the dictionary
for val in g.values():
# assign the value of the easting, northing, and year columns from the first index of the pair to the second index
df.loc[val[1], 'easting'] = df.loc[val[0], 'easting']
df.loc[val[1], 'northing'] = df.loc[val[0], 'northing']
df.loc[val[1], 'year'] = df.loc[val[0], 'year']
Upvotes: 0