How can I update a NumPy array based on the nearest value in a pandas DataFrame column? For example, I'd like to update the following array using the "Time" column of the DataFrame, so that each element is replaced by the "X" value from the row whose "Time" is closest to it:
Input array:
```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])
```
Input DataFrame:
```python
df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})
```
Expected output array:
```
[[26200, 25800, 25900],
 [26200, 26300, 26300],
 [26100, 26300, 25800]]
```
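To make the "nearest" rule concrete, here is a brute-force sketch (not the approach used in the answer) that uses NumPy broadcasting to measure the distance from every array element to every `Time` and pick the closest row. It builds an intermediate array of size `a.size * len(df)`, so it is only practical for small inputs, and on ties `argmin` returns the first (smaller) `Time`.

```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])
df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

times = df['Time'].to_numpy(dtype=float)
xs = df['X'].to_numpy()

# Distance from every element of `a` to every Time: shape (3, 3, 6)
dist = np.abs(a[..., np.newaxis] - times)

# Row index of the closest Time for each element (first match wins on ties)
nearest_idx = dist.argmin(axis=-1)

# Fancy indexing keeps the (3, 3) shape of `a`
result = xs[nearest_idx]
print(result)
```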
Use `merge_asof`:
```python
import numpy as np
import pandas as pd

# Convert Time to float, since the input array is float.
# merge_asof requires the join keys on both sides to have the same dtype.
df['Time'] = df['Time'].astype('float')

# merge_asof also requires both data frames to be sorted by the join key (Time),
# so flatten the input array and record each element's original position
# before going into the merge.
a_ = np.ravel(a)
o_ = np.arange(len(a_))
tmp = pd.DataFrame({
    'Time': a_,
    'Order': o_
})

# Match each element to the row with the nearest Time, then restore the
# original order and shape of the input array before extracting X
result = (
    pd.merge_asof(tmp.sort_values('Time'), df.sort_values('Time'), on='Time', direction='nearest')
    .sort_values('Order')
    ['X'].to_numpy()
    .reshape(a.shape)
)
```
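If you would rather not build a temporary DataFrame, a NumPy-only alternative (a different technique from `merge_asof`, sketched here under the assumption that `df` has at least two rows) uses `np.searchsorted` on the sorted `Time` column and then compares each element's left and right neighbours. It works directly on the 2-D array, so no flattening or reordering is needed.

```python
import numpy as np
import pandas as pd

a = np.array([
    [122.25, 225.00, 201.00],
    [125.00, 151.50, 160.62],
    [99.99, 142.25, 250.01],
])
df = pd.DataFrame({
    'Time': [100, 125, 150, 175, 200, 225],
    'X': [26100, 26200, 26300, 26000, 25900, 25800],
})

# searchsorted needs the lookup keys in ascending order
lookup = df.sort_values('Time')
times = lookup['Time'].to_numpy(dtype=float)
xs = lookup['X'].to_numpy()

# Index of the first Time >= each element (same shape as `a`), clipped so
# that both a left neighbour (idx - 1) and a right neighbour (idx) exist
idx = np.searchsorted(times, a)
idx = np.clip(idx, 1, len(times) - 1)
left = times[idx - 1]
right = times[idx]

# Step back to the left neighbour when it is at least as close
# (so ties go to the smaller Time in this sketch)
idx = np.where((a - left) <= (right - a), idx - 1, idx)
result = xs[idx]
print(result)
```

This avoids the `a.size * len(df)` intermediate of a brute-force comparison: each lookup is a binary search, so the cost grows roughly as `a.size * log(len(df))`.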