Reputation: 1657
I'm given this series of data points with indices; so the dataframe looks like this:
index value
0 3.075
1 44.338
2 10.030
Then I wrote a function, get_extrema
, that finds all the local extremas and returns a dataframe with 3 columns: index
, value
, type
.
index = the index of the extrema in the original given data set
value = the value of the local extrema found
type = either 'max' or 'min'
Then, using the results from get_extrema
, I wrote another function that finds the amplitudes of the extrema pairs (aka the difference in values between a local min and a local max)... I'd only like to return the top pairs, which have the largest amplitudes:
def get_amplitude(extremas, col, n):
amps = abs(extremas[col].diff(periods=1))
amps.sort_values(inplace=True, ascending=False)
amps = amps.head(n)
df = pd.DataFrame({'local minima': extremas.loc[amps.index, 'value'],
'local maxima': extremas.loc[amps.index - 1, 'value']})
return df
Extremas is the dataframe output from get_extremas
, col is the name of the column that contains the extrema values in the dataframe mentioned, and n is the number of top n pairs that has the largest difference between the local min/max pair.
it works alright but the answer output is a bit weird:
+-------------------------------------+
| | local maxima | local minima |
+-------+--------------+--------------+
| 17398 | 433.75 | NaN |
| 17399 | NaN | -99.00 |
| 17551 | 438.00 | NaN |
| 17552 | NaN | -88.25 |
| 21262 | 437.75 | NaN |
| 21263 | NaN | -120.75 |
+-------+--------------+--------------+
My desired output:
+-------------------------------------+
| | local maxima | local minima |
+-------+--------------+--------------+
| 0 | 433.75 | -99.00 |
| 1 | 438.00 | -88.25 |
| 2 | 437.75 | -120.75 |
+-------+--------------+--------------+
I understand my output format is due to the two dataframes i'm trying to merge have different indices. but how can i overcome this?
i cannot use df.merge(df2)
because they don't share the same key.
I feel like there must be a very easy solution that I'm overlooking. Any help is appreciated!
Upvotes: 2
Views: 1538
Reputation: 7957
Does this help your case? I am not sure if we are on the same page, but I think this is what you are trying to do-
import pandas as pd
import numpy as np
# create random data
amps = pd.DataFrame(np.random.randint(0,100,size=(5, 1)), columns=['value'])
extremas = pd.DataFrame(np.random.randint(0,100,size=(5, 1)), columns=['value'])
# this part
minima = extremas.loc[amps.index, 'value'].reset_index(drop=True)
maxima = extremas.loc[amps.index - 1, 'value'].reset_index(drop=True)
df = pd.DataFrame({'local minima':minima ,'local maxima': maxima})
Upvotes: 2