alwaysaskingquestions
alwaysaskingquestions

Reputation: 1657

how can i merge two dataframes with different indices?

I'm given this series of data points with indices; so the dataframe looks like this:

index        value
0            3.075
1           44.338
2           10.030

Then I wrote a function, get_extrema, that finds all the local extremas and returns a dataframe with 3 columns: index, value, type.

index = the index of the extrema in the original given data set

value = the value of the local extrema found

type = either 'max' or 'min'

Then, using the results from get_extrema, I wrote another function that finds the amplitudes of the extrema pairs (aka the difference in values between a local min and a local max)... I'd only like to return the top pairs, which have the largest amplitudes:

def get_amplitude(extremas, col, n):
    amps = abs(extremas[col].diff(periods=1))
    amps.sort_values(inplace=True, ascending=False)
    amps = amps.head(n)

    df = pd.DataFrame({'local minima': extremas.loc[amps.index, 'value'],
                       'local maxima': extremas.loc[amps.index - 1, 'value']})

    return df

Extremas is the dataframe output from get_extremas, col is the name of the column that contains the extrema values in the dataframe mentioned, and n is the number of top n pairs that has the largest difference between the local min/max pair.

it works alright but the answer output is a bit weird:

+-------------------------------------+
|       | local maxima | local minima |
+-------+--------------+--------------+
| 17398 | 433.75       | NaN          |
| 17399 | NaN          | -99.00       |
| 17551 | 438.00       | NaN          |
| 17552 | NaN          | -88.25       |
| 21262 | 437.75       | NaN          |
| 21263 | NaN          | -120.75      |
+-------+--------------+--------------+

My desired output:

+-------------------------------------+
|       | local maxima | local minima |
+-------+--------------+--------------+
| 0     | 433.75       | -99.00       |
| 1     | 438.00       | -88.25       |
| 2     | 437.75       | -120.75      |
+-------+--------------+--------------+

I understand my output format is due to the two dataframes i'm trying to merge have different indices. but how can i overcome this?

i cannot use df.merge(df2) because they don't share the same key.

I feel like there must be a very easy solution that I'm overlooking. Any help is appreciated!

Upvotes: 2

Views: 1538

Answers (1)

Clock Slave
Clock Slave

Reputation: 7957

Does this help your case? I am not sure if we are on the same page, but I think this is what you are trying to do-

import pandas as pd
import numpy as np
# create random data
amps = pd.DataFrame(np.random.randint(0,100,size=(5, 1)), columns=['value'])
extremas = pd.DataFrame(np.random.randint(0,100,size=(5, 1)), columns=['value'])

# this part 
minima = extremas.loc[amps.index, 'value'].reset_index(drop=True)
maxima = extremas.loc[amps.index - 1, 'value'].reset_index(drop=True)
df = pd.DataFrame({'local minima':minima ,'local maxima': maxima})

Upvotes: 2

Related Questions