Keven Scharaswak

Reputation: 107

Find int values in a numpy array that are "close in value" and combine them

I have a numpy array with these values: [10620.5, 11899., 11879.5, 13017., 11610.5]

import numpy as np
array = np.array([10620.5, 11899,  11879.5, 13017,  11610.5])

I would like to find values that are "close" (in this instance, 11899 and 11879.5), average them, and replace them with a single instance of the new number, resulting in this:

[10620.5, 11889.25, 13017, 11610.5]

The term "close" would be configurable. Let's say a difference of 50.

The purpose of this is to create Spans on a Bokeh graph, and some lines are just too close together.

I am super new to Python in general (a couple of weeks of intense dev).

I would think that I could arrange the values in order, somehow grab the ones to the left and right of each value, and do some math on them, replacing a match with the average value. But at the moment, I just don't have any idea how.
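The sort-then-compare-neighbours idea described above can be sketched in plain Python. This is only a minimal sketch under a few assumptions: merge_close is a hypothetical helper name, a value counts as "close" when it is less than close above the previous sorted value, and the result comes back in sorted order rather than in the original order. A chain of values that are each close to the next one ends up in a single group, even if the two ends of the chain are further apart than close.

```python
def merge_close(values, close):
    """Average runs of sorted values whose neighbouring gaps are below `close`."""
    groups = []
    for v in sorted(values):
        # Join the current group if v is within `close` of the previous value,
        # otherwise start a new group.
        if groups and v - groups[-1][-1] < close:
            groups[-1].append(v)
        else:
            groups.append([v])
    # One averaged value per group.
    return [sum(g) / len(g) for g in groups]


merged = merge_close([10620.5, 11899, 11879.5, 13017, 11610.5], 50)
print(merged)
```

If the original ordering matters, the group averages would have to be mapped back to the positions of the input values.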

Upvotes: 0

Views: 820

Answers (3)

Keven Scharaswak

Reputation: 107

Taking Gustavo's answer and tweaking it to my needs:

def reshape_arr(a, close):
    # a is a pandas Series; it is modified in place and also returned
    changed = True
    while changed:
        changed = False
        # sorted unique values, so only neighbouring entries need comparing
        array = a.sort_values().unique()
        for i in range(len(array) - 1):
            # checking each value against its right-hand neighbour
            # covers every adjacent pair exactly once
            if abs(array[i + 1] - array[i]) < close:
                average = (array[i] + array[i + 1]) / 2
                changed = True
                # find matching values in a, and replace with the average
                a.replace(array[i], value=average, inplace=True)
                a.replace(array[i + 1], value=average, inplace=True)
    return a

This does the job when I call it like this:

candlesticks['support'] = reshape_arr(supres_df['support'], 150)

Here candlesticks is the main DataFrame that I am using, and supres_df is another DataFrame that I am massaging before I apply it to the main one.

It works, but it is extremely slow. I am trying to optimize it now.

I added a while loop because, after averaging, the averages can themselves end up close enough to be averaged again, so I loop until nothing needs averaging anymore. This is total newbie work, so if you see something silly, please comment.
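One possible direction for speeding this up is to avoid the repeated replace calls and label the clusters in a single vectorized pass. The sketch below is an assumption-laden alternative, not a drop-in equivalent of the loop above: snap_to_cluster_mean is a hypothetical name, it assumes the Series contains no NaN values, and it groups values by chaining sorted neighbours whose gap is below close, so its results can differ from repeated pairwise averaging.

```python
import numpy as np
import pandas as pd


def snap_to_cluster_mean(s, close):
    """Return a new Series with each value replaced by the mean of its cluster."""
    values = s.to_numpy(dtype=float)
    if values.size == 0:
        return s.copy()
    # Sort once and remember how to map sorted positions back to the input.
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    # A new cluster starts wherever the gap to the previous sorted value
    # is at least `close`; the running sum turns that into cluster labels.
    labels = np.concatenate(([0], np.cumsum(np.diff(sorted_vals) >= close)))
    # Mean per cluster = sum per cluster / count per cluster.
    means = np.bincount(labels, weights=sorted_vals) / np.bincount(labels)
    # Write each cluster mean back to the original positions.
    out = np.empty_like(values)
    out[order] = means[labels]
    return pd.Series(out, index=s.index, name=s.name)


s = pd.Series([10620.5, 11899, 11879.5, 13017, 11610.5])
result = snap_to_cluster_mean(s, 50)
print(result.tolist())
```

Unlike the function above, this returns a new Series of the same length and leaves the input untouched, so it can be assigned straight to a DataFrame column.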

Upvotes: 0

Gustavo Gradvohl

Reputation: 712

Try something like this. I added a few extra steps just to show the flow. The idea is to split the sorted data into groups of adjacent values and decide whether to average each group based on how spread out it is.

So, as you describe, you can combine your data in sets of 3 numbers, and if the difference between the max and min of a set is less than 50, you average them; otherwise you leave them as is.


import pandas as pd
import numpy as np
arr = np.ravel([1,24,5.3, 12, 8, 45, 14, 18, 33, 15, 19, 22])
arr.sort()

def reshape_arr(a, n): # n is number of consecutive adjacent items you want to compare for averaging
    hold = len(a)%n
    if hold != 0:
        container = a[-hold:] #numbers that do not fit on the array will be excluded for averaging
        a = a[:-hold].reshape(-1,n)
    else:
        a = a.reshape(-1,n)
        container = None
    return a, container
def get_mean(a, close): # close = how close adjacent numbers need to be, in order to be averaged together
    my_list=[]
    for i in range(len(a)):
        if a[i].max()-a[i].min() > close:
            for j in range(len(a[i])):
                my_list.append(a[i][j])
        else:
            my_list.append(a[i].mean())
    return my_list  
def final_list(a, c): # add any elements held in the container to the final list
    if c is not None:
        c = c.tolist()
        for i in range(len(c)):
            a.append(c[i])
    return a 

arr, container = reshape_arr(arr,3)
arr = get_mean(arr, 5)
final_list(arr, container)

Upvotes: 1

alanindublin

Reputation: 101

You could use fuzzywuzzy here to gauge the closeness ratio between two data sets.

See details here: http://jonathansoma.com/lede/algorithms-2017/classes/fuzziness-matplotlib/fuzzing-matching-in-pandas-with-fuzzywuzzy/

Upvotes: 0
