Finding a valley in noisy data

Date        Time_GMT Time_IST    Current
11/15/2016  5:12:27 10:42:27    26.61
11/15/2016  5:12:28 10:42:28    42.27
11/15/2016  5:12:29 10:42:29    25.48
11/15/2016  5:12:30 10:42:30    24.24
11/15/2016  5:12:31 10:42:31    25.91
11/15/2016  5:12:32 10:42:32    27.75
11/15/2016  5:12:33 10:42:33    24.46
11/15/2016  5:12:34 10:42:34    24.32
11/15/2016  5:12:35 10:42:35    24.81
11/15/2016  5:12:36 10:42:36    27.36
11/15/2016  5:12:37 10:42:37    28.2
11/15/2016  5:12:38 10:42:38    28.29
11/15/2016  5:12:39 10:42:39    26.52
11/15/2016  5:12:40 10:42:40    32.58
11/15/2016  5:12:41 10:42:41    24.24
11/15/2016  5:12:42 10:42:42    24.36
11/15/2016  5:12:43 10:42:43    26.48
11/15/2016  5:12:44 10:42:44    28.76
11/15/2016  5:12:45 10:42:45    24.51
11/15/2016  5:12:46 10:42:46    23.93
11/15/2016  5:12:47 10:42:47    25.23
11/15/2016  5:12:48 10:42:48    27.9
11/15/2016  5:12:49 10:42:49    27.84
11/15/2016  5:12:50 10:42:50    27.31
11/15/2016  5:12:51 10:42:51    29.17
11/15/2016  5:12:52 10:42:52    24
11/15/2016  5:12:53 10:42:53    32.51
11/15/2016  5:12:54 10:42:54    26.63
11/15/2016  5:12:55 10:42:55    22.34
11/15/2016  5:12:56 10:42:56    29.14
11/15/2016  5:12:57 10:42:57    46.62
11/15/2016  5:12:58 10:42:58    48.85
11/15/2016  5:12:59 10:42:59    30.59
11/15/2016  5:13:00 10:43:00    30.68
11/15/2016  5:13:01 10:43:01    30.82
11/15/2016  5:13:02 10:43:02    31.64
11/15/2016  5:13:03 10:43:03    43.91

The above is sample data; the data goes on for days. I have to find the depression in the current as shown in the image: if the current stays below 30 A for a long time, I have to detect that valley-like depression. I have been working on this for a while and can't come up with any logic that finds it precisely. Any suggestion is appreciated; a machine learning approach is also acceptable.
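The criterion as stated (current below 30 A for a long time) can be checked directly before trying anything fancier. A minimal sketch, assuming one sample per second so that a duration in seconds equals a count of consecutive rows; `find_low_runs` and `min_len` are hypothetical names, `min_len=5` is an arbitrary stand-in for "a long time", and the 30 A threshold comes from the question.

```python
import numpy as np

def find_low_runs(current, threshold=30.0, min_len=5):
    """Return (start, end) row positions, end inclusive, of every run in which
    current < threshold for at least min_len consecutive samples."""
    below = np.asarray(current, dtype=float) < threshold
    # Pad with False so every run has a rising edge (+1) and a falling edge (-1)
    edges = np.diff(np.concatenate(([False], below, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)      # first sample of each run
    ends = np.flatnonzero(edges == -1) - 1   # last sample of each run
    keep = (ends - starts + 1) >= min_len    # drop runs that are too short
    return list(zip(starts[keep].tolist(), ends[keep].tolist()))

# The 37 sample readings from the question (one per second)
current = [26.61, 42.27, 25.48, 24.24, 25.91, 27.75, 24.46, 24.32, 24.81, 27.36,
           28.2, 28.29, 26.52, 32.58, 24.24, 24.36, 26.48, 28.76, 24.51, 23.93,
           25.23, 27.9, 27.84, 27.31, 29.17, 24, 32.51, 26.63, 22.34, 29.14,
           46.62, 48.85, 30.59, 30.68, 30.82, 31.64, 43.91]
print(find_low_runs(current, threshold=30.0, min_len=5))  # [(2, 12), (14, 25)]
```

Single spikes (42.27 A at position 1, 32.58 A at 13, 32.51 A at 26) split what looks like one long depression into separate runs; that is the noise problem the answers below address by smoothing first.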

Upvotes: 0

Answers (2)

Sandipan Dey

Reputation: 23101

We can try to find valleys with a similar idea (a moving average), but computed with numpy convolution:

Pick a window size and compute the smoothed data, e.g., a moving average (MA) computed by convolution.
Compute the residual from the original data and the smoothed data.

Valley points are the consecutive points where residual values are small.
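To make the first two steps concrete on a tiny input: the box kernel `np.ones(w_sz) / w_sz` averages each point with its neighbours, and `mode='same'` keeps the output the same length as the input. The flat test signal below is only an illustration, not data from the question. Because `np.convolve` treats values beyond the ends as zero, the first and last smoothed values are pulled toward zero, which inflates the residual there.

```python
import numpy as np

x = np.array([3.0, 3.0, 3.0, 3.0, 3.0])   # a flat signal
w_sz = 3
kernel = np.ones(w_sz) / w_sz             # box kernel: equal weights summing to 1
ma = np.convolve(x, kernel, mode='same')  # same length as x, centred window
resid = x - ma

print(ma)     # interior points stay 3.0; the two end points drop to 2.0
print(resid)  # so the residual is 1.0 at both ends and 0.0 inside
```

On real data it may be safer to ignore the first and last `w_sz // 2` residuals for this reason.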

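import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the data into a DataFrame df (whitespace-separated, as in the question);
# 'current.txt' is a placeholder file name
df = pd.read_csv('current.txt', sep=r'\s+')

w_sz = 3  # window size
# Moving average: convolve with a box kernel of length w_sz
ma = np.convolve(df.Current, np.ones(w_sz) / w_sz, mode='same')
resid = df.Current - ma  # residual between original and smoothed data
threshold = 1  # 0.1
# Row positions where the residual is small
prob_val = np.where(np.abs(resid) <= threshold)[0]
# Row positions where a new run of consecutive small-residual points starts
val_indices = prob_val[np.where(np.diff(prob_val) != 1)[0] + 1]

plt.plot(df.Current)
plt.plot(ma)
plt.plot(resid)
plt.axhline(0)
plt.plot(val_indices, np.zeros(len(val_indices)), 'o', color='red')
plt.legend(['Current', 'MA-smoothed', 'Residual'], loc='upper center')
plt.show()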

There are 3 valleys shown in the figure, one between each pair of consecutive red points. The first valley appears to have only one red point, but there are actually two consecutive points there, because that valley has length one. Short valleys like this can be filtered out too.
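Filtering short valleys amounts to grouping the flagged row positions into runs of consecutive integers and keeping only runs of some minimum length. A minimal sketch; `group_runs` and `min_len` are hypothetical names, and the input is assumed to be a sorted integer array like `prob_val` above (here a small hand-made array rather than real output).

```python
import numpy as np

def group_runs(indices, min_len=2):
    """Split sorted integer indices into runs of consecutive values and return
    (start, end) pairs, end inclusive, for runs with at least min_len points."""
    indices = np.asarray(indices, dtype=int)
    if indices.size == 0:
        return []
    # A new run starts wherever the gap to the previous index is not 1
    breaks = np.flatnonzero(np.diff(indices) != 1) + 1
    runs = np.split(indices, breaks)
    return [(int(r[0]), int(r[-1])) for r in runs if len(r) >= min_len]

prob_val = np.array([3, 4, 9, 10, 11, 12, 20, 30, 31, 32])
print(group_runs(prob_val, min_len=3))  # [(9, 12), (30, 32)]
```

The run `(20, 20)` is the kind of length-one valley mentioned above; it survives only with `min_len=1`.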

Upvotes: 0

jbndlr

Reputation: 5210

You could just use a moving window average approach:

Select an appropriate window width (in your case, consecutive entries are one second apart, so the window width is measured in seconds)
Iterate over your current column and calculate the average current over each window of the chosen width
Check when the average drops below a threshold or rises above it, depending on its prior state
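The three steps above can also be written without an explicit loop using pandas. A sketch, assuming the currents are in a list or pandas Series; `Series.rolling(window).mean()` gives a trailing mean over exactly `window` samples (NaN for the first `window - 1` positions), and a state change is any position whose above/below-threshold flag differs from the previous one. The window of 5 and threshold of 27.0 match the values used in the code below.

```python
import numpy as np
import pandas as pd

# Sample currents from the question, one reading per second
c = [26.61, 42.27, 25.48, 24.24, 25.91, 27.75, 24.46, 24.32, 24.81, 27.36,
     28.2, 28.29, 26.52, 32.58, 24.24, 24.36, 26.48, 28.76, 24.51, 23.93,
     25.23, 27.9, 27.84, 27.31, 29.17, 24, 32.51, 26.63, 22.34, 29.14,
     46.62, 48.85, 30.59, 30.68, 30.82, 31.64, 43.91]
window = 5    # window width in samples (seconds)
thres = 27.0  # threshold on the moving average

# Trailing mean over `window` samples; drop the warm-up NaNs (index keeps positions)
mean = pd.Series(c).rolling(window).mean().dropna()
good = (mean > thres).to_numpy()
# A change is the first valid point or any point whose state differs from the previous one
is_change = np.r_[True, good[1:] != good[:-1]]
changes = [(int(i), 'good' if g else 'bad')
           for i, g in zip(mean.index[is_change], good[is_change])]
print(changes)
```

On the sample data the mean crosses 27.0 at positions 16, 17 and 18 in quick succession; a wider window smooths the mean more and may suppress such flicker.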

With your example data, this may look like the following. In this plot, your original current data is depicted as a blue dashed line, the moving average is the thick green line and state changes are marked as red vertical lines.

The code I used to generate that image is:

import matplotlib.pyplot as plt

c = [26.61, 42.27, 25.48, 24.24, 25.91, 27.75, 24.46, 24.32, 24.81, 27.36, 28.2, 28.29, 26.52, 32.58, 24.24, 24.36, 26.48, 28.76, 24.51, 23.93, 25.23, 27.9, 27.84, 27.31, 29.17, 24, 32.51, 26.63, 22.34, 29.14, 46.62, 48.85, 30.59, 30.68, 30.82, 31.64, 43.91]

if __name__ == '__main__':
    # Choose window width (in samples, i.e. seconds) and threshold
    window = 5
    thres = 27.0

    # Iterate and collect state changes with regard to previous state
    changes = []
    rolling = [float('nan')] * (window - 1)  # no mean until a full window is available
    old_state = None
    for i in range(window - 1, len(c)):
        slc = c[i - window + 1:i + 1]  # the last `window` samples, ending at i
        mean = sum(slc) / len(slc)
        state = 'good' if mean > thres else 'bad'

        rolling.append(mean)
        if state != old_state:
            print('Changed to {:>4s} at position {:>3d} ({:5.3f})'.format(state, i, mean))
            changes.append((i, state))
            old_state = state

    # Plot results and state changes
    plt.figure(frameon=False, figsize=(10, 8))
    currents, = plt.plot(c, ls='--', label='Current')
    rollwndw, = plt.plot(rolling, lw=2, label='Rolling Mean')
    plt.axhline(thres, xmin=.0, xmax=1.0, c='grey', ls='-')
    plt.text(len(c) - 1, thres, 'Threshold: {:.1f}'.format(thres), horizontalalignment='right')
    for pos, s in changes:  # don't reuse the name `c`, which holds the data
        plt.axvline(pos, ymin=.0, ymax=.7, c='red', ls='-')
        plt.text(pos, 41.5, s, color='red', rotation=90, verticalalignment='bottom')
    plt.legend(handles=[currents, rollwndw], fontsize=11)
    plt.grid(True)
    plt.savefig('plot.png', dpi=72, bbox_inches='tight')

Upvotes: 4
