Numpy: divide the current row by the previous row

Question

For my experiment, I have three time-series datasets with different characteristics, all in the following format: the first column is the timestamp and the second column is the value.

0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27

For reproducibility, I have shared the three time-series data I am using here.

In column 2, I want to compare each row with the previous row. If the current value is greater, I move on to the next row. If the current value is smaller than the previous row's value, I want to divide the current (smaller) value by the previous (larger) value. For example, in the sample above, the seventh row (28) is smaller than the sixth row (40), so the result is 28/40 = 0.7.

Here is my sample code.

import numpy as np
import pandas as pd
import csv
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from statsmodels.graphics.tsaplots import plot_acf, acf


protocols = {}


types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time = []  
    col_window = [] 
    with open(fname, mode='r', encoding='utf-8-sig') as f:
        reader = csv.reader(f, delimiter=",")
        for i in reader:
            col_time.append(float(i[0]))
            col_window.append(int(i[1]))
    col_time, col_window = np.array(col_time), np.array(col_window)
    diff_time = np.diff(col_time)
    diff_window = np.diff(col_window)
    diff_time = diff_time[diff_window > 0] 
    diff_window = diff_window[diff_window > 0] # To keep only the increased values
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "diff_time": diff_time,
        "diff_window": diff_window,
    }


# Plot the quotient values
rt = np.exp(np.diff(np.log(col_window)))

for protname, fname in types.items():
    col_time, col_window = protocols[protname]["col_time"], protocols[protname]["col_window"]
    rt = np.exp(np.diff(np.log(col_window)))
    plt.plot(np.diff(col_time), rt, ".", markersize=4, label=protname, alpha=0.1)
    plt.ylim(0, 1.0001)
    plt.xlim(0, 0.003)
    plt.title(protname)
    plt.xlabel("time")
    plt.ylabel("difference")
    plt.legend()
    plt.show()

This gives me the following plots

However, when I do this

rt = np.exp(np.diff(np.log(col_window)))

This divides every row by the previous row, which is not what I want. As in the example above, I want to divide the current value of column 2 by the previous value ONLY if the current value is smaller than the previous one, and then plot the quotient against the timestamp difference (computed from col_time in my code above). How can I fix this?

jyalim · Accepted Answer

Unless you specifically need the csv module, I would recommend loading your files with NumPy's loadtxt function:

col_time,col_window = np.loadtxt(fname,delimiter=',').T

This single line takes care of the first 8 lines of your for loop. Note the transpose operation (.T) is necessary to convert the original data shape (N rows by 2 columns) into a 2 row by N column shape that is unpacked into col_time and col_window. Also note that loadtxt automatically loads the data into numpy.array objects.
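
To see the transpose-and-unpack step in isolation, here is a minimal sketch that reads the first three sample rows from the question out of an in-memory buffer. The io.StringIO buffer is only a stand-in for one of the CSV files and is an assumption made for illustration.

```python
import io
import numpy as np

# First rows of the sample from the question, held in memory instead of a file
sample = io.StringIO(
    "0.086206438,10\n"
    "0.086425551,12\n"
    "0.089227066,20\n"
)

# loadtxt returns an array of shape (3, 2); .T turns it into (2, 3),
# which unpacks into one array per column
col_time, col_window = np.loadtxt(sample, delimiter=",").T

print(col_time.shape, col_window.dtype)
```

Note that both columns come back as floats by default. The question opens its files with encoding='utf-8-sig', so if the files start with a byte-order mark, passing encoding='utf-8-sig' to loadtxt may also be needed.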

As for your actual question, I would use slicing and masking:

trailing_window = col_window[:-1] # "past" values at a given index
leading_window  = col_window[1:]  # "current" values at a given index
decreasing_mask = leading_window < trailing_window
quotient = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
# The mask has N-1 entries, so index the N-1 timestamp differences with it
quotient_times = np.diff(col_time)[decreasing_mask]

Then quotient_times may be plotted against quotient.
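
As a quick check, the slicing-and-masking idea can be run on the eleven sample rows from the question. This is a minimal sketch; the name time_steps and the choice to pair each quotient with the matching entry of np.diff(col_time) are assumptions based on the question's request to plot against the timestamp difference.

```python
import numpy as np

# Sample rows copied from the question
col_time = np.array([0.086206438, 0.086425551, 0.089227066, 0.089262508,
                     0.089744425, 0.090036815, 0.090054172, 0.090377569,
                     0.090514071, 0.090762872, 0.090912691])
col_window = np.array([10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27], dtype=float)

trailing_window = col_window[:-1]  # previous value of each consecutive pair
leading_window = col_window[1:]    # current value of each consecutive pair
decreasing_mask = leading_window < trailing_window

# Quotients only where the current value dropped below the previous one
quotient = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
# Elapsed time for the same pairs (assumed pairing, see the note above)
time_steps = np.diff(col_time)[decreasing_mask]

print(quotient)    # two drops: 40 -> 28 and 28 -> 27
print(time_steps)
```

Only two pairs qualify here: 28/40 = 0.7, as in the question's example, and 27/28. The repeated 28s are skipped because the comparison is strict.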

An alternative would be to use NumPy's where function to grab the indices where the mask is True:

trailing_window = col_window[:-1] # "past" values at a given index
leading_window  = col_window[1:]  # "current" values at a given index
decreasing_inds = np.where(leading_window < trailing_window)[0]
quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
# Timestamp differences for the same pairs of rows
quotient_times = np.diff(col_time)[decreasing_inds]
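
The two approaches should select the same elements. The following small sketch, using made-up values taken from the tail of the question's sample, compares them side by side; the names via_mask and via_inds are hypothetical.

```python
import numpy as np

col_window = np.array([30.0, 40.0, 28.0, 28.0, 27.0])
trailing_window = col_window[:-1]
leading_window = col_window[1:]

# Boolean mask: one True/False per consecutive pair
decreasing_mask = leading_window < trailing_window
# Integer positions of the True entries in that mask
decreasing_inds = np.where(decreasing_mask)[0]

via_mask = leading_window[decreasing_mask] / trailing_window[decreasing_mask]
via_inds = leading_window[decreasing_inds] / trailing_window[decreasing_inds]

print(decreasing_inds)                     # positions of the decreasing pairs
print(np.array_equal(via_mask, via_inds))  # both routes give the same quotients
```

Each index refers to a pair of rows: index i compares row i with row i+1, so the "current" row in the original data is at i+1.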

Keep in mind that the loading and quotient code above still belongs in the first for loop; what the question computes as rt is now computed inside that loop as quotient. After computing quotient_times, the plotting (also inside the first loop) looks like this:

# Next line opens a new figure window and then clears it
plt.figure(); plt.clf()
# Updated plotting call, following the syntax from the question
plt.plot(quotient_times,quotient,'.',ms=4,label=protname,alpha=0.1)
plt.ylim(0, 1.0001)
plt.xlim(0, 0.003)
plt.title(protname)
plt.xlabel("time")
plt.ylabel("quotient")
plt.legend()
# You may not need this `plt.show()` line 
plt.show()
# To save the figure, one option would be the following:
# plt.savefig(protname+'.png')

Note that you may need to take the plt.show() line out of the loop.

Putting it together for you,

import numpy as np
import matplotlib.pyplot as plt

protocols = {}

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    # Timestamp differences for the same pairs of rows
    quotient_times = np.diff(col_time)[decreasing_inds]
    # Still save the values in case computation needs to happen later 
    # in the script    
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }
    # Next line opens a new figure window and then clears it
    plt.figure(); plt.clf()
    plt.plot(quotient_times,quotient, ".", markersize=4, label=protname, alpha=0.1)
    plt.ylim(0, 1.0001)
    plt.xlim(0, 0.003)
    plt.title(protname)
    plt.xlabel("time")
    plt.ylabel("quotient")
    plt.legend()
    # To save the figure, one option would be the following:
    # plt.savefig(protname+'.png')
# This may still be unnecessary, especially if called as a script
# (just save the plots to `png`).
plt.show()
