Reputation: 3689
I have a number of series in a pandas dataframe representing rates observed yearly.
For an experiment, I want some of these series' rates to converge towards one of the other series' rate in the last observed year.
For example, say I have this data, and I decide column a is a meaningful target for column b to approach asymptotically over, say, a ten-year period in small, evenly sized increments (or decreasing ones; it doesn't really matter).
I could of course do this in a loop, but I was wondering whether there is a more general, off-the-shelf vectorized way in numpy or scipy to make one series approach another asymptotically.
rate         a         b
year
2006  0.393620  0.260998
2007  0.408620  0.260527
2008  0.396732  0.257396
2009  0.418029  0.249123
2010  0.414246  0.253526
2011  0.415873  0.256586
2012  0.414616  0.253865
2013  0.408332  0.257504
2014  0.401821  0.259208
Upvotes: 6
Views: 1290
Reputation: 4548
All right, so this is just the procedure you described in your comment, in code form, assuming a and b are your two numpy arrays:
import numpy
b += (a[-1]-b[-1])/len(b)*numpy.arange(1,len(b)+1)
(a[-1]-b[-1])/len(b) is one "chunk", and one more chunk is added in each "iteration" (year) by multiplying with a numpy.arange() array. I tried a few plots and it doesn't look good unless you tweak it, but it's what you asked for.
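For anyone who wants to try it on the numbers from the question, here is a minimal sketch of the same arithmetic wrapped in a function that returns a new array instead of modifying b in place. The function name converge_linear is made up for illustration, and the arrays are simply the two columns copied from the question's table.

```python
import numpy as np

# Values copied from the question's table (years 2006-2014)
a = np.array([0.393620, 0.408620, 0.396732, 0.418029, 0.414246,
              0.415873, 0.414616, 0.408332, 0.401821])
b = np.array([0.260998, 0.260527, 0.257396, 0.249123, 0.253526,
              0.256586, 0.253865, 0.257504, 0.259208])

def converge_linear(source, target):
    """Shift `source` so that its last value equals target[-1].

    Hypothetical helper; same arithmetic as the one-liner above,
    but it leaves `source` untouched and returns a new array.
    """
    n = len(source)
    chunk = (target[-1] - source[-1]) / n   # one even increment per year
    return source + chunk * np.arange(1, n + 1)

b_new = converge_linear(b, a)
```

The first year is shifted by one chunk, the second by two, and so on, so the final year of b_new lands on the final year of a.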
Upvotes: 3
Reputation: 284820
Generally speaking, you'd apply an "easing function" over some range.
For example, consider the figure below:
Here, we have two original datasets. We'll subtract the two, multiply the difference by the easing function shown in the third row, and then add the result back to the first curve. This will result in a new series that is the original data to the left of the gray region, a blend of the two within the gray region, and data from the other curve to the right of the gray region.
As an example:
import numpy as np
import matplotlib.pyplot as plt
# Generate some interesting random data
np.random.seed(1)
series1 = np.random.normal(0, 1, 1000).cumsum() + 20
series2 = np.random.normal(0, 1, 1000).cumsum()
# Our x-coordinates
index = np.arange(series1.size)
# Boundaries of the gray "easing region"
i0, i1 = 300, 700
# In this case, I've chosen a sinusoidal easing function...
x = np.pi * (index - i0) / (i1 - i0)
easing = 0.5 * np.cos(x) + 0.5
# To the left of the gray region, easing should be 1 (all series2)
easing[index < i0] = 1
# To the right, it should be 0 (all series1)
easing[index >= i1] = 0
# Now let's calculate the new series that will slowly approach the first
# We'll operate on the difference and then add series1 back in
diff = series2 - series1
series3 = easing * diff + series1
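If you'd like to reuse this, the same blend can be wrapped in a small function. This is only a sketch: the name ease_toward and its arguments are made up here, and it uses np.clip to pin the weight to 1 before i0 and 0 from i1 onward instead of the two masked assignments. The a and b arrays are the columns copied from the question.

```python
import numpy as np

def ease_toward(source, target, i0, i1):
    """Blend `source` into `target` between indices i0 and i1.

    Hypothetical helper: returns `source` before i0, `target` from i1
    onward, and a sinusoidal blend of the two in between.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    index = np.arange(source.size)
    # Fraction of the way through the easing region, clipped to [0, 1]
    t = np.clip((index - i0) / (i1 - i0), 0.0, 1.0)
    weight = 0.5 * np.cos(np.pi * t) + 0.5   # 1 at i0, 0 at i1
    return weight * (source - target) + target

# Columns from the question (years 2006-2014)
a = np.array([0.393620, 0.408620, 0.396732, 0.418029, 0.414246,
              0.415873, 0.414616, 0.408332, 0.401821])
b = np.array([0.260998, 0.260527, 0.257396, 0.249123, 0.253526,
              0.256586, 0.253865, 0.257504, 0.259208])

# Ease b toward a over the whole observed period
b_eased = ease_toward(b, a, 0, len(b) - 1)
```

With these arguments b_eased starts at the 2006 value of b and ends at the 2014 value of a. Swapping the cosine for another curve that runs from 1 down to 0 gives a different easing shape.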
Also, if you're curious about the plot above, here's how it's generated:
fig, axes = plt.subplots(nrows=4, sharex=True)
axes[0].plot(series1, color='lightblue', lw=2)
axes[0].plot(series2, color='salmon', lw=1.5)
axes[0].set(ylabel='Original Series')
axes[1].plot(diff, color='gray')
axes[1].set(ylabel='Difference')
axes[2].plot(easing, color='black', lw=2)
axes[2].margins(y=0.1)
axes[2].set(ylabel='Easing')
axes[3].plot(series1, color='lightblue', lw=2)
axes[3].plot(series3, color='salmon', ls='--', lw=2, dashes=(12,20))
axes[3].set(ylabel='Modified Series')
for ax in axes:
    ax.locator_params(axis='y', nbins=4)
for ax in axes[-2:]:
    ax.axvspan(i0, i1, color='0.8', alpha=0.5)
plt.show()
Upvotes: 5