Consider the following example problem:
# dummy data for a SO question
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD

time = np.array(range(100))
# indicator for the "bump" between t=40 and t=60
brk = np.array((time > 40) & (time < 60)).reshape(100, 1)
# series-specific bump sizes: +5 for series 1, -5 for series 2
B = np.array([5, -5]).reshape(1, 2)
# common sinusoid + noise + idiosyncratic bump
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale=.2, size=(100, 2)) + np.dot(brk, B)

plt.clf()
plt.plot(time, y[:, 0])
plt.plot(time, y[:, 1])
You have N time series, each with one component that follows a common process and another component that is idiosyncratic to the series itself. Assume for simplicity that you know a priori that the bump is between 40 and 60, and you want to model it simultaneously with the sinusoidal component.
A TCN does a good job on the common component, but it can't get the series-idiosyncratic component:
# time series model: stacked dilated causal convolutions (a TCN)
n_filters = 10
filter_width = 3
dilation_rates = [2**i for i in range(7)]

inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation='relu',
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)

model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.summary()

# each whole series is one sample: shape (N, time_steps, 1);
# time itself is used as the input for both series
X_train = np.transpose(np.c_[time, time]).reshape(2, 100, 1)
y_train = np.transpose(y).reshape(2, 100, 1)

history = model.fit(X_train, y_train,
                    batch_size=2,
                    epochs=1000,
                    verbose=0)

yhat = model.predict(X_train)
plt.clf()
plt.plot(time, y[:, 0])
plt.plot(time, y[:, 1])
plt.plot(time, yhat[0, :, :])
plt.plot(time, yhat[1, :, :])
On the other hand, a basic linear regression with N outputs (here implemented in Keras) is perfect for the idiosyncratic component:
inp1 = Input((1,))
x1 = inp1
x1 = Dense(2)(x1)  # two outputs: one regression coefficient per series

model1 = Model(inputs=inp1, outputs=x1)
model1.compile(optimizer=Adam(), loss='mean_squared_error')
model1.summary()

# here each time step is one sample: shape (time_steps, dims)
brk_train = brk
y_train = y

history = model1.fit(brk_train, y_train,
                     batch_size=100,
                     epochs=6000, verbose=0)

yhat1 = model1.predict(brk_train)
plt.clf()
plt.plot(time, y[:, 0])
plt.plot(time, y[:, 1])
plt.plot(time, yhat1[:, 0])
plt.plot(time, yhat1[:, 1])
I want to use Keras to jointly estimate the time series component and the idiosyncratic component. The major problem is that feed-forward networks (of which linear regression is a special case) take input of shape batch_size x dims, while time series networks take input of shape batch_size x time_steps x dims.
Because I want to jointly estimate the idiosyncratic part of the model (the linear regression part) together with the time series part, I'm only ever going to batch-sample whole time series, which is why I specified batch_size = time_steps for model 1. But in the static model, what I'm really doing is modeling my data as time_steps x dims.
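As a quick sanity check (a sketch of mine, not part of the original post, reusing the imports above): a Dense layer in Keras applied to a 3D input acts on the last axis only, applying the same linear map at every time step, so in principle it can play the role of the per-time-step regression:

check_inp = Input(shape=(None, 1))
check_out = Dense(2)(check_inp)
check_model = Model(check_inp, check_out)
# the kernel is shared across time: (batch, time, 1) -> (batch, time, 2)
print(check_model.predict(np.ones((1, 100, 1))).shape)  # (1, 100, 2)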
I have tried to re-cast the feed-forward model as a time-series model, without success. Here's the non-working approach:
inp3 = Input(shape=(None, 1))
x3 = inp3
x3 = Dense(2)(x3)

model3 = Model(inputs=inp3, outputs=x3)
model3.compile(optimizer=Adam(), loss='mean_squared_error')
model3.summary()

# a single "sample" holding the whole break series
brk_train = brk.reshape(1, 100, 1)
y_train = np.transpose(y).reshape(2, 100, 1)

history = model3.fit(brk_train, y_train,
                     batch_size=1,
                     epochs=1000, verbose=1)
ValueError: Error when checking target: expected dense_40 to have shape (None, 2) but got array with shape (100, 1)
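Spelling the mismatch out (my annotation, based on the shapes above): Dense(2) gives every series two output channels, while each target sample here has a single channel, and the sample counts (1 input vs. 2 targets) disagree as well:

model3.output_shape  # (None, None, 2): two channels out per series
brk_train.shape      # (1, 100, 1): one input sample
y_train.shape        # (2, 100, 1): two target samples, one channel each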
I am trying to fit the same model as model1, but with a different shape, so that it is compatible with the TCN model -- and importantly so that it will have the same batching structure.
The output should ultimately have the shape (2, 100, 1) in this example. Basically, I want the model to do the following algorithm:

- Ingest an input X of shape (N, time_steps, dims).
- Collapse it to X1 of shape (time_steps, dims).
- Compute np.dot(X1, W), where W is of dimension (dims, N), yielding X2 of dimension (time_steps, N).
- Reshape X2 to (N, time_steps, 1). Then I can add it to the output of the other part of the model.
- The gradient of the output with respect to W is just X1.
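In plain NumPy the whole computation is just the following (a sketch; W here is a random stand-in for the weights to be learned):

X1 = brk                                  # (time_steps, dims) = (100, 1)
W = np.random.normal(size=(1, 2))         # (dims, N), to be learned
X2 = np.dot(X1, W)                        # (time_steps, N) = (100, 2)
X2 = np.transpose(X2).reshape(2, 100, 1)  # (N, time_steps, 1)
# X2 can now be added element-wise to the TCN output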
How can I implement this? Do I need a custom layer?
I'm building off of ideas in this paper, in case you're curious about the motivation behind all of this.
EDIT: After posting, I noticed that I used only the time variable, rather than the time series itself. A TCN fit with the lagged series fits the idiosyncratic part of the series just fine (in-sample anyway). But my basic question still stands -- I want to merge the two types of networks.
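(For concreteness, the lagged fit mentioned in this edit might look like the following; a sketch assuming a one-step lag with zero padding, not code from the original run:)

# use the lagged series, rather than time, as the TCN input
X_lag = np.concatenate([np.zeros((2, 1, 1)), y_train[:, :-1, :]], axis=1)
history = model.fit(X_lag, y_train, batch_size=2, epochs=1000, verbose=0)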
So, I solved my own problem. The answer is to create dummy interactions (and thus a really sparse design matrix) and then reshape the data.
###########################
# interaction model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD
from patsy import dmatrix

def shift5(arr, num, fill_value=np.nan):
    # shift an array by num places, padding with fill_value
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        result = arr
    return result

# same dummy data as in the question
time = np.array(range(100))
brk = np.array((time > 40) & (time < 60)).reshape(100, 1)
B = np.array([5, -5]).reshape(1, 2)
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale=.2, size=(100, 2)) + np.dot(brk, B)

plt.clf()
plt.plot(time, y[:, 0])
plt.plot(time, y[:, 1])

# define interaction model: two input channels (one dummy per series), one output
inp = Input(shape=(None, 2))
x = inp
x = Dense(1)(x)
model = Model(inputs=inp, outputs=x)
model.compile(optimizer=Adam(), loss='mean_squared_error')
model.summary()

# build the series-by-break interaction dummies (the sparse design matrix)
df = pd.DataFrame(data={"fips": np.concatenate((np.zeros(100), np.ones(100))),
                        "brk": np.concatenate((brk.reshape(100), brk.squeeze()))})
df.brk = df.brk.astype(int)
tm = np.asarray(dmatrix("brk:C(fips)-1", data=df))

# reshape the design matrix to (N, time_steps, 2)
brkint = np.concatenate((
    tm[:100, :].reshape(1, 100, 2),
    tm[100:200, :].reshape(1, 100, 2)
), axis=0)

y_train = np.transpose(y).reshape(2, 100, 1)

history = model.fit(brkint, y_train,
                    batch_size=2,
                    epochs=1000,
                    verbose=1)

yhat = model.predict(brkint)
plt.clf()
plt.plot(time, y[:, 0])
plt.plot(time, y[:, 1])
plt.plot(time, yhat[0, :, :])
plt.plot(time, yhat[1, :, :])
The output shape is the same as for the TCN, and can simply be added element-wise.
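For completeness, the two branches can then be merged into one jointly-estimated model; a minimal sketch, assuming the TCN hyperparameters and the X_train, brkint, and y_train arrays from above, and using keras.layers.add for the element-wise sum:

from keras.layers import add

# TCN branch for the common component
inp_ts = Input(shape=(None, 1))
x_ts = inp_ts
for dilation_rate in [2**i for i in range(7)]:
    x_ts = Conv1D(filters=10, kernel_size=3, padding='causal',
                  activation='relu', dilation_rate=dilation_rate)(x_ts)
x_ts = Dense(1)(x_ts)

# linear branch for the idiosyncratic component (interaction dummies in)
inp_brk = Input(shape=(None, 2))
x_brk = Dense(1)(inp_brk)

# element-wise sum of two (N, time_steps, 1) outputs
out = add([x_ts, x_brk])
joint = Model(inputs=[inp_ts, inp_brk], outputs=out)
joint.compile(optimizer=Adam(), loss='mean_squared_error')
# joint.fit([X_train, brkint], y_train, batch_size=2, epochs=1000, verbose=0)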