How do I properly train a GRU RNN to predict a value such as biomass?

Question

This is my first time training a GRU RNN. My dataset contains 8 variables in a time series covering roughly 20 years, and I am trying to predict the biomass value from the other variables. I am starting with a single-layer GRU. I am not using softmax for the output layer, and MSE is my cost function.

It is a basic GRU with forward propagation and backward gradient updates. Here are the main functions I defined:

   # x_t is the input training dataset with shape 7572x8, so T = 7572,
   # input_dim = 8, hidden_dim = 128. y_train holds the training labels.

   def forward_prop_step(self, x_t,y_train, s_t1_prev,V, U, W, b, c,learning_rate):
       T = x_t.shape[0]
       z_t1 = np.zeros((T,self.hidden_dim))
       r_t1 = np.zeros((T,self.hidden_dim))
       h_t1 = np.zeros((T,self.hidden_dim))
       s_t1 = np.zeros((T+1,self.hidden_dim))
       o_s = np.zeros((T,self.input_dim))
       for i in xrange(T):
           x_e = x_t[i].T
           z_t1[i] = sigmoid(U[0].dot(x_e) + W[0].dot(s_t1[i]) + b[0])#128x1
           r_t1[i] = sigmoid(U[1].dot(x_e) + W[1].dot(s_t1[i]) + b[1])#128x1
           h_t1[i] = np.tanh(U[2].dot(x_e) + W[2].dot(s_t1[i] * r_t1[i]) + b[2])#128x1
           s_t1[i+1] = (np.ones_like(z_t1[i]) - z_t1[i]) * h_t1[i] + z_t1[i] * s_t1[i]#128x1

           o_s[i] = np.dot(V,s_t1[i+1]) + c#8x1
       return [o_s,z_t1,r_t1,h_t1,s_t1]

   def bptt(self, x,y_train,o,z_t1,r_t1,h_t1,s_t1,V, U, W, b, c):
       bptt_truncate = 360
       T = x.shape[0]#length of time scale of input data (train)
       dLdU = np.zeros(U.shape)
       dLdV = np.zeros(V.shape)
       dLdW = np.zeros(W.shape)
       dLdb = np.zeros(b.shape)
       dLdc = np.zeros(c.shape)
       y_train_sp = np.repeat(y_train,self.input_dim)
       for t in np.arange(T)[::-1]:
           dLdy = 2 * (o[t] - y_train_sp[t])
           dydV = s_t1[t]
           dydc = 1.0
           dLdV += np.outer(dLdy,dydV)
           dLdc += dLdy*dydc            
           for i in np.arange(max(0, t-bptt_truncate), t+1)[::-30]:#every month in the past year           
               s_t1_pre = s_t1[i]          
               dydst1 = V #8x128                
               dst1dzt1 = -h_t1[i] + s_t1_pre #128x1
               dst1dht1 = np.ones_like(z_t1[i]) - z_t1[i] #128x1

               dzt1dU = np.outer(z_t1[i]*(1.0-z_t1[i]),x[i]) #128x8
               #print dzt1dU.shape
               dzt1dW = np.outer(z_t1[i]*(1.0-z_t1[i]),s_t1_pre)  #128x128
               dzt1db = z_t1[i]*(1.0-z_t1[i]) #128x1

               dht1dU = np.outer((1.0-h_t1[i] ** 2),x[i]) #128x8
               dht1dW = np.outer((1.0-h_t1[i] ** 2),s_t1_pre * r_t1[i])  #128x128
               dht1db = 1.0-h_t1[i] ** 2 #128x1

               dht1drt1 = (1.0-h_t1[i] ** 2)*(W[2].dot(s_t1_pre))#128x1

               drt1dU = np.outer((r_t1[i]*(1.0-r_t1[i])),x[i]) #128x8
               drt1dW = np.outer((r_t1[i]*(1.0-r_t1[i])),s_t1_pre) #128x128
               drt1db = (r_t1[i]*(1.0-r_t1[i]))#128x1
               dLdW[0] += np.outer(dydst1.T.dot(dLdy),dzt1dW.dot(dst1dzt1)) #128x128
               dLdU[0] += np.outer(dydst1.T.dot(dLdy),dst1dzt1.dot(dzt1dU)) #128x8
               dLdb[0] += (dydst1.T.dot(dLdy))*dst1dzt1*dzt1db#128x1

               dLdW[1] += np.outer(dydst1.T.dot(dLdy),dst1dht1*dht1drt1).dot(drt1dW)#128x128
               dLdU[1] += np.outer(dydst1.T.dot(dLdy),dst1dht1*dht1drt1).dot(drt1dU) #128x8
               dLdb[1] += (dydst1.T.dot(dLdy))*dst1dht1*dht1drt1*drt1db#128x1

               dLdW[2] += np.outer(dydst1.T.dot(dLdy),dht1dW.dot(dst1dht1))  #128x128
               dLdU[2] += np.outer(dydst1.T.dot(dLdy),dst1dht1.dot(dht1dU))#128x8
               dLdb[2] += (dydst1.T.dot(dLdy))*dst1dht1*dht1db#128x1

       return [ dLdV,dLdU, dLdW, dLdb, dLdc ]

   def predict(self, x):
       pred = np.amax(x, axis = 1)
       pred_f = relu(pred)
       return pred_f

The parameters V, U, W, b, c are updated using the gradients dLdV, dLdU, dLdW, dLdb, dLdc computed by bptt.
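For reference, the update step amounts to plain gradient descent. The following is a minimal sketch, not the exact code from my class: the function name `sgd_update` and the default `learning_rate` value are assumptions made for illustration.

```python
import numpy as np

def sgd_update(params, grads, learning_rate=0.01):
    """Apply one plain gradient-descent step in place.

    params and grads are lists of float NumPy arrays in matching order,
    e.g. [V, U, W, b, c] and [dLdV, dLdU, dLdW, dLdb, dLdc].
    sgd_update is a hypothetical helper name used only for this sketch.
    """
    for p, g in zip(params, grads):
        # In-place subtraction, so the caller's arrays are modified.
        # The arrays must have a float dtype for this to work.
        p -= learning_rate * g
    return params
```

With this shape, one training iteration would call the forward pass, then bptt, then `sgd_update([V, U, W, b, c], [dLdV, dLdU, dLdW, dLdb, dLdc], learning_rate)`.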

I have tried different weight initializations (Xavier or plain random) and different truncation lengths, but they all lead to the same outcome. Perhaps the weight update isn't right? The network setup seems simple, though. I am also struggling to understand the prediction and how to translate it into an actual biomass value. The predict function is what I defined to translate the GRU output layer into a biomass value by taking the maximum value, but the output layer gives similar values for almost all time steps. I'm not sure of the best way to do this. Thanks in advance for any help or suggestions.
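One way to test whether the gradients from bptt are right is a numerical gradient check: perturb each parameter entry slightly, recompute the loss, and compare the finite-difference slope with the analytic gradient. The following is a minimal sketch under stated assumptions. `numerical_grad` and `loss_fn` are names made up for illustration and are not part of my class, and the toy loss at the bottom only mimics the linear output layer with an MSE-style cost.

```python
import numpy as np

def numerical_grad(loss_fn, param, eps=1e-5):
    """Central-difference estimate of d(loss)/d(param), entry by entry.

    loss_fn is a zero-argument callable (hypothetical) that recomputes
    the scalar loss using the current contents of param.
    """
    grad = np.zeros_like(param)
    for idx in np.ndindex(*param.shape):
        old = param[idx]
        param[idx] = old + eps
        loss_plus = loss_fn()
        param[idx] = old - eps
        loss_minus = loss_fn()
        param[idx] = old  # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2.0 * eps)
    return grad

# Toy check on a linear output layer with a squared-error loss.
rng = np.random.RandomState(0)
V = rng.randn(3, 4)   # stand-in for the output weights
s = rng.randn(4)      # stand-in for one hidden state
y = rng.randn(3)      # stand-in for the target
loss = lambda: np.sum((V.dot(s) - y) ** 2)
analytic = 2.0 * np.outer(V.dot(s) - y, s)
max_diff = np.max(np.abs(numerical_grad(loss, V) - analytic))
```

If `max_diff` is not tiny for a given parameter, the analytic gradient for that parameter is suspect. For the real network this is slow, so it would be run on a short slice of the series with a small hidden_dim.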
