How to reshape a vector to TensorFlow's filters?

I want to transfer some weights trained by another network to TensorFlow. The weights are stored in a single vector like this:


By using numpy, I can reshape it to two 3 by 3 filters like this:

1 2 3     9  10 11
3 4 5     12 13 14
6 7 8     15 16 17

Thus, the shape of my filters is (1,2,3,3). However, in TensorFlow, the shape of the filters is (3,3,2,1):

tf_weights = tf.Variable(tf.random_normal([3,3,2,1]))

After reshaping the weights to the shape TensorFlow expects, the values end up in the wrong order and I can't get the expected convolution result.

To be specific, for images and filters of shape [number,channel,size,size], I wrote a convolution function that gives the correct answer, but it's too slow:

def convol(images,weights,biases,stride):
    """
    images: input images or features, 4-D tensor
    weights: weights, 4-D tensor
    biases: biases, 1-D tensor
    stride: stride, a float number
    conv_feature: convolved feature map
    """
    image_num = images.shape[0] #the number of input images or feature maps
    channel = images.shape[1] #channels of an image,images's shape should be like [n,c,h,w]
    weight_num = weights.shape[0] #number of weights, weights' shape should be like [n,c,size,size]
    ksize = weights.shape[2]
    h = images.shape[2]
    w = images.shape[3]
    out_h = (h+np.floor(ksize/2)*2-ksize)/2+1
    out_w = out_h

    conv_features = np.zeros([image_num,weight_num,out_h,out_w])
    for i in range(image_num):
        image = images[i,...,...,...]
        for j in range(weight_num):
            sum_convol_feature = np.zeros([out_h,out_w])
            for c in range(channel):
                #extract a single channel image
                channel_image = image[c,...,...]
                #pad the image
                padded_image = im_pad(channel_image,ksize/2)
                #transform this image to a vector
                im_col = im2col(padded_image,ksize,stride)

                weight = weights[j,c,...,...]
                weight_col = np.reshape(weight,[-1])
                mul = np.dot(im_col,weight_col)
                convol_feature = np.reshape(mul,[out_h,out_w])
                sum_convol_feature = sum_convol_feature + convol_feature
            conv_features[i,j,...,...] = sum_convol_feature + biases[j]
    return conv_features

Instead, by using tensorflow's conv2d like this:

img = np.zeros([1,3,224,224])
img = img - 1
img = np.rollaxis(img, 1, 4)

weight_array = googleNet.layers[1].weights
weight_array = np.reshape(weight_array,[64,3,7,7])

biases_array = googleNet.layers[1].biases

tf_weight = tf.Variable(weight_array)

tf_img = tf.Variable(img)
tf_img = tf.cast(tf_img,tf.float32)

tf_biases = tf.Variable(biases_array)

conv_feature = tf.nn.bias_add(tf.nn.conv2d(tf_img,tf_weight,strides=[1,2,2,1],padding='SAME'),tf_biases)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
feature = sess.run(conv_feature)

The feature map I got is wrong.

Answers (2)


Sample Tensor Manipulations

I don't know whether this will help, but consider the reshape, gather, dynamic_partition and split operations and adapt them to your needs. What follows illustrates each of these operations; I copied it from my Git repo. I believe that if you run these examples in IPython you can work out what you really want and get better insight.

Reshape, Gather, Dynamic_partition and Split

Gather Operation ( tf.gather( ) )

Generate an array and test the gather operation on it. This approach is useful for fast prototyping:

  • We generate an array in NumPy and test TensorFlow operations on it.

Use: Gather slices from params according to indices.

indices must be an integer tensor of any dimension (usually 0-D or 1-D). This is best illustrated by an example:

array = np.array([[1,2,3],[4,9,6],[2,3,4],[7,8,0]])
array.shape

(4, 3)

In [27]:

gather_output0  = tf.gather(array,1)
gather_output01  = tf.gather(array,2)
gather_output02  = tf.gather(array,3)

gather_output11  = tf.gather(array,[1,2])
gather_output12  = tf.gather(array,[1,3])
gather_output13  = tf.gather(array,[3,2])

gather_output  = tf.gather(array,[1,0,2])
gather_output1  = tf.gather(array,[1,1,2])
gather_output2  = tf.gather(array,[1,2,1])

In [28]:

with tf.Session() as sess:
    print (gather_output0.eval());print("\n")
    print (gather_output01.eval());print("\n")
    print (gather_output02.eval());print("\n")  
    print (gather_output11.eval());print("\n")
    print (gather_output12.eval());print("\n")
    print (gather_output13.eval());print("\n")

    print (gather_output.eval());print("\n")
    print (gather_output1.eval());print("\n")
    print (gather_output2.eval());print("\n")
    #print (gather_output2.eval());print("\n")

[4 9 6]

[2 3 4]

[7 8 0]

[[4 9 6]
 [2 3 4]]

[[4 9 6]
 [7 8 0]]

[[7 8 0]
 [2 3 4]]

[[4 9 6]
 [1 2 3]
 [2 3 4]]

[[4 9 6]
 [4 9 6]
 [2 3 4]]

[[4 9 6]
 [2 3 4]
 [4 9 6]]
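
For comparison, the same row selections can be sketched in plain NumPy, which is handy for checking expected values before building a graph. This is only an illustration under the assumption that gathering happens along axis 0, as in the calls above; the variable names are made up for the example.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 9, 6], [2, 3, 4], [7, 8, 0]])

# A scalar index selects one row, like tf.gather(array, 1).
row = array[1]

# A list of indices selects rows and may reorder or repeat them,
# like tf.gather(array, [1, 0, 2]) and tf.gather(array, [1, 1, 2]).
reordered = array[[1, 0, 2]]
repeated = array[[1, 1, 2]]

# np.take is the explicit spelling of the same operation.
same_as_reordered = np.take(array, [1, 0, 2], axis=0)

print(row)
print(reordered)
print(repeated)
```

The printed rows should match the gather outputs listed above for the same indices.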

And looking at this simple example:

  • Initialise simple array
  • test gather operation

    In [11]:

    array_simple = np.array([1,2,3])

    In [15]:

    print("shape of simple array is: ", array_simple.shape)

    shape of simple array is:  (3,)

    In [57]:

    gather1  = tf.gather(array_simple,[0])
    gather01 = tf.gather(array_simple,[1])
    gather02 = tf.gather(array_simple,[2])
    gather2 = tf.gather(array_simple,[1,2])
    gather3 = tf.gather(array_simple,[0,1])
    with tf.Session() as sess:
        print (gather1.eval());print("\n")
        print (gather01.eval());print("\n")
        print (gather02.eval());print("\n")
        print (gather2.eval());print("\n")
        print (gather3.eval());print("\n")

    [1]

    [2]

    [3]

    [2 3]

    [1 2]

tf.reshape( )

* Use the same array that was initialised earlier
* Reshape it using tf.reshape( )

    In [64]:

    array.shape # Confirm array shape

    (4, 3)

    In [74]:

    print ("This is the array\n" ,array) # see the output and compare with the initial array

    This is the array
     [[1 2 3]
     [4 9 6]
     [2 3 4]
     [7 8 0]]

    In [84]:

    reshape_ops= tf.reshape(array,[-1,4]) # Note the parameters in reshape
    reshape_ops1= tf.reshape(array,[-1,3]) # Note the parameters in reshape
    reshape_ops2= tf.reshape(array,[-1,6]) # Note the parameters in reshape
    reshape_ops_back1= tf.reshape(array,[6,-1]) # Note the parameters in reshape
    reshape_ops_back2= tf.reshape(array,[3,-1]) # Note the parameters in reshape
    reshape_ops_back3= tf.reshape(array,[4,-1]) # Note the parameters in reshape

    In [86]:

    with tf.Session() as sess:
        print (reshape_ops.eval());print("\n")
        print (reshape_ops1.eval());print("\n")
        print (reshape_ops2.eval());print("\n")
        print ("Output when we reverse the parameters:");print("\n")
        print (reshape_ops_back1.eval());print("\n")
        print (reshape_ops_back2.eval());print("\n")
        print (reshape_ops_back3.eval());print("\n")

    [[1 2 3 4]
     [9 6 2 3]
     [4 7 8 0]]

    [[1 2 3]
     [4 9 6]
     [2 3 4]
     [7 8 0]]

    [[1 2 3 4 9 6]
     [2 3 4 7 8 0]]

    Output when we reverse the parameters:

    [[1 2]
     [3 4]
     [9 6]
     [2 3]
     [4 7]
     [8 0]]

    [[1 2 3 4]
     [9 6 2 3]
     [4 7 8 0]]

    [[1 2 3]
     [4 9 6]
     [2 3 4]
     [7 8 0]]

    Note: The input size and output size must be the same, otherwise it gives an error. A simple way to check this is to make sure the number of elements in the input is divisible by the product of the explicit reshape parameters.
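
The size check described in the note can be sketched as a small helper. The function name can_reshape is hypothetical (it is not part of NumPy or TensorFlow), and the sketch assumes at most one -1 in the target shape, which is the convention both libraries use.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 9, 6], [2, 3, 4], [7, 8, 0]])

def can_reshape(arr, new_shape):
    """Hypothetical helper: True if arr has the right number of elements for new_shape."""
    known = [d for d in new_shape if d != -1]
    if len(new_shape) - len(known) > 1:
        return False  # more than one -1 cannot be inferred
    known_size = int(np.prod(known))  # product of the explicit dimensions
    if len(known) == len(new_shape):
        return known_size == arr.size  # no -1: sizes must match exactly
    return known_size > 0 and arr.size % known_size == 0

print(can_reshape(array, [-1, 4]))  # 12 elements, divisible by 4
print(can_reshape(array, [-1, 5]))  # 12 elements, not divisible by 5
print(can_reshape(array, [5, 3]))   # 15 != 12
```

Running the check before calling reshape makes it easier to see which target shapes are valid for a 12-element array.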


tf.dynamic_partition( )

This is declared as:

tf.dynamic_partition (array, partitions, num_partitions, name=None)


* We declare num_partitions, the number of partitions
* We use the array initialised earlier
* We declare the partitions as [0, 0, 1, 1]. With num_partitions=2, rows marked 0 fall into the first partition and rows marked 1 into the second.

* The output is a list

In [96]:

    print ("This is the array\n" ,array) # This is output array

    This is the array
     [[1 2 3]
     [4 9 6]
     [2 3 4]
     [7 8 0]]

    We show how to make two and three partitions below
    In [123]:

    num_partitions = 2
    num_partitions1 = 3

    partitions = [0, 0, 1, 1]
    partitions1 = [0 ,1 ,1, 2 ]

    In [119]:

    dynamic_ops =tf.dynamic_partition(array, partitions, num_partitions, name=None) # 2 partitions
    dynamic_ops1 =tf.dynamic_partition(array, partitions1, num_partitions1, name=None) # 3 partitions

    In [125]:

    with tf.Session() as sess:
        run = sess.run(dynamic_ops)
        run1 = sess.run(dynamic_ops1)
        print("Output for 2 partitions: ")
        print (run[0]);print("\n")
        print(run[1]) ;print("\n")# Compare result with initial array. Out is list
        print("Output for three partitions: ")

        print (run1[0]);print("\n")
        print (run1[1]);print("\n")
        print (run1[2]);print("\n")

    Output for 2 partitions: 
    [[1 2 3]
     [4 9 6]]

    [[2 3 4]
     [7 8 0]]

    Output for three partitions: 
    [[1 2 3]]

    [[4 9 6]
     [2 3 4]]

    [[7 8 0]]
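
To make the grouping rule concrete, here is a rough NumPy stand-in for partitioning along axis 0. The helper partition_rows is hypothetical and only mimics the behaviour shown above: row i goes into the output list entry whose index equals partitions[i].

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 9, 6], [2, 3, 4], [7, 8, 0]])

def partition_rows(data, partitions, num_partitions):
    """Hypothetical NumPy stand-in for partitioning rows by label."""
    partitions = np.asarray(partitions)
    # A boolean mask per label keeps the original row order inside each group.
    return [data[partitions == k] for k in range(num_partitions)]

two = partition_rows(array, [0, 0, 1, 1], 2)
three = partition_rows(array, [0, 1, 1, 2], 3)
# Labels do not have to be contiguous: this sends rows 0 and 2 to group 0.
alternating = partition_rows(array, [0, 1, 0, 1], 2)

print(two[0])
print(three[1])
print(alternating[0])
```

The first two calls use the same partition lists as the TensorFlow example, so their groups should agree with the output above.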

tf.split( )

Make sure you use an up-to-date TensorFlow version. In older versions, this implementation will give an error.

This is specified in the documentation as below:

tf.split(value, num_or_size_splits, axis=0, num=None, name='split')

It splits a tensor into subtensors. This is best illustrated by an example:

* We define a (5,30) array in NumPy
* We split the array along axis 1
* We specify the sizes of the splits as a 1-D tensor along axis 1, so we have 3 splits.

Specify an array

    Create a (5 by 30) numpy array. The syntax using numpy is shown below
    In [2]:

    ArrayBeforeSplitting = np.arange(150).reshape(5,30) 
    print ("Array shape without split operation is : " ,ArrayBeforeSplitting.shape)

    ('Array shape without split operation is : ', (5, 30))

    specify number of splits
    In [3]:

    split_1D = tf.Variable([8,13,9])
    print("specify number of partions using 1-Dimen Variable:" , tf.shape(split_1D))

    ('specify number of partions using 1-Dimen Variable:', <tf.Tensor 'Shape:0' shape=(1,) dtype=int32>)

    Use tf.split

    Make 3 splits along axis 1 so that we have (5,8), (5,13) and (5,9) splits. Axis 1 has 30 elements, so the split sizes along that axis must add up to 30, otherwise it gives an error.
    In [6]:

    split1,split2,split3 = tf.split(ArrayBeforeSplitting,split_1D,1)
    # we have 3 splits along axis 1, with sizes given
    # by split_1D. That is, axis 1 (with 30 elements) is split into partitions of 8, 13 and 9 elements while axis 0
    # remains unchanged

    In [7]:

    # Initialise global variables, because split_1D is a variable and needs to be initialised before being
    # used in a computational graph
    init_op = tf.global_variables_initializer()

    In [16]:

    with tf.Session() as sess:
        sess.run(init_op) # run variable initialisation.
        result, result2, result3 = sess.run([split1, split2, split3])
        print(result)
        print("the shape of the first split operation is : ",result.shape)
        print(result2)
        print("the shape of the second split operation is : ",result2.shape)
        print(result3)
        print("the shape of the third split operation is : ",result3.shape)

    [[  0   1   2   3   4   5   6   7]
     [ 30  31  32  33  34  35  36  37]
     [ 60  61  62  63  64  65  66  67]
     [ 90  91  92  93  94  95  96  97]
     [120 121 122 123 124 125 126 127]]
    ('the shape of the first split operation is : ', (5, 8))

    [[  8   9  10  11  12  13  14  15  16  17  18  19  20]
     [ 38  39  40  41  42  43  44  45  46  47  48  49  50]
     [ 68  69  70  71  72  73  74  75  76  77  78  79  80]
     [ 98  99 100 101 102 103 104 105 106 107 108 109 110]
     [128 129 130 131 132 133 134 135 136 137 138 139 140]]
    ('the shape of the second split operation is : ', (5, 13))
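
The same uneven split can be checked in NumPy as a quick sanity test. Note one difference that this sketch has to work around: np.split expects cut positions rather than section sizes, so the sizes [8, 13, 9] are converted to boundaries first. The variable names are chosen for the example only.

```python
import numpy as np

array_before_splitting = np.arange(150).reshape(5, 30)
sizes = [8, 13, 9]  # must add up to the length of axis 1 (30)

# Convert sizes [8, 13, 9] into cut positions [8, 21].
boundaries = np.cumsum(sizes)[:-1]
part1, part2, part3 = np.split(array_before_splitting, boundaries, axis=1)

print(part1.shape, part2.shape, part3.shape)
```

Each part keeps all 5 rows, and the column counts follow the requested sizes.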

Hope this helps!

Don't use np.reshape. It might mess up the order of your values.

Use np.rollaxis instead:

>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18])
>>> a = a.reshape((1,2,3,3))
>>> a
array([[[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]]])
>>> b = np.rollaxis(a, 1, 4)
>>> b.shape
(1, 3, 3, 2)
>>> b = np.rollaxis(b, 0, 4)
>>> b.shape
(3, 3, 2, 1)

Note that the order of the two axes with size 3 hasn't changed. If I were to label them, the two rollaxis operations caused the shapes to change as (1, 2, 3_1, 3_2) -> (1, 3_1, 3_2, 2) -> (3_1, 3_2, 2, 1). Your final array looks like:

>>> b
array([[[[ 1],
         [10]],

        [[ 2],
         [11]],

        [[ 3],
         [12]]],


       [[[ 4],
         [13]],

        [[ 5],
         [14]],

        [[ 6],
         [15]]],


       [[[ 7],
         [16]],

        [[ 8],
         [17]],

        [[ 9],
         [18]]]])
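
As a possible shortcut, the two rollaxis calls can be collapsed into a single np.transpose. The sketch below swaps in np.transpose for rollaxis and compares it with a plain reshape; it assumes the source weights are laid out as [n, c, h, w] as in the question, and uses a zero array as a stand-in for the real [64,3,7,7] weights.

```python
import numpy as np

# Weights in the [n, c, h, w] layout used in the question.
a = np.arange(1, 19).reshape((1, 2, 3, 3))

# New axes (h, w, c, n) are old axes (2, 3, 1, 0).
b = np.transpose(a, (2, 3, 1, 0))

# A plain reshape gives the same shape but scrambles the values.
wrong = a.reshape((3, 3, 2, 1))

print(b.shape)
print(np.array_equal(b[:, :, 0, 0], a[0, 0]))      # first 3x3 filter is preserved
print(np.array_equal(wrong[:, :, 0, 0], a[0, 0]))  # reshape does not preserve it

# Stand-in for the [64, 3, 7, 7] weights from the question.
stand_in = np.zeros((64, 3, 7, 7))
print(np.transpose(stand_in, (2, 3, 1, 0)).shape)
```

The transposed array keeps each 3 by 3 filter intact, while the reshaped one interleaves values from both filters, which would explain the wrong feature map.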

