Reputation: 1765
I have data with shape (3000, 4); the columns are (product, store, week, quantity), and quantity is the target.
I want to reshape this matrix into a tensor without mixing up the corresponding quantities.
For example, if there are 30 products, 20 stores and 5 weeks, the shape of the tensor should be (5, 20, 30), with each cell holding the corresponding quantity. An entry like (store A, product X, week 3) never appears twice in the data, so every store x product x week combination should have exactly one corresponding quantity.
Any suggestions on how to achieve this, or is there a logical error in my reasoning? Thanks.
Upvotes: 1
Views: 807
Reputation: 18628
If there are no missing or duplicate combinations, you just have to sort your data carefully. np.lexsort can do it.
Suppose your data looks like data:
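import numpy as np
from itertools import product
dims=a,b,c=30,20,5
# every (product, store, week) combination once, plus a placeholder quantity column
data=np.array(list(product(*[np.arange(i) for i in dims+(1,)])))
data[:,-1]=np.random.randint(0,100,a*b*c)  # random quantities
np.random.shuffle(data)  # shuffle the rows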
#array([[ 4, 15, 0, 56],
# [27, 16, 2, 3],
# [ 4, 8, 4, 26],
# ...,
# [20, 14, 3, 28],
# [14, 10, 4, 6],
# [19, 14, 3, 39]])
You can then sort if necessary and reshape like this:
sorteddata=data[np.lexsort(data[:,::-1].T)]
tensor=sorteddata[:,-1].reshape(dims)
Now tensor[4,15,0] is 56, which matches the first row of data shown above.
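Note that reshaping with dims gives a tensor indexed as (product, store, week), whereas the question asks for (week, store, product). A minimal sketch of one way to get that layout and to check every row, assuming each combination appears exactly once (the names rng, order and by_week are mine, not from the answer):

```python
from itertools import product
import numpy as np

dims = a, b, c = 30, 20, 5
rng = np.random.default_rng(0)

# Synthetic data: every (product, store, week) once, with a random quantity.
data = np.array(list(product(range(a), range(b), range(c), [0])))
data[:, -1] = rng.integers(0, 100, a * b * c)
rng.shuffle(data)  # shuffle the rows in place

# lexsort treats the LAST key as the primary one, so this sorts
# by product, then store, then week.
order = np.lexsort((data[:, 2], data[:, 1], data[:, 0]))
tensor = data[order, -1].reshape(dims)   # indexed [product, store, week]

# Reverse the axes to get the layout asked for in the question.
by_week = tensor.transpose(2, 1, 0)      # indexed [week, store, product]

# Verify that every original row landed in the right cell.
p, s, w, q = data.T
assert np.array_equal(tensor[p, s, w], q)
assert np.array_equal(by_week[w, s, p], q)
```

The transpose returns a view, so no data is copied; call by_week.copy() if a contiguous array is needed.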
Upvotes: 2
Reputation: 4547
First, go through each of the first three columns and count the number of distinct products, stores and weeks. This gives you the shape of your new array, which you can create with numpy. Next, and this is the important part, build a lookup table for each category that maps every value to an index. For example, if the product is 'XXX', you want to know which index along the first dimension 'XXX' corresponds to (product being the first dimension of your array); the same idea applies to store and week. Once you have these, iterate through the rows of your existing array and assign each quantity to the correct location in the new array, using the indices from the lookup tables for that row's product, store and week. As you said, this works because there is a one-to-one correspondence between (product, store, week) combinations and quantities.
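A minimal sketch of this idea on hypothetical toy data. Here np.unique with return_inverse=True stands in for the hand-built lookup tables, and a single fancy-indexed assignment stands in for the explicit loop over rows; the labels and variable names are made up for illustration. Unlike the sort-and-reshape approach, missing combinations are tolerated and simply stay NaN.

```python
import numpy as np

# Hypothetical (product, store, week, quantity) rows with arbitrary labels.
rows = [
    ("apple", "A", 1, 10),
    ("apple", "B", 1, 4),
    ("pear",  "A", 1, 7),
    ("pear",  "B", 2, 3),
    ("apple", "A", 2, 12),
]
products = np.array([r[0] for r in rows])
stores = np.array([r[1] for r in rows])
weeks = np.array([r[2] for r in rows])
quantity = np.array([r[3] for r in rows], dtype=float)

# For each column: the sorted distinct labels, and for every row
# the index of its label within that sorted list.
p_labels, p_idx = np.unique(products, return_inverse=True)
s_labels, s_idx = np.unique(stores, return_inverse=True)
w_labels, w_idx = np.unique(weeks, return_inverse=True)

# Allocate the tensor (product, store, week) and fill it in one assignment.
tensor = np.full((len(p_labels), len(s_labels), len(w_labels)), np.nan)
tensor[p_idx, s_idx, w_idx] = quantity
```

The label arrays (p_labels, s_labels, w_labels) record which original value each index along an axis stands for, so they are worth keeping alongside the tensor.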
Upvotes: 0