Reputation: 592
I have a dataframe with columns x0, x1, x2, x3, x4 and y0, y1, y2, y3, y4.
First ten rows:
Id x0 x1 x2 x3 x4 y0 y1 y2 y3 y4
0 0 -5.0 -5.0 -5.0 -5.0 -5.0 268035854.2037072 0.94956508069182 3520.7568220782514 -412868933.038522 242572043.87727848
1 1 -5.0 -5.0 -5.0 -5.0 -4.5 268035883.40390667 0.94956508069182 3482.0382462663074 -412868933.038522 242572043.87727848
2 2 -5.0 -5.0 -5.0 -5.0 -4.0 268035901.1170006 0.94956508069182 3443.3196704543634 -412868933.038522 242572043.87727848
3 3 -5.0 -5.0 -5.0 -5.0 -3.5 268035911.8642905 0.94956508069182 3404.6010946424194 -412868933.038522 242572043.87727848
4 4 -5.0 -5.0 -5.0 -5.0 -3.0 268035918.38904288 0.94956508069182 3365.882518830476 -412868933.038522 242572043.87727848
5 5 -5.0 -5.0 -5.0 -5.0 -2.5 268035922.35671327 0.94956508069182 3327.163943018532 -412868933.038522 242572043.87727848
6 6 -5.0 -5.0 -5.0 -5.0 -2.0 268035924.7800574 0.94956508069182 3288.445367206588 -412868933.038522 242572043.87727848
7 7 -5.0 -5.0 -5.0 -5.0 -1.5 268035926.27763835 0.94956508069182 3249.726791394644 -412868933.038522 242572043.87727848
8 8 -5.0 -5.0 -5.0 -5.0 -1.0 268035927.2317166 0.94956508069182 3211.0082155827004 -412868933.038522 242572043.87727848
9 9 -5.0 -5.0 -5.0 -5.0 -0.5 268035927.8858225 0.94956508069182 3172.2896397707564 -412868933.038522 242572043.87727848
I did this:
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values.shape
I now have shape (4084101, 5).
I would like to have shape (21, 21, 21, 21, 21, 5), so that the first axis corresponds to x0, the second to x1, and so on, as if we had a 5D grid. Basically, values[1, 0, 0, 0, 0] should return the tuple (y0, y1, y2, y3, y4) corresponding to x0=-4.5, x1=-5, ..., x4=-5.
The size is 21 along each of the first five axes because x0, ..., x4 go from -5 to 5 with step 0.5, and 5 along the last axis because there are five outputs y0, y1, y2, y3, y4.
I did values = values.reshape(21, 21, 21, 21, 21, 5)
But when I do values[1][0][0][0][0], I expect to get the value corresponding to x0=-4.5, x1=-5, ..., x4=-5, and I don't.
One idea I had (a bad one, complexity-wise) was to build a dictionary whose keys are the tuples (x0, x1, x2, x3, x4), converted to grid indices, and whose values are the row index where the y values can be found, and then use it to fill an np.zeros((21, 21, 21, 21, 21, 5)) array.
import numpy as np

# Get the values
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
# Create a dictionary to map the x0, x1, x2, x3, x4 grid indices to row indices
# (assumes df_train has the default 0..N-1 index, so labels equal positions)
grid = {}
for i, row in df_train.iterrows():
    # Convert each x value in [-5, 5] (step 0.5) to an integer index in [0, 20]
    x0, x1, x2, x3, x4 = [int(round((x + 5) / 0.5)) for x in [row['x0'], row['x1'], row['x2'], row['x3'], row['x4']]]
    grid[(x0, x1, x2, x3, x4)] = i
# Create the reshaped array
reshaped_values = np.zeros((21, 21, 21, 21, 21, 5))
for key, index in grid.items():
    reshaped_values[key] = values[index]
It works, but it takes almost a minute on my computer and looks like the worst idea ever.
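For reference, the same index mapping can probably be done without a Python loop or a dictionary. The following is only a sketch on a made-up toy dataframe (two x columns, one y column, two points per axis); the names df, idx and out and all the numbers are assumptions for illustration, not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_train (hypothetical values): rows are deliberately unordered.
df = pd.DataFrame({
    'x0': [-4.5, -5.0, -5.0, -4.5],
    'x1': [-5.0, -4.5, -5.0, -4.5],
    'y0': [10.0, 1.0, 0.0, 11.0],
})
x_cols, y_cols = ['x0', 'x1'], ['y0']
n = 2  # points per axis in this toy grid (it would be 21 in the question)

# Convert every x value to its integer grid index in one vectorized step.
idx = np.rint((df[x_cols].to_numpy() + 5) / 0.5).astype(int)

# Scatter the y rows into the grid, using each index column as one coordinate.
out = np.zeros((n,) * len(x_cols) + (len(y_cols),))
out[tuple(idx.T)] = df[y_cols].to_numpy()
```

Here out[1, 0] holds the y values of the row with x0=-4.5, x1=-5, regardless of the row order in the dataframe.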
Upvotes: 1
Views: 35
Reputation: 120419
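Your reshape works, but I think your dataframe is not sorted. reshape fills the array in row-major order, so the rows must be ordered with x0 varying slowest and x4 fastest. Sort by the x columns before extracting the values: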
df_train = df_train.sort_values(['x0', 'x1', 'x2', 'x3', 'x4'])
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values = values.reshape(21, 21, 21, 21, 21, 5)
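To check the sort-then-reshape idea on something small, here is a minimal sketch that assumes a complete grid with 3 points per axis instead of 21. The dataframe df, the axis array and the y values (each y_k simply copies x_k so the result is easy to verify) are made up for the example.

```python
import itertools
import numpy as np
import pandas as pd

# Small stand-in for df_train: 3 points per axis instead of 21 (assumption, for speed).
axis = np.array([-5.0, -4.5, -4.0])
x_cols = ['x0', 'x1', 'x2', 'x3', 'x4']
y_cols = ['y0', 'y1', 'y2', 'y3', 'y4']

grid_points = np.array(list(itertools.product(axis, repeat=5)))
df = pd.DataFrame(grid_points, columns=x_cols)
# Made-up y values: y_k copies x_k, so each y tuple equals its own coordinates.
for x, y in zip(x_cols, y_cols):
    df[y] = df[x]

# Shuffle the rows to mimic an unsorted dataframe.
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Sort so that x0 varies slowest and x4 fastest, then reshape in row-major order.
df = df.sort_values(x_cols)
n = len(axis)
values = df[y_cols].to_numpy().reshape(n, n, n, n, n, len(y_cols))

# values[i0, i1, i2, i3, i4] now holds the y tuple for (axis[i0], ..., axis[i4]).
print(values[1, 0, 0, 0, 0])
```

This only works if every grid point appears exactly once; with missing or duplicated rows the reshape would either fail or silently misalign the axes.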
Upvotes: 2