Mapping a pandas dataframe to an n-dimensional array, where each dimension corresponds to one of the x columns

Question

I have a dataframe with columns x0 x1 x2 x3 x4 and y0 y1 y2 y3 y4.

First ten rows:

    Id  x0  x1  x2  x3  x4  y0  y1  y2  y3  y4
0   0   -5.0    -5.0    -5.0    -5.0    -5.0    268035854.2037072   0.94956508069182    3520.7568220782514  -412868933.038522   242572043.87727848
1   1   -5.0    -5.0    -5.0    -5.0    -4.5    268035883.40390667  0.94956508069182    3482.0382462663074  -412868933.038522   242572043.87727848
2   2   -5.0    -5.0    -5.0    -5.0    -4.0    268035901.1170006   0.94956508069182    3443.3196704543634  -412868933.038522   242572043.87727848
3   3   -5.0    -5.0    -5.0    -5.0    -3.5    268035911.8642905   0.94956508069182    3404.6010946424194  -412868933.038522   242572043.87727848
4   4   -5.0    -5.0    -5.0    -5.0    -3.0    268035918.38904288  0.94956508069182    3365.882518830476   -412868933.038522   242572043.87727848
5   5   -5.0    -5.0    -5.0    -5.0    -2.5    268035922.35671327  0.94956508069182    3327.163943018532   -412868933.038522   242572043.87727848
6   6   -5.0    -5.0    -5.0    -5.0    -2.0    268035924.7800574   0.94956508069182    3288.445367206588   -412868933.038522   242572043.87727848
7   7   -5.0    -5.0    -5.0    -5.0    -1.5    268035926.27763835  0.94956508069182    3249.726791394644   -412868933.038522   242572043.87727848
8   8   -5.0    -5.0    -5.0    -5.0    -1.0    268035927.2317166   0.94956508069182    3211.0082155827004  -412868933.038522   242572043.87727848
9   9   -5.0    -5.0    -5.0    -5.0    -0.5    268035927.8858225   0.94956508069182    3172.2896397707564  -412868933.038522   242572043.87727848

I did this:

values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values.shape

I now have shape (4084101, 5)

I would like to have shape (21, 21, 21, 21, 21, 5), so that the first axis corresponds to x0, the second to x1, and so on, as if we had a 5D grid. Basically, values[1, 0, 0, 0, 0] should return the tuple (y0, y1, y2, y3, y4) corresponding to x0=-4.5, x1=-5, ..., x4=-5.

The size is 21 because x0, ..., x4 each go from -5 to 5 in steps of 0.5, and 5 because of y0, y1, y2, y3, y4. I did:

values = values.reshape(21, 21, 21, 21, 21, 5)

But when I do values[1][0][0][0][0], I expect the values corresponding to x0=-4.5, x1=-5, ..., x4=-5, and that is not what I get.

One bad idea that I had (complexity-wise) was to build a dictionary whose keys are the tuples (x0, x1, x2, x3, x4) and whose values are the row index where the y values can be found, and then fill a np.zeros((21, 21, 21, 21, 21, 5)) array.

import numpy as np

# Get the y values as a 2D array of shape (n_rows, 5)
values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values

# Create a dictionary to map the x0, x1, x2, x3, x4 grid indices to row numbers
# (assumes df_train has the default 0..n-1 index, so i is also a row position)
grid = {}
for i, row in df_train.iterrows():
    x0, x1, x2, x3, x4 = [int((x + 5) / 0.5) for x in [row['x0'], row['x1'], row['x2'], row['x3'], row['x4']]]
    grid[(x0, x1, x2, x3, x4)] = i

# Create the reshaped array
reshaped_values = np.zeros((21, 21, 21, 21, 21, 5))
for key, index in grid.items():
    reshaped_values[key] = values[index]

But it takes almost a minute on my computer, and it looks like the worst idea ever.
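
For reference, the index computation described above can be applied to whole columns at once rather than row by row. The following is a minimal sketch, assuming a tiny hypothetical grid (2 x columns, 3 points per axis, one y column) in place of df_train; `df_small`, `x_min`, `step`, `n`, and `out` are illustrative names, not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train: every (x0, x1) combination once, in arbitrary order
df_small = pd.DataFrame({
    "x0": [-4.5, -5.0, -4.0, -5.0, -4.5, -4.0, -5.0, -4.5, -4.0],
    "x1": [-5.0, -4.5, -4.0, -5.0, -4.0, -5.0, -4.0, -4.5, -4.5],
})
df_small["y0"] = 10 * df_small["x0"] + df_small["x1"]

x_min, step, n = -5.0, 0.5, 3  # assumed grid parameters for this toy example

# Convert each x value to an integer grid index; round before casting
# so that floating-point noise cannot truncate to the wrong cell
idx = np.rint((df_small[["x0", "x1"]].to_numpy() - x_min) / step).astype(int)

out = np.zeros((n, n, 1))
# A single fancy-indexed assignment replaces the dictionary and the Python loop
out[idx[:, 0], idx[:, 1]] = df_small[["y0"]].to_numpy()
```

Because each row is placed by its own computed index, this variant does not depend on the row order of the dataframe.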

Corralien · Accepted Answer

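Your reshape works, but I think your dataframe is not sorted. reshape fills the array in row-major order (the last axis varies fastest), so the rows must be sorted by x0, then x1, ..., then x4 before reshaping:

df_train = df_train.sort_values(['x0', 'x1', 'x2', 'x3', 'x4'])

values = df_train[['y0', 'y1', 'y2', 'y3', 'y4']].values
values = values.reshape(21, 21, 21, 21, 21, 5)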
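
The effect of sorting can be checked on a small case. This is a minimal sketch, assuming a hypothetical grid with 3 points per axis, 2 x columns and 2 y columns (instead of 21, 5 and 5), and assuming every grid combination appears exactly once; `df`, `axis`, `n`, and `df_sorted` are illustrative names.

```python
import itertools
import numpy as np
import pandas as pd

# Hypothetical small grid: 3 points per axis, 2 x columns
axis = np.array([-5.0, -4.5, -4.0])
n = len(axis)
points = np.array(list(itertools.product(axis, repeat=2)))

df = pd.DataFrame(points, columns=["x0", "x1"])
df["y0"] = 10 * df["x0"] + df["x1"]
df["y1"] = df["x0"] * df["x1"]

# Shuffle the rows to mimic an unsorted dataframe
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Sort so that the last x column varies fastest, matching NumPy's default C order
df_sorted = df.sort_values(["x0", "x1"])
values = df_sorted[["y0", "y1"]].to_numpy().reshape(n, n, 2)

# values[i, j] now holds (y0, y1) for x0 = axis[i], x1 = axis[j]
```

Here values[1, 0] corresponds to x0=-4.5, x1=-5. Note that this only works when the grid is complete; a missing or duplicated combination would shift every later row.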
