Astrid

Reputation: 1936

From scatter plot to 2D array

My mind has gone completely blank on this one.

I want to do what I think is very simple.

Suppose I have some test data:

import pandas as pd
import numpy as np
k=10
df = pd.DataFrame(np.array([range(k), 
                           [x + 1 for x in range(k)],
                           [x + 4 for x in range(k)], 
                           [x + 9 for x in range(k)]]).T,columns=list('abcd'))

where rows correspond to time and columns to angles, and it looks like this:

   a   b   c   d
0  0   1   4   9
1  1   2   5  10
2  2   3   6  11
3  3   4   7  12
4  4   5   8  13
5  5   6   9  14
6  6   7  10  15
7  7   8  11  16
8  8   9  12  17
9  9  10  13  18

Then, for reasons, I convert it to an ordered dictionary:

def highDimDF2Array(df):
    from collections import OrderedDict # Need to preserve order

    vels = [1.42,1.11,0.81,0.50]

    # Get dataframe shapes
    cols = df.columns

    trajectories = OrderedDict()
    for i,j in enumerate(cols):
        x = df[j].values
        x = x[~np.isnan(x)]

        maxTimeSteps = len(x)
        tmpTraj = np.empty((maxTimeSteps,3))
        # This should be fast
        tmpTraj[:,0] = range(maxTimeSteps) 
        # Remove construction nans
        tmpTraj[:,1] = x
        tmpTraj[:,2].fill(vels[i])

        trajectories[j] = tmpTraj

    return trajectories

Then I plot it all

import matplotlib.pyplot as plt
m = highDimDF2Array(df)
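M = np.vstack(list(m.values()))  # vstack needs a sequence, not a dict view (Python 3)
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')  # raw string so \c is not read as an escape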
plt.colorbar()
plt.show()


Now all I want to do is to put all of that into a 2D numpy array with the properties:

In 3D the colour would correspond to the height.

I was thinking of using something like this: 3d Numpy array to 2d but am not quite sure how.

Upvotes: 1

Views: 1942

Answers (2)

dnalow

Reputation: 984

I don't use pandas, so I can't really follow what your function does. But from the description of your array M and what you want, I think np.histogram2d is the function you need. It divides the range of your independent values into equidistant bins and counts the occurrences in each bin. You can weight by your third column to get the proper height. You have to choose the number of bins:

z, x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=50)
num, x, y = np.histogram2d(M[:,0], M[:,1], bins=50)

z /= num # proper averaging, it also gives you NaN where num==0

plt.pcolor(x, y, z.T) # visualization; z is indexed [x, y], pcolor expects [y, x]

plt.hist2d might also be of interest.

Edit: np.histogram2d yields the 2D array that was asked for in the question. The visualization, however, should be done with imshow, since pcolor doesn't skip NaN values (is there some way to teach it to?)

The advantage of this method is that the x,y values can be float and of arbitrary order. Further, by defining the number of bins, one can choose the resolution of the resulting image. Nevertheless, to get exactly the result which was asked for, one should do:

binx = np.arange(M[:,0].min()-0.5, M[:,0].max()+1.5) # edges of the bins. 0.5 is the half width
biny = np.arange(M[:,1].min()-0.5, M[:,1].max()+1.5)

z,   x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=(binx,biny))
num, x, y   = np.histogram2d(M[:,0], M[:,1], bins=(binx,biny))

z /= num


plt.imshow(z.T, interpolation='none', origin = 'lower')
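
As a sanity check, here is a minimal sketch of the integer-centred binning on the question's data. It rebuilds M without pandas, assuming the layout (time, angle, velocity) and the velocities from the question; the names offsets, total and count are only illustrative. It also uses np.divide with where= instead of z /= num, which is my own substitution to avoid the divide-by-zero warning.

```python
import numpy as np

# Rebuild the question's M without pandas: columns are (time, angle, velocity).
k = 10
offsets = [0, 1, 4, 9]            # angle = time + offset for columns a..d
vels = [1.42, 1.11, 0.81, 0.50]   # velocities taken from the question
t = np.arange(k)
M = np.vstack([np.column_stack([t, t + off, np.full(k, v)])
               for off, v in zip(offsets, vels)])

# Bin edges centred on the integer values (half-width 0.5).
binx = np.arange(M[:, 0].min() - 0.5, M[:, 0].max() + 1.5)
biny = np.arange(M[:, 1].min() - 0.5, M[:, 1].max() + 1.5)

# Weighted sum and plain count per cell.
total, _, _ = np.histogram2d(M[:, 0], M[:, 1], bins=(binx, biny), weights=M[:, 2])
count, _, _ = np.histogram2d(M[:, 0], M[:, 1], bins=(binx, biny))

# Average per cell; cells without samples stay NaN and no warning is raised.
z = np.full_like(total, np.nan)
np.divide(total, count, out=z, where=count > 0)

print(z.shape)   # one row per time step, one column per angle value
```

Since every (time, angle) pair occurs at most once in this data, the average in each filled cell is just the velocity of that sample.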


The output of pcolormesh doesn't leave out the NaNs, but in return it takes the x and y values (the bin edges) into account:

plt.pcolormesh(x, y, z.T, vmin=0, vmax=2)
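
On the question of teaching pcolor to skip NaNs: one approach that, as far as I know, works is to pass a masked array, because pcolor and pcolormesh do not draw masked cells. The sketch below only shows the masking step on a toy array; the name z_masked is illustrative and the plotting call is left as a comment.

```python
import numpy as np

# Toy stand-in for the averaged histogram, with NaN in the empty cells.
z = np.array([[1.0, np.nan],
              [np.nan, 2.0]])

# Mask every NaN (and inf) entry; the data itself is left untouched.
z_masked = np.ma.masked_invalid(z)

# Statistics on the masked array ignore the masked cells.
print(z_masked.mean())

# With matplotlib, something like plt.pcolormesh(x, y, z_masked.T)
# should then leave the masked cells blank.
```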


Upvotes: 1

Molly

Reputation: 13610

You can convert the values in M[:,0] and M[:,1] to integers and use them as indices into a 2D numpy array. Here's an example using the value of M you defined.

out = np.empty((20,10))
out[:] = np.nan  # np.NAN was removed in NumPy 2.0
N = M[:,[0,1]].astype(int)
out[N[:,1], N[:,0]] = M[:,2]  # rows: second column of M, columns: first column of M
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.imshow(out, interpolation='none', origin = 'lower')


Here you can convert M to integers directly, but in general you might have to come up with a function that maps the columns of M to integers, depending on the resolution of the array you are creating.
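
A minimal sketch of such a mapping, assuming the coordinates lie on (or close to) a regular grid with a known origin and step. The helper to_grid_indices and the toy values are hypothetical, not something from the question.

```python
import numpy as np

def to_grid_indices(values, vmin, step):
    """Hypothetical helper: map coordinates to the index of the nearest grid cell."""
    scaled = (np.asarray(values, dtype=float) - vmin) / step
    return np.floor(scaled + 0.5).astype(int)

# Toy samples laid out as (time, angle, velocity) with non-integer coordinates.
M = np.array([[0.0, 1.00, 1.42],
              [0.5, 1.25, 1.11],
              [1.0, 1.75, 0.81]])

rows = to_grid_indices(M[:, 1], vmin=1.0, step=0.25)  # angle -> row index
cols = to_grid_indices(M[:, 0], vmin=0.0, step=0.5)   # time  -> column index

# Cells that receive no sample stay NaN.
out = np.full((rows.max() + 1, cols.max() + 1), np.nan)
out[rows, cols] = M[:, 2]
```

Note that if two samples land in the same cell, the later one overwrites the earlier one; averaging would need an extra step.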

Upvotes: 2
