Reputation: 1936
My mind has gone completely blank on this one.
I want to do what I think is very simple.
Suppose I have some test data:
import pandas as pd
import numpy as np
k=10
df = pd.DataFrame(np.array([range(k),
[x + 1 for x in range(k)],
[x + 4 for x in range(k)],
[x + 9 for x in range(k)]]).T,columns=list('abcd'))
where rows correspond to time and columns to angles, and it looks like this:
a b c d
0 0 1 4 9
1 1 2 5 10
2 2 3 6 11
3 3 4 7 12
4 4 5 8 13
5 5 6 9 14
6 6 7 10 15
7 7 8 11 16
8 8 9 12 17
9 9 10 13 18
Then, for reasons, I convert it to an ordered dictionary:
def highDimDF2Array(df):
    from collections import OrderedDict # Need to preserve order
    vels = [1.42,1.11,0.81,0.50]
    # Get dataframe shapes
    cols = df.columns
    trajectories = OrderedDict()
    for i,j in enumerate(cols):
        x = df[j].values
        x = x[~np.isnan(x)]
        maxTimeSteps = len(x)
        tmpTraj = np.empty((maxTimeSteps,3))
        # This should be fast
        tmpTraj[:,0] = range(maxTimeSteps)
        # Remove construction nans
        tmpTraj[:,1] = x
        tmpTraj[:,2].fill(vels[i])
        trajectories[j] = tmpTraj
    return trajectories
Then I plot it all
import matplotlib.pyplot as plt
m = highDimDF2Array(df)
M = np.vstack(list(m.values())) # vstack needs a sequence, not a dict view
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.show()
Now all I want to do is to put all of that into a 2D numpy array with the properties:
NaNs (i.e. those that are undefined by a point in the scatter plot)
In 3D the colour would correspond to the height.
I was thinking of using something like this: 3d Numpy array to 2d but am not quite sure how.
Upvotes: 1
Views: 1942
Reputation: 984
I don't use pandas, so I cannot really follow what your function does. But from the description of your array M and what you want, I think the function np.histogram2d is what you are looking for. It bins the range of your independent values into equidistant steps and counts the occurrences in each bin. You can weight by your 3rd column to get the proper height. You have to choose the number of bins:
z, x, y = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=50)
num, x, y = np.histogram2d(M[:,0], M[:,1], bins=50)
z /= num # proper averaging, it also gives you NaN where num==0
plt.pcolor(x, y, z.T) #visualization (transpose: the first axis of z runs along x)
Also plt.hist2d could be interesting.
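To make the averaging step concrete, here is a minimal self-contained sketch on a tiny made-up data set. The sample points and the helper name binned_mean are assumptions for illustration only and are not taken from the question:

```python
import numpy as np

def binned_mean(x, y, values, bins):
    """Average `values` over a 2D grid of (x, y) bins; empty bins become NaN."""
    # Sum of the values falling into each bin
    total, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=values)
    # Number of points falling into each bin
    count, _, _ = np.histogram2d(x, y, bins=bins)
    # 0/0 in empty bins yields NaN; silence the accompanying warning
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
    return mean, xedges, yedges

# Made-up sample: two points share the cell (0, 0)
x = np.array([0.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 2.0])
v = np.array([1.0, 3.0, 5.0, 7.0])

edges = np.arange(-0.5, 3.0)  # bin edges centred on the integers 0, 1, 2
z, xedges, yedges = binned_mean(x, y, v, bins=(edges, edges))
print(z)
```

The two points in cell (0, 0) are averaged, and every cell without a point ends up as NaN.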
edit: The histogram2d yields the 2D array which was asked for in the question. The visualization, however, should be done with imshow, since pcolor doesn't skip NaN values (is there some way to teach it?)
The advantage of this method is that the x,y values can be float and of arbitrary order. Further, by defining the number of bins, one can choose the resolution of the resulting image. Nevertheless, to get exactly the result which was asked for, one should do:
binx = np.arange(M[:,0].min()-0.5, M[:,0].max()+1.5) # edges of the bins. 0.5 is the half width
biny = np.arange(M[:,1].min()-0.5, M[:,1].max()+1.5)
z, x, y = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=(binx,biny))
num, x, y = np.histogram2d(M[:,0], M[:,1], bins=(binx,biny))
z /= num
plt.imshow(z.T, interpolation='none', origin = 'lower')
The output of pcolor (here via pcolormesh) does not leave out the NaNs, but in exchange it takes the x and y values into account:
plt.pcolormesh(x, y, z.T, vmin=0, vmax=2)
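On the open question of teaching pcolor to skip NaNs: one option, as far as I know, is to pass a masked array, since matplotlib's pcolor and pcolormesh leave masked cells undrawn. A minimal sketch, assuming z, x and y have the shapes returned by np.histogram2d. The small z below is made-up stand-in data, and show_masked is a hypothetical helper name:

```python
import numpy as np

# Stand-in for the averaged histogram: NaN marks empty bins
z = np.array([[2.0, np.nan, np.nan],
              [np.nan, 5.0, np.nan],
              [np.nan, np.nan, 7.0]])
x = y = np.arange(-0.5, 3.0)  # bin edges, one more than the number of bins

# Mask every NaN (and inf) entry so that it is not drawn
zm = np.ma.masked_invalid(z)

def show_masked(x, y, zm):
    """Plot the masked grid; masked cells stay empty."""
    import matplotlib.pyplot as plt
    plt.pcolormesh(x, y, zm.T)  # transpose: the first axis of z runs along x
    plt.colorbar()
    plt.show()

# show_masked(x, y, zm)  # uncomment to display the figure
```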
Upvotes: 1
Reputation: 13610
You can convert the values in M[:,0] and M[:,1] to integers and use them as indices into a 2D numpy array, which is then filled with the values from M[:,2]. Here's an example using the value for M you defined.
out = np.empty((20,10))
out[:] = np.nan
N = M[:,[0,1]].astype(int)
out[N[:,1], N[:,0]] = M[:,2]
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.imshow(out, interpolation='none', origin = 'lower')
Here you can convert M to integers directly, but in general you might have to come up with a function that maps the columns of M to integer indices, depending on the resolution of the array you are creating.
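Such a mapping could look roughly like the sketch below, assuming the coordinates lie on (or near) a regular grid with a known spacing. The helper name to_index, the sample values and the step sizes are all made up for illustration:

```python
import numpy as np

def to_index(values, vmin, step):
    """Map float coordinates to the index of the nearest grid point."""
    return np.rint((np.asarray(values) - vmin) / step).astype(int)

# Made-up sample: time in steps of 0.5, angle in steps of 0.25
t = np.array([0.0, 0.5, 1.0, 1.0])
angle = np.array([10.0, 10.25, 10.5, 10.0])
height = np.array([1.42, 1.11, 0.81, 0.50])

cols = to_index(t, t.min(), 0.5)           # time  -> column index
rows = to_index(angle, angle.min(), 0.25)  # angle -> row index

# Start from an all-NaN grid and fill in the cells that have a point
out = np.full((rows.max() + 1, cols.max() + 1), np.nan)
out[rows, cols] = height
print(out)
```

If several points map to the same cell, the last one assigned wins. When an average per cell is wanted instead, the histogram approach from the other answer is the better fit.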
Upvotes: 2