user845888

Reputation:

numpy array converted to pandas dataframe drops values

I need to calculate statistics for each node of a 2D grid. The easiest approach seemed to be taking the cross join (i.e., the Cartesian product) of two ranges, which I implemented with NumPy and itertools as this function:

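from itertools import product

import numpy as np
import pandas as pd


def node_grid(x_range, y_range, x_increment, y_increment):
    x_min = float(x_range[0])
    x_max = float(x_range[1])
    # np.linspace needs an integer sample count; round first to absorb float error
    x_num = int(round((x_max - x_min) / x_increment)) + 1
    y_min = float(y_range[0])
    y_max = float(y_range[1])
    y_num = int(round((y_max - y_min) / y_increment)) + 1

    x = np.linspace(x_min, x_max, x_num)
    y = np.linspace(y_min, y_max, y_num)

    # Cartesian product of the two axes as an (x_num * y_num, 2) array
    ng = list(product(x, y))
    ng = np.array(ng)
    return ng, x, y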

However, when I convert this to a pandas DataFrame, it appears to drop values. For example:

In [2]: ng = node_grid(x_range=(-60, 120), y_range=(0, 40), x_increment=0.1, y_increment=0.1)
In [3]: ng[0][(ng[0][:,0] > -31) & (ng[0][:,0] < -30) & (ng[0][:,1]==10)]
Out[3]: array([[-30.9,  10. ],
   [-30.8,  10. ],
   [-30.7,  10. ],
   [-30.6,  10. ],
   [-30.5,  10. ],
   [-30.4,  10. ],
   [-30.3,  10. ],
   [-30.2,  10. ],
   [-30.1,  10. ]])

In [4]: node_df = pd.DataFrame(ng[0])
node_df.columns = ['xx','depth']
print(node_df[(node_df.depth==10) & node_df.xx.between(-30,-31)])
Out[4]:Empty DataFrame
Columns: [xx, depth]
Index: []

The dataframe isn't empty:

In [5]: print(node_df.head())
Out[5]:      xx  depth
0 -60.0    0.0
1 -60.0    0.1
2 -60.0    0.2
3 -60.0    0.3
4 -60.0    0.4

Values from the NumPy array seem to be dropped when they are put into the pandas DataFrame. Why?

Upvotes: 2

Views: 222

Answers (2)

scottD

Reputation: 16

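The "between" method expects the lower bound first and the upper bound second. With the arguments reversed, as in between(-30,-31), no value can be both >= -30 and <= -31, so the result is empty. Nothing is dropped from the DataFrame.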

In: print(node_df[(node_df.depth==10) & node_df.xx.between(-31,-30)])
          xx  depth
116390 -31.0   10.0
116791 -30.9   10.0
117192 -30.8   10.0
117593 -30.7   10.0
117994 -30.6   10.0
118395 -30.5   10.0
118796 -30.4   10.0
119197 -30.3   10.0
119598 -30.2   10.0
119999 -30.1   10.0
120400 -30.0   10.0
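The argument order can be checked on a tiny Series. This is only a minimal sketch: the sample values are made up for illustration and are not taken from the question's grid.

```python
import pandas as pd

# Illustrative values only, not the question's data
xx = pd.Series([-31.5, -31.0, -30.5, -30.0, -29.5])

# between(left, right) is inclusive by default: (xx >= left) & (xx <= right)
ordered = xx.between(-31, -30)

# Swapped bounds ask for xx >= -30 and xx <= -31, which nothing satisfies
swapped = xx.between(-30, -31)

print(ordered.tolist())
print(swapped.tolist())
```

The first mask selects the three values from -31.0 to -30.0; the second selects none, which matches the empty DataFrame in the question.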

For clarity, the product() function used comes from the itertools module, i.e., from itertools import product.

Upvotes: 0

Ascurion

Reputation: 523

I can't fully reproduce your code, but the problem is that the lower and upper bounds in the between query are swapped: the smaller value has to come first. The following works for me:

print(node_df[(node_df.depth==10) & node_df.xx.between(-31,-30)])

when using:

ng = np.array([[-30.9,  10. ],
               [-30.8,  10. ],
               [-30.7,  10. ],
               [-30.6,  10. ],
               [-30.5,  10. ],
               [-30.4,  10. ],
               [-30.3,  10. ],
               [-30.2,  10. ],
               [-30.1,  10. ]])
node_df = pd.DataFrame(ng)
node_df.columns = ['xx','depth']
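
If the bounds might arrive in either order, one option is to sort them before calling between. This is a minimal sketch: between_any_order is a hypothetical helper name, not part of pandas, and the data is illustrative.

```python
import pandas as pd


def between_any_order(series, a, b):
    """Hypothetical helper: inclusive range test that tolerates swapped bounds."""
    lower, upper = sorted((a, b))
    return series.between(lower, upper)


# Illustrative data only
node_df = pd.DataFrame({"xx": [-31.2, -30.9, -30.5, -29.8],
                        "depth": [10.0, 10.0, 10.0, 10.0]})

# Bounds deliberately given in the "wrong" order
mask = (node_df.depth == 10) & between_any_order(node_df.xx, -30, -31)
print(node_df[mask])
```

With this helper, the rows at -30.9 and -30.5 are selected whichever way round the bounds are passed.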

Upvotes: 0
