bhjghjh

Reputation: 917

Creating a new 2-column numpy array by filtering on the first column/array

I am trying to create a new 2-dimensional (2-column) array that holds the data values <= 20000 in the first column and their associated ID values in the second column. Here is what I am doing: I read data from a text file, then compute the distance from the last point to every point in the file.

# ID M1 M2 M3 M4 R4 M5 R5 x y z
10217 11.467 11.502 13.428 13.599  432.17 13.266  281.06 34972.8 42985.9 14906
7991 11.529 11.559 13.438 13.520  435.23 13.224  272.23 8538.05 33219.8 43375.1
2100 11.526 11.573 13.478 13.490  448.97 13.356  301.27 9371.75 13734.1 43398.6
9467 11.557 11.621 13.481 13.537  449.99 13.367  303.67 33200.3 36008.9 12735.8
4002 11.454 11.530 13.502 13.583  457.34 13.327  294.53 44607.2 10410.9 9090
2971 11.475 11.563 13.506 13.558  458.77 13.391  309.43 29818.3 98.65 11718.6
1243 11.538 11.581 13.509 13.513  459.62 13.377  306.09 16238.4 11067.9 25048
9953 11.523 11.544 13.559 13.913  477.72 13.440  321.20 34589.6 42869 14878.6
7411 11.547 11.576 13.610 13.658  496.81 13.479  330.96 31436 42092.8 12307.8
1820 11.606 11.619 13.652 12.543  513.11 13.571  355.21 1758.75 15809.8 40473.6
2792 11.647 11.679 13.744 13.877  550.82 13.643  375.38 24393 6774.8 8346.35
510 11.687 11.717 13.771 13.810  562.27 13.642  375.14 22340.3 9316.4 13209.9
1721 11.602 11.646 13.821 14.139  584.37 13.770  413.84 2144.95 15769.1 40470.1

After I get the distances, I only want to keep the distances <= 20,000 from my calculations, along with their associated values from the ID column.

So far I wrote this code to return calculated distances and IDs:

# Find nearest neighbors 

import numpy as np
import matplotlib.pyplot as plt



halo = 'nntest.txt'
ID, m,r,x,y,z= np.loadtxt(halo, usecols=(0,6,7,8,9,10), unpack =True)



# select the last point
m_mass = m[-1:]
ID_mass = ID[-1:]
r_mass = r[-1:]
x_mass = x[-1:]
y_mass = y[-1:]
z_mass = z[-1:]

#######################################
#Find distance to all points from our targeted point
nearest_neighbors = []

def neighbors(ID_mass, cx,cy,cz, ID, x, y, z):

    dist = np.sqrt((cx-x)**2 + (cy-y)**2 + (cz-z)**2)

    return dist, ID


for i in range(len(ID_mass)):
    hist = neighbors(ID_mass[i], x_mass[i], y_mass[i], z_mass[i], ID, x, y, z)
    print hist

    #print all the IDs which are associated with dist<=20000
    if (hist[0]<=20000):
        print ID
    nearest_neighbors.append(hist)



print nearest_neighbors

But I am having a problem returning the new array, which should contain only the distances <= 20000 and their associated IDs. I apologize in advance if this is not a good working example, but I would very much appreciate any suggestions for getting the desired output.

Upvotes: 2

Views: 84

Answers (1)

Tyler Moncur

Reputation: 26

Between the question you asked and the code you have provided, I am still somewhat unclear on what you want to accomplish. But I can at least show you where the errors in the code are, and perhaps give you the tools you need.

As your code is now, x, y, z are all vectors. So the result of the neighbors distance calculation,

dist = np.sqrt((cx-x)**2 + (cy-y)**2 + (cz-z)**2)

will be a vector. I think this is what you intended, since the other values are indexed. But it means you run into trouble with

if (hist[0]<=20000):
    print ID

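Numpy evaluates the inequality element-wise, so hist[0]<=20000 is a boolean array that looks something like [True, False, False, ...]. An if statement cannot reduce an array of several booleans to a single condition, so numpy raises a ValueError saying the truth value is ambiguous. Used as a mask, though, such a boolean array is perfect for what you want. For example, you could try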

for i in range(len(ID_mass)):
    hist = neighbors(ID_mass[i], x_mass[i], y_mass[i], z_mass[i], ID, x, y, z)
    print hist

    #print all the IDs which are associated with dist<=20000
    print ID[hist[0]<=20000]
    nearest_neighbors.extend(list(zip(hist[0][hist[0]<=20000],ID[hist[0]<=20000])))

print nearest_neighbors

The line where we extend the nearest_neighbors list is a bit of a mess, and I may not have fully understood what you want the output to look like. But it builds a list of tuples, where each tuple contains the distance and the ID for every case where the distance is at most 20000.
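If the goal is an actual two-column numpy array rather than a list of tuples, one possible sketch is to compute the mask once and stack the filtered distances and IDs with np.column_stack. The function name neighbors_within and the max_dist parameter are my own, not from the question, and the small arrays below are just a few rows copied from the sample table so the snippet runs without nntest.txt.

```python
import numpy as np

# A few (ID, x, y, z) rows copied from the question's sample table.
# In the real script these arrays would come from np.loadtxt instead.
ID = np.array([10217., 7991., 2100., 1820., 1721.])
x = np.array([34972.8, 8538.05, 9371.75, 1758.75, 2144.95])
y = np.array([42985.9, 33219.8, 13734.1, 15809.8, 15769.1])
z = np.array([14906., 43375.1, 43398.6, 40473.6, 40470.1])


def neighbors_within(cx, cy, cz, ID, x, y, z, max_dist=20000.0):
    """Return an (n, 2) array: column 0 = distance, column 1 = ID.

    Hypothetical helper (not from the question): keeps only the points
    whose distance to (cx, cy, cz) is <= max_dist.
    """
    dist = np.sqrt((cx - x)**2 + (cy - y)**2 + (cz - z)**2)
    mask = dist <= max_dist          # boolean array, one entry per point
    return np.column_stack((dist[mask], ID[mask]))


# Use the last point as the target, as in the question.
result = neighbors_within(x[-1], y[-1], z[-1], ID, x, y, z)
print(result)
```

Each row of result is one (distance, ID) pair. Note that the target point itself is included with a distance of 0, and that the IDs are stored as floats because both columns share one dtype; cast with result[:, 1].astype(int) if integer IDs are needed.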

Upvotes: 1
