noobprogrammer

Reputation: 81

Is there a faster alternative to np.where()?

I have a set of 100 data files containing information about particles (ID, velocity, position, etc.). I need to pick out 10000 specific particles with certain ID numbers from each of them. The way I am doing it is as follows:

import numpy as np

for i in range(n_files + 1):
    data = load_data_file(i, datatype="double_precision")
    for j in chosen_id_arr:
        # np.where scans the whole ID array once per chosen ID
        my_index = np.where(particleID_in_data == j)
        identity.append(ID[my_index])
        x.append(x_component[my_index])
        y.append(y_component[my_index])
        z.append(z_component[my_index])


The list "chosen_id_array" contains all such IDs. The data files are structured with respect to list index.

This snippet runs very slowly, and I am looking for a faster, more efficient alternative. Thank you very much in advance. :)

Upvotes: 2

Views: 1795

Answers (1)

David Wierichs

Reputation: 545

Using a dictionary, you could store the positional information keyed by the particle ID, making use of the O(1) lookup time of dictionaries:

# What the data in a single file would look like:
data = {1: [0.5, 0.1, 1.0], 4: [0.4, -0.2, 0.1]}  # ... one entry per particle
# A lookup becomes very simple syntactically:
for ID in chosen_id_arr:
    x, y, z = data[ID]
    # Here you can process the obtained x, y, z.

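For concreteness, here is a minimal sketch of how such a dictionary could be built from the arrays in the question (the names particleID_in_data, x_component, y_component, z_component, and chosen_id_arr are taken from the question; that the arrays are index-aligned, and that missing IDs should simply be skipped, are assumptions):

# Build the lookup table once per file: O(N) construction,
# then O(1) per queried ID instead of an O(N) np.where scan.
lookup = {
    pid: (xi, yi, zi)
    for pid, xi, yi, zi in zip(
        particleID_in_data, x_component, y_component, z_component
    )
}

for j in chosen_id_arr:
    if j in lookup:  # guard against IDs absent from this file (assumption)
        xj, yj, zj = lookup[j]
        identity.append(j)
        x.append(xj)
        y.append(yj)
        z.append(zj)
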
This is much faster than the numpy lookup. Regarding the processing of the location data within the loop, you could consider keeping separate lists of positions for distinct particle IDs, but that is not within the scope of the question, I think. The pandas package could also be of help there, as sketched below.
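
As a pandas-based sketch under the same assumptions about the arrays, one could index a DataFrame by particle ID and select all chosen IDs in one call:

import pandas as pd

df = pd.DataFrame(
    {"x": x_component, "y": y_component, "z": z_component},
    index=particleID_in_data,
)
# .loc with a list of labels vectorizes the selection;
# the intersection drops IDs that are absent from this file.
selected = df.loc[df.index.intersection(chosen_id_arr)]

Using df.index.intersection keeps only the IDs actually present in the file, so missing particles do not raise a KeyError.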

Upvotes: 1
