joechoj

Reputation: 1409

How to structure multiple python arrays for sorting

A Fourier analysis I'm doing outputs 5 data fields, each of which I've collected into a 1-D numpy array: freq bin #, amplitude, wavelength, normalized amplitude, and % power.

How best to structure the data so I can sort by descending amplitude?

When testing with just one data field, I was able to use a dict as follows:

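fourier_tuples = zip(range(len(fourier)), fourier)   # (bin index, amplitude) pairs
fourier_map = dict(fourier_tuples)                   # bin index -> amplitude
import operator
import numpy as np
# (bin, amplitude) items sorted by ascending amplitude
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
# alternative: indices of the 3 largest amplitudes, largest first
fourier_sorted = np.argsort(-fourier)[:3]

My intent was to add the other arrays to the zip() call in the first line, but this doesn't work: dict() needs (key, value) pairs, and zip() over more than two sequences yields longer tuples. (That's why this post doesn't solve my issue.)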
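For what it's worth, a dict can still carry several fields per key, because only the (key, value) shape is fixed and the value may itself be a tuple. A minimal sketch, where `fourier` and `wavelengths` follow the question's names but the numbers are made up for illustration:

```python
# Illustrative values only: amplitude and wavelength for four frequency bins
fourier = [0.5, 44.99, 10.51, 8.24]
wavelengths = [float("inf"), 404.9, 202.5, 135.0]

# Passing a third sequence to zip() yields 3-tuples, which dict() rejects
try:
    dict(zip(range(len(fourier)), fourier, wavelengths))
except ValueError as exc:
    print("dict() needs (key, value) pairs:", exc)

# Bundle the extra fields into the value instead: {bin: (amplitude, wavelength)}
fourier_map = {i: (amp, wl) for i, (amp, wl) in enumerate(zip(fourier, wavelengths))}

# Sort the (bin, (amplitude, wavelength)) items by amplitude, largest first
by_amp = sorted(fourier_map.items(), key=lambda item: item[1][0], reverse=True)

for bin_no, (amp, wl) in by_amp[:3]:
    print(bin_no, amp, wl)
```

This works, but with numpy arrays the index-based approach in the answers is usually simpler.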

Stepping back, is this a reasonable approach, or are there better ways to combine and sort separate arrays? Ultimately, I want to take the values for the 3 frequencies with the largest amplitudes, plus the associated data from the other fields, and write them to an output data file.

Here's a snippet of my data:

import numpy as np

fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0.        ,  0.00246951,  0.00493902,  0.00740854,  0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])  # % power; "%" isn't valid in a Python name

The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
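As a sketch of the stated end goal (top 3 bins by amplitude written to a text file), here is one way to do it with standard numpy calls (`np.argsort`, `np.column_stack`, `np.savetxt`). It assumes the sample arrays above, with the `% power` array spelled `powers` since `%` is not valid in a Python name; the output file name `top3.txt`, the tab-separated layout and the use of the array index as the bin number are all illustrative choices, not requirements.

```python
import numpy as np

fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0., 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751,
                   0.010445983875848858, 0.086141014686372669])

# Indices of the 3 largest amplitudes, largest first
top3 = np.argsort(-fourier)[:3]

# One row per selected bin: index, amplitude, freq, wavelength, normalized amplitude, % power
table = np.column_stack([top3, fourier[top3], freqs[top3], wavelengths[top3],
                         amps[top3], powers[top3]])

# Hypothetical file name; savetxt prefixes the header line with "# "
np.savetxt("top3.txt", table, fmt="%.6g", delimiter="\t",
           header="bin\tamplitude\tfreq\twavelength\tnorm_amp\tpower")
```

Because every field is indexed with the same `top3`, the rows stay aligned without ever building a combined dict.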

Upvotes: 0

Views: 127

Answers (2)

kyjanond

Reputation: 458

If I understand correctly, you have 5 separate lists of the same length and want to sort all of them by one of them. You can do that either with numpy or with vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).

a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]

#numpy solution
import numpy as np

my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]

#vanilla python solution
from operator import itemgetter

my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))

You can then flip the array into descending order with my_sorted_array = np.fliplr(my_sorted_array) if you wish, or, if you are working with lists, reverse the list in place with my_sorted_list.reverse().
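To make the descending variants concrete, here is a small check on the example lists above. Besides `np.fliplr`, a `[:, ::-1]` slice reverses the column order the same way, and `sorted` accepts `reverse=True`, so the separate reverse step is optional.

```python
from operator import itemgetter

import numpy as np

a = [11, 13, 10, 14, 15]
b = [2, 4, 1, 0, 3]
c = [22, 20, 23, 25, 24]

my_array = np.array([a, b, c])
# Columns reordered by ascending b, then reversed to get descending b
ascending = my_array[:, my_array[1, :].argsort()]
descending = np.fliplr(ascending)
same_thing = ascending[:, ::-1]  # equivalent slice-based reversal

# Vanilla Python: sort the (a, b, c) tuples by b, largest first
descending_list = sorted(zip(a, b, c), key=itemgetter(1), reverse=True)

print(descending)
print(descending_list)
```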

EDIT:

To get only the first n values, simply slice the array, similar to what @Paul suggests. Slicing works like classic list slicing: you specify start:stop:step (the step can be omitted). Since the array is sorted in ascending order, the largest values are in the last columns, so in your case the top 5 columns are [:,-5:]. In the example above, you can take the top 2 columns from each row like this:

my_sliced_sorted_array = my_sorted_array[:,-2:]

The result will be:

array([[15, 13],
       [ 3,  4],
       [24, 20]])
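With the vanilla-Python version, the equivalent of taking the top n columns is slicing the sorted list, and `zip(*...)` then splits the rows back into one tuple per original list. A minimal sketch with the same example lists and n = 2:

```python
from operator import itemgetter

a = [11, 13, 10, 14, 15]
b = [2, 4, 1, 0, 3]
c = [22, 20, 23, 25, 24]

n = 2
# The n rows with the largest values in the 2nd list, largest first
top_rows = sorted(zip(a, b, c), key=itemgetter(1), reverse=True)[:n]

# Transpose back: one tuple per original list
top_a, top_b, top_c = zip(*top_rows)
print(top_a, top_b, top_c)  # (13, 15) (4, 3) (20, 24)
```

These are the same values as the numpy slice, but listed largest first, whereas `[:,-2:]` on the ascending array puts the largest value last.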

Hope it helps.

Upvotes: 0

Paul Panzer

Reputation: 53029

You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:

order = np.argsort(amplitudes)

This does the same job as your dictionary trick: order contains the same indices as the keys of your sorted list of (index, value) items. By the way, why go through a dictionary rather than simply a list of tuples?

The contents of order are now indices into amplitudes: the first cell of order holds the position of the smallest element of amplitudes, the second cell the position of the next-smallest, and so on. Therefore, the positions of the 5 largest amplitudes, largest first, are

top5 = order[:-6:-1]
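The slice `order[:-6:-1]` walks backwards from the last element and stops before the sixth-from-last one, so it yields the positions of the 5 largest values, largest first. A quick sanity check on a made-up array (values are illustrative only), comparing it with two other common spellings:

```python
import numpy as np

amplitudes = np.array([0.1, 45.0, 10.5, 8.2, 23.7, 3.3, 19.9])

order = np.argsort(amplitudes)   # positions from smallest to largest value
top5 = order[:-6:-1]             # last 5 positions, reversed: largest first

# Equivalent ways to write the same selection (no ties in this data)
assert (top5 == order[::-1][:5]).all()
assert (top5 == np.argsort(-amplitudes)[:5]).all()

print(top5)              # positions of the 5 largest amplitudes
print(amplitudes[top5])  # the amplitudes themselves, descending
```

With tied amplitudes, the spellings may order the tied entries differently, since reversing an ascending sort also reverses the order of ties.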

Provided your data are 1-D numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes via advanced (integer-array) indexing:

freq_bin[top5]
amplitudes[top5]
wavelength[top5]

If you want, you can group the fields together as columns and apply top5 to the rows of the resulting n-by-5 array:

np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
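`np.c_` stacks 1-D arrays as columns, so each row holds all fields of one frequency bin, and indexing the rows with `top5` keeps the fields aligned. A minimal sketch with three made-up fields (the `...` above stands for whatever other fields you have; the arrays here are illustrative only):

```python
import numpy as np

freq_bin = np.arange(7)
amplitudes = np.array([0.1, 45.0, 10.5, 8.2, 23.7, 3.3, 19.9])
wavelength = np.array([np.inf, 400.0, 200.0, 133.3, 100.0, 80.0, 66.7])

top5 = np.argsort(amplitudes)[:-6:-1]

table = np.c_[freq_bin, amplitudes, wavelength]  # shape (7, 3): one row per bin
top_rows = table[top5, :]                          # shape (5, 3), largest amplitude first
print(top_rows)
```

Note that `np.c_` upcasts everything to a common dtype, so the integer bin numbers come back as floats.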

Upvotes: 1
