user10044922

Defining a for loop for operations on each row of a NumPy array

I have a data set of length L, which I named "data":

data=raw_data.iloc[:,0]

I then drew 2000 random samples from "data" (each of length L, drawn with replacement) and named the result "resamples". It is a list of 2000 arrays, which np.array(resamples) turns into a matrix with 2000 rows and L columns:

resamples = [np.random.choice(data, size=len(data), replace=True) for _ in range(2000)]
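The list comprehension above builds a Python list of 2000 separate arrays. As a hedged aside, NumPy can draw the same kind of bootstrap matrix in one call as a true 2-D array. This is a minimal sketch: the synthetic Gumbel series (seed 0, loc=10, scale=2, 500 values) is an assumption standing in for raw_data.iloc[:, 0].

```python
import numpy as np

# Hypothetical stand-in for raw_data.iloc[:, 0]; replace with your own series.
rng = np.random.default_rng(0)
data = rng.gumbel(loc=10.0, scale=2.0, size=500)

# One call draws all 2000 bootstrap samples as a (2000, L) array;
# each row is a resample of `data` drawn with replacement.
resamples = rng.choice(data, size=(2000, len(data)), replace=True)

print(resamples.shape)  # (2000, 500)
```

Iterating over a 2-D array yields its rows, so a for loop over this array works the same way as a loop over the original list.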

The code below performs two scipy.stats operations on "data", which is a single array. I now need to perform the same operations on each of the 2000 resampled series (the 2000 rows of "resamples") using a for loop. The challenge is that two parameters, loc and scale, are estimated in the first step and must then be used in the second step for that same row. I don't know how to write such a loop and would appreciate any help.

from scipy import stats

loc, scale = stats.gumbel_r.fit(data)

return_gumbel = stats.gumbel_r.ppf([0.9999, 0.9995, 0.999], loc=loc, scale=scale)
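Step 2 depends on step 1 because the Gumbel quantile function is fully determined by loc and scale, so each fitted pair gives its own quantiles. A small sketch on synthetic data (the generated sample is an assumption, not the asker's data) that runs both steps and checks ppf against the closed-form Gumbel inverse CDF, loc - scale * log(-log(p)):

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for `data` (assumption: Gumbel-distributed values).
rng = np.random.default_rng(1)
data = rng.gumbel(loc=10.0, scale=2.0, size=1000)

# Step 1: maximum-likelihood estimates of loc and scale.
loc, scale = stats.gumbel_r.fit(data)

# Step 2: quantiles for three non-exceedance probabilities.
probs = np.array([0.9999, 0.9995, 0.999])
return_gumbel = stats.gumbel_r.ppf(probs, loc=loc, scale=scale)

# The gumbel_r inverse CDF has a closed form in terms of loc and scale.
closed_form = loc - scale * np.log(-np.log(probs))
print(np.allclose(return_gumbel, closed_form))  # True
```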

Answers (1)

hpaulj

The description is a little unclear, but I think you just need:

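import numpy as np
from scipy import stats

alist = []
for sample in resamples:  # `sample` avoids overwriting the original `data`
    # step 1: fit a right-skewed Gumbel distribution to this row
    loc, scale = stats.gumbel_r.fit(sample)
    # step 2: use this row's loc and scale to compute its quantiles
    return_gumbel = stats.gumbel_r.ppf([0.9999, 0.9995, 0.999], loc=loc, scale=scale)
    alist.append(return_gumbel)
arr = np.array(alist)  # shape (2000, 3): one row of quantiles per resample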

You could also create arr first and assign each return_gumbel to its row, but appending to a list is about the same speed. The loop could also be written as a list comprehension.
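Both alternatives can be sketched as follows. This assumes synthetic data and only 200 resamples (to keep it quick); the helper name return_levels is hypothetical, introduced only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical bootstrap input: 200 resamples of a synthetic series.
rng = np.random.default_rng(2)
data = rng.gumbel(loc=10.0, scale=2.0, size=300)
resamples = rng.choice(data, size=(200, len(data)), replace=True)
probs = [0.9999, 0.9995, 0.999]

# Variant 1: preallocate arr and fill one row per resample.
arr = np.empty((len(resamples), len(probs)))
for i, sample in enumerate(resamples):
    loc, scale = stats.gumbel_r.fit(sample)
    arr[i] = stats.gumbel_r.ppf(probs, loc=loc, scale=scale)

# Variant 2: a helper doing both steps for one row, used in a list comprehension.
def return_levels(sample):
    loc, scale = stats.gumbel_r.fit(sample)
    return stats.gumbel_r.ppf(probs, loc=loc, scale=scale)

arr2 = np.array([return_levels(sample) for sample in resamples])
print(np.allclose(arr, arr2))  # True
```

Both variants still call fit once per row, so their running time is dominated by the fits rather than by how the results are collected.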

There was talk of vectorizing, but given the complexity of the calculation I doubt that it is feasible, at least not without digging into the details of those stats functions. In NumPy, vectorizing means writing a function so that it works on all rows of the array at once, performing the work in compiled NumPy code rather than in a Python loop.
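A partial illustration, under stated assumptions (synthetic data, 500 resamples): the fit step stays a per-row loop, but the ppf step broadcasts, so per-row loc and scale columns of shape (500, 1) combined with three probabilities give a (500, 3) result in one call. The method-of-moments lines at the end are a different estimator from the maximum-likelihood fit that gumbel_r.fit performs, shown only as an example of a fully row-wise computation; its numbers will differ from the fitted ones.

```python
import numpy as np
from scipy import stats

# Hypothetical input: synthetic data and 500 bootstrap rows.
rng = np.random.default_rng(3)
data = rng.gumbel(loc=10.0, scale=2.0, size=300)
resamples = rng.choice(data, size=(500, len(data)), replace=True)
probs = np.array([0.9999, 0.9995, 0.999])

# The fit still runs once per row: it is a numerical estimate per sample.
params = np.array([stats.gumbel_r.fit(sample) for sample in resamples])
locs, scales = params[:, 0], params[:, 1]

# The ppf step broadcasts (500, 1) parameters against (3,) probabilities.
arr = stats.gumbel_r.ppf(probs, loc=locs[:, None], scale=scales[:, None])
print(arr.shape)  # (500, 3)

# Alternative estimator (method of moments, NOT the MLE used by fit):
# Gumbel variance = pi**2 * scale**2 / 6 and mean = loc + euler_gamma * scale,
# so both parameters can be computed for all rows at once.
scales_mm = resamples.std(axis=1, ddof=1) * np.sqrt(6) / np.pi
locs_mm = resamples.mean(axis=1) - np.euler_gamma * scales_mm
arr_mm = stats.gumbel_r.ppf(probs, loc=locs_mm[:, None], scale=scales_mm[:, None])
```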