Reputation:
I have a data set of length (L) which I named "data".
data=raw_data.iloc[:,0]
I randomly generated 2000 resampled series from "data" (sampling with replacement) and stored them in "resamples". This gives a list of 2000 NumPy arrays, each of length L, which is effectively a 2000 x L matrix.
resamples=[np.random.choice(data, size=len(data), replace=True) for i in range (2000)]
The code below performs two scipy.stats operations on "data", which is a single array. I now need to perform the same operations on each of the 2000 resampled series using a for loop. The difficulty is that the two parameters (loc and scale) computed in the first step must be passed to the second step for each row. I don't know how to write such a loop. Could anyone help me with this?
loc, scale=stats.gumbel_r.fit(data)
return_gumbel=stats.gumbel_r.ppf([0.9999,0.9995,0.999],loc=loc, scale=scale)
Upvotes: 0
Views: 204
Reputation: 231375
The description is a little unclear, but I think you just need:
alist = []
for data in resamples:
    loc, scale = stats.gumbel_r.fit(data)
    return_gumbel = stats.gumbel_r.ppf([0.9999, 0.9995, 0.999], loc=loc, scale=scale)
    alist.append(return_gumbel)
arr = np.array(alist)
You could also create arr first and assign each return_gumbel to its respective row, but the list append is about the same speed. The loop could also be written as a list comprehension.
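As a rough sketch of those two alternatives (preallocating arr, and a list comprehension), something like the following should work. The synthetic data, the seed, the helper name gumbel_quantiles, and the reduced resample count are assumptions made only to keep the example self-contained and quick to run; the question itself uses 2000 resamples of a real column.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for the question's `data` column.
data = stats.gumbel_r.rvs(loc=10.0, scale=2.0, size=200, random_state=rng)

# Assumption: 50 resamples to keep the sketch fast (the question uses 2000).
n_resamples = 50
resamples = rng.choice(data, size=(n_resamples, len(data)), replace=True)

probs = [0.9999, 0.9995, 0.999]

# Variant 1: preallocate the result array and fill it row by row.
arr = np.empty((n_resamples, len(probs)))
for i, sample in enumerate(resamples):
    loc, scale = stats.gumbel_r.fit(sample)
    arr[i] = stats.gumbel_r.ppf(probs, loc=loc, scale=scale)


# Variant 2: wrap both steps in a helper (hypothetical name) and use a
# list comprehension.
def gumbel_quantiles(sample):
    loc, scale = stats.gumbel_r.fit(sample)
    return stats.gumbel_r.ppf(probs, loc=loc, scale=scale)


arr2 = np.array([gumbel_quantiles(sample) for sample in resamples])
```

Both variants give one row per resample and one column per probability. Using a separate loop variable name (sample) also avoids overwriting the original data.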
There was talk of vectorizing, but given the complex nature of the calculation I doubt that is feasible, at least not without digging into the details of those stats functions. In numpy, vectorizing means writing a function so that it works on all rows of the array at once, performing the actions in compiled numpy code.
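To illustrate what "all rows at once" looks like, here is a hedged sketch of a partial vectorization. The fit step still runs in a Python loop, since it is an iterative estimate per sample, but the ppf step accepts arrays for loc and scale and broadcasts them against the probabilities. The synthetic data and the reduced resample count are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumption: synthetic stand-in for the question's `data`, 50 resamples.
data = stats.gumbel_r.rvs(loc=10.0, scale=2.0, size=200, random_state=rng)
resamples = rng.choice(data, size=(50, len(data)), replace=True)

probs = np.array([0.9999, 0.9995, 0.999])

# The fit is still done one row at a time.
params = np.array([stats.gumbel_r.fit(sample) for sample in resamples])
locs, scales = params[:, 0], params[:, 1]

# The ppf step is vectorized: (50, 1) parameters broadcast against (3,)
# probabilities to give a (50, 3) result in a single call.
arr = stats.gumbel_r.ppf(probs, loc=locs[:, None], scale=scales[:, None])

# Same quantiles from the Gumbel closed form: loc - scale * log(-log(p)).
closed_form = locs[:, None] - scales[:, None] * np.log(-np.log(probs))
```

Because the fitting dominates the run time, this is unlikely to be much faster than the plain loop; it mainly shows the broadcasting pattern.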
Upvotes: 1