Reputation: 1
I am trying to obtain a variance for a value I obtained by processing a 2x150 array into a discrete correlation function. To do this I need to randomly sample 80% of the original data N times, which will allow me to calculate a variance over these values. I have so far been able to create one randomly sampled set of data using this:
rand_indices = []
running_var = len(find_length) * 0.8
x = 0
while x < running_var:
    rand_inx = randint(0, len(find_length) - 1)
    rand_indices.append(rand_inx)
    x = x + 1
which creates an array 80% of the length of my original, with randomly selected indices to be picked out and processed. My problem is that I am not sure how to iterate this to get N sets of these random indices, ideally as an N x 120 array (I sketch the structure I am after below, after my full code). My whole code so far is:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from random import randint

# Load the two light curves; the third column is dropped from each.
useless, just_to, find_length = np.loadtxt("w2_mjy_final.dat").T
w2_dat = np.loadtxt("w2_mjy_final.dat")
w2_rel = np.delete(w2_dat, 2, axis=1)
w2_array = np.asarray(w2_rel)
w1_dat = np.loadtxt("w1_mjy_final.dat")
w1_rel = np.delete(w1_dat, 2, axis=1)
w1_array = np.asarray(w1_rel)

peaks = []
y = 1
N = 0
x = 0
z = 0
rand_indices = []
rand_indices2d = []
running_var = len(find_length) * 0.8  # 80% of the original length

# Attempt at building N sets of random indices (the part I am stuck on).
while z < N:
    while x < running_var:
        rand_inx = randint(0, len(find_length) - 1)
        rand_indices.append(rand_inx)
        x = x + 1
    rand_indices2d.append(rand_indices)
    z = z + 1

# Compute the DCF and its peak for each resampled set.
while y < N:
    w1_sampled = w1_array[rand_indices, :]
    w2_sampled = w2_array[rand_indices, :]
    w1s_t, w1s_dat = zip(*w1_sampled)
    w2s_t, w2s_dat = zip(*w2_sampled)
    w2s_mean = np.mean(w2s_dat)
    w2s_stdev = np.std(w2s_dat)
    w1s_mean = np.mean(w1s_dat)
    w1s_stdev = np.std(w1s_dat)
    taus = []
    dcfs = []
    bins = 40
    for i in w2s_t:
        for j in w1s_t:
            tau_datpoint = i - j
            taus.append(tau_datpoint)
    for k in w2s_dat:
        for l in w1s_dat:
            dcf_datpoint = ((k - w2s_mean) * (l - w1s_mean)) / (w2s_stdev * w1s_stdev)
            dcfs.append(dcf_datpoint)
    plotdat = np.vstack((taus, dcfs)).T
    sort_plotdat = sorted(plotdat, key=lambda x: x[0])
    np.savetxt("w1sw2sarray.txt", sort_plotdat)
    taus_sort, dcfs_sort = np.loadtxt("w1w2array.txt").T
    dcfs_means, taubins_edges, taubins_number = stats.binned_statistic(taus_sort, dcfs_sort, statistic='mean', bins=bins)
    taubin_edge = np.delete(taubins_edges, 0)
    import operator
    indexs, values = max(enumerate(dcfs_means), key=operator.itemgetter(1))
    percents = values * 0.8
    dcf_lists = dcfs_means.tolist()
    centarr_negs, centarr_poss = np.split(dcfs_means, [indexs])
    centind_negs = np.argmin(np.abs(centarr_negs - percents))
    centind_poss = np.argmin(np.abs(centarr_poss - percents))
    lagcent_negs = taubins_edges[centind_negs]
    lagcent_poss = taubins_edges[int((bins / 2) + centind_poss)]
    sampled_peak = (np.abs(lagcent_poss - lagcent_negs) / 2) + lagcent_negs
    peaks.append(sampled_peak)
    y = y + 1
print(peaks)
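For clarity, the structure I think I am after is something like this sketch (N here is just an arbitrary placeholder for the number of resamples, and I am not sure this is the right way to build it):
N = 100  # placeholder for the number of resamples
rand_indices2d = []
for z in range(N):
    # a fresh set of random indices for each resample
    one_sample = [randint(0, len(find_length) - 1)
                  for _ in range(int(len(find_length) * 0.8))]
    rand_indices2d.append(one_sample)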
Upvotes: 0
Views: 60
Reputation: 14486
Seeing as you're using numpy already, why not use np.random.randint?
In your case:
np.random.randint(len(find_length)-1, size=(N, running_var))
This would give you an N x running_var matrix with random integer entries from 0 to len(find_length)-2 inclusive. Note that np.random.randint's upper bound is exclusive, unlike random.randint's, so to allow the last index len(find_length)-1 as well you would pass len(find_length) as the bound instead. Note too that the size dimensions need to be integers, so running_var should be computed as int(len(find_length) * 0.8).
Example Usage:
>>> N=4
>>> running_var=6
>>> find_length = [1,2,3]
>>> np.random.randint(len(find_length)-1, size=(N, running_var))
array([[1, 0, 1, 0, 0, 1],
       [1, 0, 1, 1, 0, 0],
       [1, 1, 0, 0, 1, 0],
       [1, 1, 0, 1, 0, 1]])
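Each row of that matrix can then drive one resample of your data. A minimal sketch of the loop, assuming w1_array and w2_array are the (150, 2) arrays from your code and that the goal is the variance over the N peaks (your DCF/peak-finding code is elided here):
n_keep = int(len(find_length) * 0.8)                       # 80% sample size as an int
all_indices = np.random.randint(len(find_length), size=(N, n_keep))

peaks = []
for row in all_indices:                                    # one resample per row
    w1_sampled = w1_array[row, :]                          # shape (n_keep, 2)
    w2_sampled = w2_array[row, :]
    # ... your existing DCF / peak-finding code on the sampled arrays ...
    # peaks.append(sampled_peak)
variance = np.var(peaks)                                   # variance over the N peaks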
Upvotes: 1