Reputation: 51
I am writing a program to simulate the actual polling data companies like Gallup or Rasmussen publish daily: www.gallup.com and www.rassmussenreports.com
I'm using a brute-force method: the computer generates some random daily polling data and then calculates three-day averages to see whether the average of the random data matches the pollsters' numbers. (Most companies' poll numbers are three-day averages.)
Currently, it works well for one iteration, but my goal is to have it produce the most common simulation that matches the average polling data. I could then change the code to run anywhere from 1 to 1000 iterations.
And this is my problem. At the end of the test I have an array in a single variable that looks something like this:
[40.1, 39.4, 56.7, 60.0, 20.0 ..... 19.0]
The program currently produces one array for each correct simulation. I can store each array in a single variable, but I then have to have a program that could generate 1 to 1000 variables depending on how many iterations I requested!?
How do I avoid this? I know there is an intelligent way of doing this that doesn't require the program to generate variables to store arrays depending on how many simulations I want.
Code testing for McCain:
import random

mctest = []
x = 0
while x < 5:
    test = round(100 * random.random())
    mctest.append(test)
    x = x + 1
mctestavg = (mctest[0] + mctest[1] + mctest[2]) / 3
# mcavg is real data
if mctestavg == mcavg[2]:
    mcwork = mctest
How do I repeat without creating multiple mcwork vars?
Upvotes: 3
Views: 18527
Reputation: 75785
Would something like this work?
from random import randint

mcworks = []
for n in range(NUM_ITERATIONS):
    mctest = [randint(0, 100) for i in range(5)]
    # mcavg is real data
    if sum(mctest[:3]) / 3 == mcavg[2]:
        mcworks.append(mctest)
In the end, you are left with a list of valid mctest lists.
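A self-contained sketch of the same idea, in case it helps: it collects every matching run into one list and then picks the run that occurred most often. The function names collect_matches and most_common_run, the seed parameter, and the choice to round the three-day average before comparing are all assumptions made for illustration, not part of the original code.

```python
import random
from collections import Counter


def collect_matches(target_avg, num_iterations, seed=None):
    """Return every random 5-day run whose first three days average to target_avg."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    matches = []
    for _ in range(num_iterations):
        candidate = [rng.randint(0, 100) for _ in range(5)]
        # Assumption: the published average is a whole number, so round first.
        if round(sum(candidate[:3]) / 3) == target_avg:
            matches.append(candidate)
    return matches


def most_common_run(matches):
    """Return the run that appears most often, or None if nothing matched."""
    if not matches:
        return None
    counts = Counter(tuple(run) for run in matches)  # lists are not hashable
    run, _count = counts.most_common(1)[0]
    return list(run)


matches = collect_matches(target_avg=45, num_iterations=2000, seed=1)
best = most_common_run(matches)
```

However many iterations are requested, there is still only one variable, matches, and each simulation is reachable as matches[0], matches[1], and so on.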
What I changed:
random.randint to get random integers
sum to calculate the average of the first three items
mcworks, a single list that collects the results, instead of creating a new variable for every iteration
Upvotes: 3
Reputation: 469
I would strongly consider using NumPy to do this. You get efficient N-dimensional arrays that you can quickly and easily process.
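As a rough sketch of what that could look like, the whole batch of simulations can live in one 2-D array with one row per run. The function name simulate_matches, its parameters, and the rounding of the three-day average are assumptions for illustration only.

```python
import numpy as np


def simulate_matches(target_avg, num_iterations, days=5, seed=None):
    """Generate num_iterations random runs and keep those matching target_avg."""
    rng = np.random.default_rng(seed)
    # One row per simulation, one column per day; the upper bound 101 is exclusive.
    runs = rng.integers(0, 101, size=(num_iterations, days))
    three_day_avg = runs[:, :3].mean(axis=1)
    # Assumption: the published average is a whole number, so round before comparing.
    mask = np.rint(three_day_avg) == target_avg
    return runs[mask]


matching_runs = simulate_matches(target_avg=45, num_iterations=5000, seed=0)
```

Here matching_runs[i] is the i-th matching simulation, so no per-iteration variables are needed.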
Upvotes: 1
Reputation: 436
A neat way to do it is to use a list of lists in combination with Pandas. Then you are able to create a 3-day rolling average. This makes it easy to search through the results: add the real values as another column and use the loc function to find the rows that match.
from random import randint
import pandas as pd

rand_vals = [randint(0, 100) for i in range(5)]
df = pd.DataFrame(data=rand_vals, columns=['generated data'])
df['3 day avg'] = df['generated data'].rolling(3).mean()
df['mcavg'] = mcavg # the list of real data (must have one entry per row)
# Extract the resulting list of values
res = df.loc[df['3 day avg'] == df['mcavg'], '3 day avg'].values
This is also neat if you intend to use the same random values for different polls/persons: just add another column with their real values and perform the same search for them.
Upvotes: 0
Reputation: 70324
Since you are thinking in variables, you might prefer a dictionary over a list of lists:
data = {}
data['a'] = [generate_poll_data()]
data['b'] = [generate_poll_data()]
etc.
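A slightly fuller sketch of the dictionary idea, keyed by candidate rather than by letter. generate_poll_data is only a hypothetical stand-in here (the original does not define it), and simulate_all, the seed parameter, and the key "candidate_b" are likewise made up for illustration.

```python
import random


def generate_poll_data(rng, days=5):
    """Hypothetical stand-in: one run of random daily poll numbers."""
    return [rng.randint(0, 100) for _ in range(days)]


def simulate_all(names, runs_per_name, seed=None):
    """Build a dict mapping each name to a list of simulated runs."""
    rng = random.Random(seed)
    data = {}
    for name in names:
        data[name] = [generate_poll_data(rng) for _ in range(runs_per_name)]
    return data


data = simulate_all(["mccain", "candidate_b"], runs_per_name=3, seed=0)
first_mccain_run = data["mccain"][0]  # first simulated run stored under "mccain"
```

The keys play the role of the variable names, so adding another candidate means adding a key rather than writing more code.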
Upvotes: 1
Reputation: 75785
Lists in Python can contain any type of object. If I understand the question correctly, will a list of lists do the job? Something like this (assuming you have a function generate_poll_data() which creates your data):
data = []
for _ in range(num_iterations):
    data.append(generate_poll_data())
Then, data[n] will be the list of data from the (n+1)th run, since list indices start at 0.
Upvotes: 1
Reputation: 44150
Are you talking about doing this?
>>> a = [ ['a', 'b'], ['c', 'd'] ]
>>> a[1]
['c', 'd']
>>> a[1][1]
'd'
Upvotes: 3