Anu145

Reputation: 125

Working with large arrays in Python

I have three arrays as listed below:

  1. users: contains the ids of 50000 users (all distinct).
  2. pusers: contains the ids of users who own posts. Ids can repeat, since one user can own many posts. [50000 values]
  3. score: contains the score corresponding to each value in pusers. [50000 values]

Now I want to populate another array, PScore, as follows. For each id in users, I need to find every occurrence of that id in pusers, fetch the corresponding scores, and add them to PScore at the index of that user in users.

For example:

if users[5] = 23224
and pusers[6] = pusers[97] = 23224 
then PScore[5] += score[6]+score[97]

Items of note:

Can anyone help me do this? I have tried several approaches, but whenever I run one, the output screen stays blank until I press Ctrl+Z to get out.

I went through all of the following posts but I couldn't use them effectively for my scenario.

I am new to this forum and I'm a beginner in Python too. Any help is going to be really useful to me.

Additional Information

OK, I understand that something is wrong with my approach. Should I not be using lists for this scenario? Can anyone please tell me how I should proceed?

A sample of the data I have is shown below.

PUsers  Score
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
13  0
77  1
77  4
77  3
77  0
77  2
77  2
77  3
102     2
105     0
108     2
108     2
117     2

Users
-1
1
2
3
4
5
8
9
10
11
13
16
17
19
20
22
23
24
25
26
27
29
30
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
48
49
50

All I want is the total score associated with each user. Once again, the pusers list contains repeated ids while the users list contains unique values. I need the totals stored so that PScore[6], for example, is the total score associated with users[6].

Hope I answered the queries.

Thanks in advance.

Upvotes: 4

Views: 587

Answers (2)

Sorin

Reputation: 11968

I think your algorithm is either wrong or broken. Try to compute its complexity. If it's N^2 or more, you are likely using an inefficient algorithm. O(N^2) with 50,000 elements should take a few seconds. O(N^3) will probably take minutes. If you're sure of your approach, try running it with some small fake data to figure out whether it does the right thing or whether you accidentally added an infinite loop.

You can easily get it working in linear time with dictionaries.
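
A minimal sketch of what that could look like, assuming users, pusers and score are plain Python lists as described in the question. The function name total_scores and the sample values are made up for illustration; they are not from the original post.

```python
def total_scores(users, pusers, score):
    """Return a list pscore where pscore[i] is the total score of users[i]."""
    # Map each user id to its position in users (the ids are distinct).
    position = {uid: i for i, uid in enumerate(users)}
    pscore = [0] * len(users)
    # One pass over the posts: a dictionary lookup per post instead of a
    # scan through users, so the work grows linearly with the input size.
    for uid, s in zip(pusers, score):
        i = position.get(uid)
        if i is not None:  # assumption: owners missing from users are skipped
            pscore[i] += s
    return pscore


# Small made-up data in the spirit of the question's sample.
users = [-1, 13, 77, 102]
pusers = [-1, 13, 77, 77, 77, 102]
score = [0, 0, 1, 4, 3, 2]
pscore = total_scores(users, pusers, score)
print(pscore)  # [0, 0, 8, 2]
```

Here pscore[2] is the total for users[2], which is the indexing the question asks for.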

Upvotes: 1

Dillon Welch

Reputation: 482

From how you described your arrays, and since you're using Python, this looks like a perfect candidate for dictionaries.

Instead of having one array for post owner and another array for post score, you should be able to make a dictionary that maps a user id to a score. When you're taking in data, look in the dictionary to see if the user already exists. If so, add the score to the current score. If not, make a new entry. When you've looped through all the data, you should have a dictionary that maps from user id to total score.

http://docs.python.org/2/tutorial/datastructures.html#dictionaries
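
The steps described above could be sketched roughly as follows. This assumes the owner ids and scores arrive as two parallel lists; the name totals_by_user and the sample values are hypothetical.

```python
def totals_by_user(pusers, score):
    """Build a dictionary mapping each user id to its total score."""
    totals = {}
    for uid, s in zip(pusers, score):
        if uid in totals:
            # User already seen: add this post's score to the running total.
            totals[uid] += s
        else:
            # First post for this user: make a new entry.
            totals[uid] = s
    return totals


# Made-up sample data.
pusers = [13, 77, 77, 102, 108, 108]
score = [0, 1, 4, 2, 2, 2]
totals = totals_by_user(pusers, score)
print(totals[77])   # 5
print(totals[108])  # 4

# If a list aligned with users is still needed, it can be derived from the
# dictionary; users without any posts get a total of 0.
users = [13, 16, 77]
pscore = [totals.get(uid, 0) for uid in users]
print(pscore)  # [0, 0, 5]
```

Looking up a key in a dictionary takes roughly constant time on average, which is why this avoids the repeated list scans that make a nested-loop version slow.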

Upvotes: 2
