Reputation: 125
I have three arrays as listed below:
users
— Contains the id of 50000 users ( all distinct )pusers
— Contains the id of users who own some posts (contains repeated id's also, that is, one user can own many posts) [ 50000 values]score
— Contains the score corresponding to each value in pusers.[ 50000 values]Now I want to populate another array PScore
based on the following calculation. For each value of users
in pusers
, I need to fetch the corresponding score
and add it to the PScore
array in the index corresponding to the user
.
Example,
if users[5] = 23224
and pusers[6] = pusers[97] = 23224
then PScore[5] += score[6]+score[97]
Items of note:
score
is related to pusers
(e.g., pusers[5]
has score[5]
)PScore
is expected to be related to users
(e.g., cumulative score of users[5]
is Pscore[5]
)score
of 0.Can anyone help me in doing this? I tried a lot but once I run my different trials, the output screen remains blank until I Ctrl+Z
and get out.
I went through all of the following posts but I couldn't use them effectively for my scenario.
I am new to this forum and I'm a beginner in Python too. Any help is going to be really useful to me.
Ok I understand that something is wrong with my approach. So shouldn't I use lists for this scenario? Can anyone please tell me how I should proceed with this?
Sample of the data that i have arrived at is as shown below.
PUsers Score
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
-1 0
13 0
77 1
77 4
77 3
77 0
77 2
77 2
77 3
102 2
105 0
108 2
108 2
117 2
Users
-1
1
2
3
4
5
8
9
10
11
13
16
17
19
20
22
23
24
25
26
27
29
30
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
48
49
50
All that I want is the total score associated with each user. Once again, the pusers list contains repetition while users list contains unique values. I need the total score associated with each user stored in such a way that, if I say PScore[6]
, it should refer to the total score associated with User[6]
.
Hope I answered the queries.
Thanks in advance.
Upvotes: 4
Views: 587
Reputation: 11968
I think your algorithm is either wrong or broken.
Try to compute it's complexity. If it's N^2
or more you are likely using an inefficient algorithm. O(N^2)
with 50.000 elements should take a few seconds. O(N^3)
will probably take minutes.
If you're sure of your approach try running it with some small fake data to figure out if it does the right thing or if you accidentally added some infinite loop.
You can easily get it working in linear time with dictionaries.
Upvotes: 1
Reputation: 482
From how you described your arrays and since you're using python, this looks like a perfect candidate for dictionaries.
Instead of having one array for post owner and another array for post score, you should be able to make a dictionary that maps a user id to a score. When you're taking in data, look in the dictionary to see if the user already exists. If so, add the score to the current score. If not, make a new entry. When you've looped through all the data, you should have a dictionary that maps from user id to total score.
http://docs.python.org/2/tutorial/datastructures.html#dictionaries
Upvotes: 2