Reputation: 23
I have two datasets. The first one is a data frame:
df1 <- data.frame(user=c(1:10), h01=c(3,3,6,8,9,10,4,1,2,5), h12=c(5,5,3,4,1,2,8,8,9,10),a=numeric(10))
The first column represents the user id. h01 holds the id of the cell phone antenna the user is connected to during the first period (00:00 - 1:00 AM), and h12 holds the same for the period between 1:00 AM and 2:00 AM.
Then I have an array:
array1 <- array(c(23,12,63,11,5,6,9,41,23,73,26,83,41,51,29,10,1,5,30,2), dim=c(10,2))
The rows represent the cell phone antenna id, the columns represent the periods of time, and the values in array1 represent how many people are connected to the antenna during that period. So array1[1,1] gives how many people are connected to antenna 1 between 00:00 and 1:00, array1[2,2] gives how many people are connected to antenna 2 between 1:00 and 2:00, and so on.
What I want to do is, for each user in df1, obtain from array1 how many people in total are connected to the same antennas during the same periods of time, and place that value in column a.
For example, the first user is connected to antenna 3 between 00:00 and 1:00 AM, and to antenna 5 between 1:00 AM and 2:00 AM, so the value in a should be array1[3,1] plus array1[5,2].
I used a for loop to do this:
aux1 <- df1[, 2]  # antenna ids for the first period (h01)
aux2 <- df1[, 3]  # antenna ids for the second period (h12)
for (i in seq_len(nrow(df1))) {
  df1[i, 4] <- sum(array1[aux1[i], 1], array1[aux2[i], 2])
}
which gives
   user h01 h12   a
1     1   3   5  92
2     2   3   5  92
3     3   6   3  47
4     4   8   4  92
5     5   9   1  49
6     6  10   2 156
7     7   4   8  16
8     8   1   8  28
9     9   2   9  42
10   10   5  10   7
This loop works and gives the correct values. The problem is that the two datasets (df1 and array1) are really big: df1 has over 20,000 users and 24 periods of time, and array1 has over 1300 antennas. On top of that, this data corresponds to users from one socioeconomic level, and I have 5 in total, so simplifying the code is mandatory.
I would love it if someone could show me a different approach to this, especially if it's without a for loop.
Upvotes: 0
Views: 75
Reputation: 156
Try this approach:
df1$a <- array1[df1$h01,1] + array1[df1$h12,2]
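Since the real data has 24 periods, writing one term per column gets tedious. A possible generalization, sketched below, uses matrix indexing: passing a two-column matrix of (row, column) pairs to `[` looks up all values in one call. The `period_cols` vector is a name introduced here for illustration, and the sketch assumes the period columns of df1 are listed in the same order as the columns of array1.

```r
# Sample data from the question (without the placeholder column a).
df1 <- data.frame(user = 1:10,
                  h01 = c(3, 3, 6, 8, 9, 10, 4, 1, 2, 5),
                  h12 = c(5, 5, 3, 4, 1, 2, 8, 8, 9, 10))
array1 <- array(c(23, 12, 63, 11, 5, 6, 9, 41, 23, 73,
                  26, 83, 41, 51, 29, 10, 1, 5, 30, 2), dim = c(10, 2))

# Assumption: these names are ordered like the columns of array1.
period_cols <- c("h01", "h12")

# users x periods matrix of antenna ids
ids <- as.matrix(df1[period_cols])

# One (antenna id, period number) pair per cell of ids, in column-major order.
idx <- cbind(as.vector(ids), as.vector(col(ids)))

# Look up every count at once, reshape to users x periods, and sum per user.
counts <- matrix(array1[idx], nrow = nrow(ids))
df1$a <- rowSums(counts)
```

With the sample data this should give the same column a as the two-term version; for the full data only `period_cols` would need to list all 24 columns.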
Upvotes: 2