Avoiding loop by grouping variable in R

Question

I am new to R and have been stuck on a problem for quite a while now. I have a big dataset (originally gridded data) with more than 1,000,000 observations and need to create a group variable for its elements. My dataset looks as follows:

ID        Var1
1         0,5 
2         0,6 
3         0,2 
4         0,15
...       ... 
1029600   0,43

What I want now is to make groups according to the following scheme:

1       2       3       4       5       6      ...   4320
4321    4322    4323    4324    4325    4326   ...   8640
8641    8642    8643    8644    8645    8646   ...   12960
12961   12962   12963   12964   12965   12966  ...   17280
17281   17282   17283   17284   17285   17286  ...   21600
21601   21602   21603   21604   21605   21606  ...   25920
...      ...     ...    ...     ...     ...    ...    ...
1025281 1025282 1025283 1025284 1025285 1025286...   1029600

Here the 36 numbers {1,2,3,4,5,6,4321,4322,4323,4324,4325,4326,8641,8642,...,21606} form the first group. The second group would be {7,8,9,10,11,12,4327,4328,...,21612}. The third group would start with {13,14,15,...}, and so on for all observations. I hope I have made my goal clear. I wanted to visualize it with a picture, but as a new member I cannot post one.

So far I have managed to do it with a really ugly nested loop, which looks as follows:

for(k in 0:40) { 
    nk <- 25920 * k
    mk <- 720 * k
    for (j in 0:719) {
        cj <- j * 6
        for (i in 0:5) { 
            ai <- i * 4320 + 1 + cj + nk
            bi <- i * 4320 + 6 + cj + nk
            group[ai:bi] <- 1 + j + mk
        }
    }
}

I am aware that this is pretty inefficient and it takes a very long time to compute this with loops. I am pretty sure that there is an easier way to solve my problem, but as I am new to R, I cannot find it myself.

Any help would be really appreciated. Thank you in advance!

asachet · Accepted Answer

You can get the group from the ID with a simple formula:

group <- (((ID-1) %% 4320) %/% 6) +1

Note that %% is the modulo operator and %/% is integer division. The formula gives you groups numbered from 1. There is no need to put it in a loop: both operators are vectorized, so the whole ID column is processed in one call.
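As a sanity check, the formula can be compared with the loop from the question on a small range of IDs. The sketch below is only an illustration under a few assumptions: it uses twelve complete rows of 4320 cells rather than the full dataset, and the names `col_group`, `row_block` and `group_loop` are made up for the example. Note that the formula on its own numbers the groups 1 to 720 by column position; the loop in the question additionally shifts the group number by 720 for every block of 6 rows (its `mk <- 720 * k` term), so the sketch adds that offset as a separate term.

```r
# Grid dimensions taken from the question: 4320 cells per row, groups 6 cells wide.
n_col <- 4320L
width <- 6L

# Assumption: twelve complete rows, i.e. two blocks of 6 rows, for illustration.
ID <- 1:(n_col * 12L)

# Column-based group, as in the formula above: 1..720 within every row.
col_group <- ((ID - 1L) %% n_col) %/% width + 1L

# Row block of each ID: 0 for rows 1-6, 1 for rows 7-12, and so on.
row_block <- (ID - 1L) %/% (n_col * width)

# Offset by 720 per row block to mirror the loop's `mk <- 720 * k` term.
group <- col_group + (n_col %/% width) * row_block

# Reference: the loop from the question, restricted to k = 0:1.
group_loop <- integer(length(ID))
for (k in 0:1) {
  for (j in 0:719) {
    for (i in 0:5) {
      ai <- i * 4320 + 1 + j * 6 + k * 25920
      group_loop[ai:(ai + 5)] <- 1 + j + 720 * k
    }
  }
}

# TRUE if the vectorized version and the loop agree on every ID.
all(group == group_loop)
```

If only the column position matters, `col_group` alone is enough. On a data frame the same expressions can be applied directly to the ID column.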

There are plenty of other ways to do it (for example, reshaping 1:1029600 into a matrix with 4320 columns, taking columns 6*N+1 to 6*(N+1) for N = 0, 1, ..., and matching the IDs against them), but this is why you should always stop and think about what you really want to do, and realize that it comes down to a little arithmetic :)
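For comparison, the matrix route could look roughly like the following. This is a sketch rather than a recommendation, and it rests on assumptions: the IDs fill complete rows of 4320 cells (twelve rows here, to keep it small), and the group number also advances by 720 per block of 6 rows, as in the question's loop. The names `grid`, `col_block`, `row_block` and `group_grid` are hypothetical.

```r
n_col <- 4320L
n_row <- 12L                     # assumption: a few complete rows for illustration
ID <- seq_len(n_col * n_row)

# Lay the IDs out as in the question's scheme: one grid row per matrix row.
grid <- matrix(ID, nrow = n_row, ncol = n_col, byrow = TRUE)

# Zero-based block index of every cell: 6-column blocks across, 6-row blocks down.
col_block <- (col(grid) - 1L) %/% 6L
row_block <- (row(grid) - 1L) %/% 6L

# Group label per cell, numbered from 1, with 720 groups per block of rows.
group_grid <- col_block + 1L + (n_col %/% 6L) * row_block

# Flatten back to ID order (row-major), so that group[i] belongs to ID i.
group <- as.vector(t(group_grid))

# IDs that fall into the first group.
sort(grid[group_grid == 1L])
```

This builds several matrices the size of the data just to recover what two integer operations on ID already give, which is why the arithmetic version is usually the simpler choice.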
