SpanishTrain

Reputation: 17

Avoiding loop by grouping variable in R

I am new to R and have been stuck on a problem for quite a while now. I have a big dataset (originally gridded data) with more than 1,000,000 observations, and I have to create a group variable for my elements. My dataset looks as follows:

ID        Var1
1         0,5 
2         0,6 
3         0,2 
4         0,15
...       ... 
1029600   0,43

What I want now is to make groups according to the following scheme:

1       2       3       4       5       6      ...   4320
4321    4322    4323    4324    4325    4326   ...   8640
8641    8642    8643    8644    8645    8646   ...   12960
12961    12962  12963   12964   12965   12966  ...   17280
17281   17282   17283   17284   17285   17286  ...   21600
21601   21602   21603   21604   21605   21606  ...   25920
...      ...     ...    ...     ...     ...    ...    ...
1025281 1025282 1025283 1025284 1025285 1025286...   1029600

Here the 36 numbers {1,2,3,4,5,6,4321,4322,4323,4324,4325,4326,8641,8642,...,21606} form the first group. The second group would be {7,8,9,10,11,12,4327,4328,...,21612}. The third group would start with {13,14,15,...}, and so on for all observations. I hope I have made my goal clear. I wanted to visualize it with a picture, but as a new member I am not able to post one.

So far I have managed to do it with a really ugly nested loop, which looks as follows:

for(k in 0:40) { 
    nk <- 25920 * k
    mk <- 720 * k
    for (j in 0:719) {
        cj <- j * 6
        for (i in 0:5) { 
            ai <- i * 4320 + 1 + cj + nk
            bi <- i * 4320 + 6 + cj + nk
            group[ai:bi] <- 1 + j + mk
        }
    }
} 

I am aware that this is pretty inefficient, and computing the groups with these loops takes a very long time. I am pretty sure there is an easier way to solve my problem, but as I am new to R, I cannot find it myself.

Any help would be really appreciated. Thank you in advance!

Upvotes: 1

Views: 122

Answers (2)

Paul Rougieux

Reputation: 11429

Create sample data

dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))

Grouping as explained by @antine-sac:

group <- (((dtf$ID-1) %% 4320) %/% 6) +1

Split the data

dtfsplit <- split(dtf, group)

First group

> dtfsplit[1]
$`1`
       ID     Var1
1       1  0.56655
2       2  0.87645
3       3 -1.41986
4       4 -1.84881
5       5  0.03233
6       6  3.06512
4321 4321 -1.57179
4322 4322 -1.09958
4323 4323  0.55980
4324 4324  0.32390
4325 4325  0.85438
4326 4326 -0.10311
8641 8641  2.08886
8642 8642  1.19836
8643 8643  0.52592
8644 8644  0.20571
8645 8645  1.08429
8646 8646  0.69648

Second group

dtfsplit[2]
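
If only per-group summaries are needed, splitting into a list may not be necessary. The following is a minimal sketch, assuming the same sample data and formula as above; the seed and the names `group_mean` and `group_size` are illustrative and not part of the original answer. It stores the group as a column and summarises with `tapply()` and `table()`.

```r
set.seed(1)  # arbitrary seed, only so the sample data is reproducible
dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))

# Store the group next to the data instead of splitting into a list
dtf$group <- ((dtf$ID - 1) %% 4320) %/% 6 + 1

# Per-group mean of Var1 and number of rows per group
group_mean <- tapply(dtf$Var1, dtf$group, mean)
group_size <- table(dtf$group)

head(group_mean)
head(group_size)
```

Both results are indexed by the group label, so `group_mean["1"]` refers to the same rows as `dtfsplit[1]`.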

Upvotes: 0

asachet

Reputation: 6921

You can get the group from the ID with a simple formula:

group <- (((ID-1) %% 4320) %/% 6) +1

Note that %% is the modulo operator and %/% is integer division. The formula gives you groups numbered from 1. There is no need to put it in a loop: it is a vectorized operation, so it works on the whole ID column at once.
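
The formula above depends only on the column position, so every row of the grid that shares the same six columns ends up in the same group. If the 36-element groups described in the question (six columns by six rows) are what is wanted, one possible extension, which is not part of the original answer, adds a row-block offset. The sketch below checks it against the question's loop on a small hypothetical grid of 12 rows; `n_col`, `col_group`, `row_block`, `tile_group` and `loop_group` are illustrative names.

```r
# Small stand-in for the real grid: 4320 columns, 12 rows (hypothetical size)
n_col <- 4320L
ID <- seq_len(n_col * 12L)

# Column block only (the formula above): 720 distinct groups
col_group <- ((ID - 1L) %% n_col) %/% 6L + 1L

# Row-block offset so that each group is a 6 x 6 tile, as in the question's loop
row_block <- (ID - 1L) %/% (6L * n_col)
tile_group <- col_group + 720L * row_block

# Reference: the question's loop, restricted to the two row blocks present here
loop_group <- integer(length(ID))
for (k in 0:1) {
  for (j in 0:719) {
    for (i in 0:5) {
      ai <- i * 4320 + 1 + j * 6 + 25920 * k
      loop_group[ai:(ai + 5)] <- 1 + j + 720 * k
    }
  }
}

all(tile_group == loop_group)  # do the vectorized and looped versions agree?
table(table(tile_group))       # distribution of group sizes
```

On the full data the same two vectorized lines apply unchanged to the real ID column; only the loop is limited to the small grid here.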

There are plenty of ways to do it (for example, reshaping 1:1029600 into a matrix with 4320 columns, taking columns (6*N + 1):(6*(N + 1)) and doing a match or something similar), but this is why you should always stop and think about what you really want to do, and realize that it comes down to a little arithmetic :)
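
For comparison, here is a rough sketch of the matrix idea mentioned above, on a hypothetical grid of 12 rows so that it runs quickly. The names `grid`, `col_block`, `group_matrix` and `group_formula` are illustrative. It labels each matrix column by its block of six and confirms that the result matches the arithmetic formula.

```r
n_col <- 4320L
n_row <- 12L                      # hypothetical number of rows, for illustration
ID <- seq_len(n_col * n_row)

# Lay the IDs out row by row, as in the question's scheme
grid <- matrix(ID, ncol = n_col, byrow = TRUE)

# Label every cell with the block of six columns it belongs to
col_block <- (col(grid) - 1L) %/% 6L + 1L

# Read the labels back in ID order (row by row)
group_matrix <- as.vector(t(col_block))

# The arithmetic formula gives the same labels without building a matrix
group_formula <- ((ID - 1L) %% n_col) %/% 6L + 1L
identical(group_matrix, group_formula)
```

The matrix version needs two extra objects the size of the data, which is one reason the plain arithmetic is attractive for more than a million observations.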

Upvotes: 3
