Reputation: 17
I am new to R and have been stuck on a problem for quite a while now. I have a big dataset (originally gridded data) with more than 1,000,000 observations and need to create a group variable for its elements. My dataset looks as follows:
ID Var1
1 0,5
2 0,6
3 0,2
4 0,15
... ...
1029600 0,43
What I want now is to make groups according to the following scheme:
1 2 3 4 5 6 ... 4320
4321 4322 4323 4324 4325 4326 ... 8640
8641 8642 8643 8644 8645 8646 ... 12960
12961 12962 12963 12964 12965 12966 ... 17280
17281 17282 17283 17284 17285 17286 ... 21600
21601 21602 21603 21604 21605 21606 ... 25920
... ... ... ... ... ... ... ...
1025281 1025282 1025283 1025284 1025285 1025286 ... 1029600
Here the 36 numbers {1,2,3,4,5,6,4321,4322,4323,4324,4325,4326,8641,8642,...,21606} form the first group, i.e. a block of 6 rows by 6 columns. The second group would be {7,8,9,10,11,12,4327,4328,...,21612}, the third group would start with {13,14,15,...}, and so on for all observations. I hope I have made my goal clear. I wanted to illustrate it with a picture, but as a new member I am not allowed to post images.
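To make the scheme concrete, here is a small sketch that lists the IDs of any group. It assumes the IDs run row by row across a grid 4320 cells wide, that groups are 6 x 6 blocks, and that numbering continues band by band of 6 rows as in the loop below (group `1 + j + 720 * k`). `block_ids` is a hypothetical helper name, not something from the question.

```r
# Hypothetical helper: IDs in group g, assuming IDs run row by row across a
# grid n_col cells wide and groups are size x size blocks numbered left to
# right within each band of `size` rows, band after band.
block_ids <- function(g, n_col = 4320, size = 6) {
  per_band <- n_col %/% size             # 720 blocks across one band
  band <- (g - 1) %/% per_band           # 0-based band of rows
  blk  <- (g - 1) %% per_band            # 0-based block within the band
  rows <- band * size + 0:(size - 1)     # 0-based grid rows of the block
  cols <- blk * size + 1:size            # 1-based grid columns of the block
  sort(as.vector(outer(rows, cols, function(r, c) r * n_col + c)))
}

block_ids(1)    # 1..6, 4321..4326, ..., 21601..21606 (36 IDs)
block_ids(2)    # 7..12, 4327..4332, ..., 21607..21612
```

Under these assumptions, group 721 is the first block of the second band and starts at ID 25921 (`min(block_ids(721))`).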
So far I have managed to do it with a really ugly nested loop, which looks as follows:
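group <- integer(1029600)  # preallocate the result vector
for (k in 0:40) {          # k: band of 6 grid rows
  nk <- 25920 * k          # 25920 = 6 rows * 4320 columns: ID offset of band k
  mk <- 720 * k            # 720 = 4320 / 6 groups per band
  for (j in 0:719) {       # j: block of 6 columns within the band
    cj <- j * 6
    for (i in 0:5) {       # i: row within the band
      ai <- i * 4320 + 1 + cj + nk  # first ID of this run of 6 cells
      bi <- i * 4320 + 6 + cj + nk  # last ID of this run of 6 cells
      group[ai:bi] <- 1 + j + mk
    }
  }
}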
I am aware that this is inefficient, and the nested loops take a very long time to run. I am fairly sure there is an easier way to solve my problem, but as I am new to R, I cannot find it myself.
Any help would be really appreciated. Thank you in advance!
Upvotes: 1
Views: 122
Reputation: 11429
Create sample data
dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))
Grouping as explained by @antine-sac:
group <- (((dtf$ID-1) %% 4320) %/% 6) +1
Split the data
dtfsplit <- split(dtf, group)
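Since the question asks for a group variable rather than a list of data frames, here is a sketch that stores the same vector as a column. `table()` then shows how many rows each group received. In this 1e4-row sample only the first three grid rows exist, and the third only partially, so each group has at most 18 rows here instead of the 36 it would have on the full grid.

```r
dtf <- data.frame(ID = 1:1e4, Var1 = rnorm(1e4))   # sample data as above

# Keep the group as a column of the data frame instead of splitting it
dtf$group <- ((dtf$ID - 1) %% 4320) %/% 6 + 1

sizes <- table(dtf$group)   # number of rows per group
head(sizes)                 # groups 1-6 have 18 rows each in this sample
```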
First group
> dtfsplit[1]
$`1`
ID Var1
1 1 0.56655
2 2 0.87645
3 3 -1.41986
4 4 -1.84881
5 5 0.03233
6 6 3.06512
4321 4321 -1.57179
4322 4322 -1.09958
4323 4323 0.55980
4324 4324 0.32390
4325 4325 0.85438
4326 4326 -0.10311
8641 8641 2.08886
8642 8642 1.19836
8643 8643 0.52592
8644 8644 0.20571
8645 8645 1.08429
8646 8646 0.69648
Second group
dtfsplit[2]
Upvotes: 0
Reputation: 6921
You can get the group from the ID with a simple formula:
group <- (((ID-1) %% 4320) %/% 6) +1
Note that %% is the modulo operation and %/% is integer division. The formula gives groups numbered from 1. There is no need to put it in a loop: it is a vectorized operation, so it works on the whole ID vector at once.
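One caveat applies if the grid really is 4320 cells wide and more than six rows tall, as in the question. The formula above depends only on the position within a row. That means ID 25921 (the first cell of the seventh row) falls back into group 1, while the question's loop puts it in group 721. The sketch below also counts 6-row bands. It is checked against a version of the loop shrunk to a hypothetical 24 x 12 grid; `block_group` and the test sizes are illustrative names, not part of the answer.

```r
# Vectorised block index for IDs stored row by row, n_col cells per row:
# band of `size` rows first, then the block of `size` columns inside it.
block_group <- function(ID, n_col = 4320, size = 6) {
  row0 <- (ID - 1) %/% n_col                 # 0-based grid row
  col0 <- (ID - 1) %% n_col                  # 0-based grid column
  (row0 %/% size) * (n_col %/% size) + col0 %/% size + 1
}

# The question's loop, rewritten for a small hypothetical grid so the two
# results can be compared quickly (24 columns, 12 rows, 6 x 6 blocks).
n_col <- 24; n_row <- 12; size <- 6
loop_group <- integer(n_col * n_row)
for (k in 0:(n_row / size - 1)) {            # band of rows
  for (j in 0:(n_col / size - 1)) {          # block of columns
    for (i in 0:(size - 1)) {                # row within the band
      start <- i * n_col + j * size + k * size * n_col
      loop_group[(start + 1):(start + size)] <- 1 + j + k * (n_col / size)
    }
  }
}

ID <- seq_len(n_col * n_row)
all(block_group(ID, n_col, size) == loop_group)   # TRUE
block_group(c(21606, 21607, 25921))               # 1 2 721 on the full 4320-wide grid
```

On the real data this would be `dtf$group <- block_group(dtf$ID)`, still a single vectorized call.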
There are plenty of other ways to do it (for example, reshaping 1:1029600 into a matrix with 4320 columns, taking columns (6*N+1):(6*(N+1)), and doing a match), but this is why you should always stop and think about what you really want to do, and realize that it comes down to a little arithmetic :)
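Here is a sketch of the matrix route. It assumes the number of IDs is an exact multiple of the row width, so a small hypothetical 24 x 12 grid is used. The approach fills a matrix row by row so each cell holds its ID, then computes each cell's block from `row()` and `col()`. It maps the result back to IDs with index assignment instead of `match()`.

```r
# Hypothetical small grid; the real one would need a length that is a
# multiple of 4320 for matrix() to fill it without recycling.
n_col <- 24; n_row <- 12; size <- 6
m <- matrix(seq_len(n_col * n_row), ncol = n_col, byrow = TRUE)  # m[r, c] holds the ID

# Block number of every cell, computed from its row and column position
g <- ((row(m) - 1) %/% size) * (n_col %/% size) + (col(m) - 1) %/% size + 1

# Map back to IDs by indexing (instead of match()): group[id] is that cell's block
group <- integer(length(m))
group[as.vector(m)] <- as.vector(g)
group[1:8]     # 1 1 1 1 1 1 2 2
```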
Upvotes: 3