Reputation: 1
I have a data frame with 77,760 rows, and I want to extract rows whose row numbers are 13 apart: the 1st, 14th, 27th, 40th, 53rd, 66th, 79th, 92nd, 105th, 118th, 131st and 144th. After each multiple of 144, however, I want to take the very next row (145th, 289th, ...) and restart the same step-13 sequence from there. So after the 144th row I don't want the 157th but the 145th, followed by the 158th, 171st, ... up to the next multiple of 144 (the 288th row), then the 289th, 302nd, ... and so on until the 77,760th row.
So far, following the solution to my last post, I have used the following to extract every 13th row:
my_frame[seq(from = 1, to = nrow(my_frame), by = 13), ]
Now I want to reset the row sequence after every 144th, 288th, 432nd, ... row and extract rows as described above.
Actual results I am getting: 1st, 14th, ... 144th, 157th, 170th, ... through the end of the 77,760 rows
Expected results: 1st, 14th, ... 144th, 145th, 158th, ... 288th, 289th, ... 432nd, 433rd, ... 77,760th
Can anyone help me with the logic?
Upvotes: 0
Views: 286
Reputation: 782
Another option is to use a while loop to generate the row numbers and then extract the data from those rows. An 'index' variable jumps from one row number to the next on each iteration of the while loop: if 'index' is a multiple of 144 it is incremented by 1, otherwise by 13. Every value that 'index' takes is appended to the 'imp_row' vector.
index = 1
final_row = nrow(data_frame_name)  # number of rows; bounds the while loop
imp_row = c()  # will hold all the selected row numbers
while (index <= final_row) {  # <= so that the final row itself can be selected
  imp_row = append(imp_row, index)
  if ((index %% 144) == 0) {
    index = index + 1  # just hit a multiple of 144: restart on the next row
  } else {
    index = index + 13
  }
}
head(imp_row, 20)
# now you can index your data frame with the imp_row vector: data_frame_name[imp_row, ]
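As a cross-check on the loop rule, here is a minimal sketch that compares it with a loop-free formula. It is not part of the original answer: the names n_rows, block, step, loop_rows and formula_rows are made up for illustration, and the row count of 77,760 is taken from the question. The loop uses `<=` in its condition so that the last row can be selected.

```r
n_rows <- 77760  # row count stated in the question
block  <- 144    # block length after which the sequence restarts
step   <- 13     # step between selected rows inside a block

# Loop rule: +1 after a multiple of `block`, otherwise +`step`
index <- 1
loop_rows <- integer(0)
while (index <= n_rows) {
  loop_rows <- c(loop_rows, index)
  index <- if (index %% block == 0) index + 1 else index + step
}

# Loop-free check: keep a row when its zero-based offset inside its
# block is a multiple of `step`
all_rows <- seq_len(n_rows)
formula_rows <- all_rows[((all_rows - 1) %% block) %% step == 0]

identical(as.integer(loop_rows), formula_rows)
length(loop_rows)
```

The two agree here because 143 = 11 * 13, so stepping by 13 from the start of a block lands exactly on the multiple of 144 that ends it. For other block or step sizes the loop would never hit a multiple of the block length, and the formula would be the safer choice.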
Alternatively, you can skip recording the 'index' values in 'imp_row' and use 'index' directly as the row number inside the loop.
index = 1
final_row = nrow(data_frame_name)  # number of rows; bounds the while loop
while (index <= final_row) {  # <= so that the final row itself can be visited
  # use data_frame_name[index, ] here to perform your operation of
  # interest on that specific row, and then
  # increment 'index' as per your requirements
  if ((index %% 144) == 0) {
    index = index + 1
  } else {
    index = index + 13
  }
}
Upvotes: 0
Reputation: 11140
You can generate the row numbers first and use them to subset your data frame:
row_numbers <- c(sapply(seq(1, 77760, 144), function(x) seq(x, by = 13, length.out = 12)))
head(row_numbers, 50)
[1] 1 14 27 40 53 66 79 92 105 118 131 144 145 158 171 184 197 210 223 236
[21] 249 262 275 288 289 302 315 328 341 354 367 380 393 406 419 432 433 446 459 472
[41] 485 498 511 524 537 550 563 576 577 590
result <- your_df[row_numbers, ]
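The call above hard-codes 77760 as the row count. A minimal sketch of the same idea driven by nrow() might look like the following. The 1000-row your_df and its columns id and value are hypothetical sample data, chosen so that the last block is incomplete, and capping each block with min() is an assumption about how a partial final block should be handled.

```r
# Hypothetical sample data; 1000 is not a multiple of 144,
# so the last block is only partially filled
set.seed(1)
your_df <- data.frame(id = seq_len(1000), value = rnorm(1000))

n_rows <- nrow(your_df)
block_starts <- seq(1, n_rows, by = 144)  # 1, 145, 289, ...

# Within each block, step by 13 but never past the block end or the last row
row_numbers <- unlist(lapply(block_starts, function(start) {
  seq(start, min(start + 143, n_rows), by = 13)
}))

result <- your_df[row_numbers, ]
head(row_numbers, 14)
```

Because every generated row number stays within nrow(your_df), the subset contains no all-NA rows, which is what indexing past the end of a data frame would produce.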
Upvotes: 2
Reputation: 887148
An option would be to split the data.frame into blocks of 144 rows and take every 13th row within each block:
my_frame1 <- do.call(rbind, lapply(unname(split(my_frame,
                  (seq_len(nrow(my_frame)) - 1) %/% 144 + 1)),
              function(dat) dat[seq(1, nrow(dat), by = 13), ]))
row.names(my_frame1)
#[1] "1" "14" "27" "40" "53" "66" "79" "92" "105" "118" "131"
#[12] "144" "145" "158" "171" "184" "197" "210" "223" "236" "249" ...
It may be better to split only the sequence of row numbers and then subset the data.frame once:
s1 <- seq_len(nrow(my_frame))
i1 <- unlist(lapply(unname(split(s1, (s1-1) %/% 144 + 1)),
`[`, rep(c(TRUE, FALSE), c(1, 12))))
my_frame1 <- my_frame[i1,]
# sample data
set.seed(24)
my_frame <- data.frame(col1 = sample(1:9, 1000, replace = TRUE), col2 = rnorm(1000))
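To check that the two approaches in this answer select the same rows, here is a small self-contained sketch run on the sample data. The names block_id, by_split and by_index are introduced only for this comparison and are not part of the original answer.

```r
set.seed(24)
my_frame <- data.frame(col1 = sample(1:9, 1000, replace = TRUE), col2 = rnorm(1000))

# Block number (1, 2, ...) for every row, 144 rows per block
block_id <- (seq_len(nrow(my_frame)) - 1) %/% 144 + 1

# Approach 1: split the data frame, then take every 13th row of each block
by_split <- do.call(rbind, lapply(unname(split(my_frame, block_id)),
                                  function(dat) dat[seq(1, nrow(dat), by = 13), ]))

# Approach 2: split only the row numbers, then subset once
s1 <- seq_len(nrow(my_frame))
i1 <- unlist(lapply(unname(split(s1, block_id)),
                    `[`, rep(c(TRUE, FALSE), c(1, 12))))
by_index <- my_frame[i1, ]

# Both comparisons should report TRUE if the approaches agree
identical(as.integer(row.names(by_split)), i1)
all.equal(by_split$col2, by_index$col2)
```

The second approach avoids copying the data frame into many pieces, which is likely to matter more as the number of rows and columns grows.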
Upvotes: 1