Michael Hanley

Reputation: 1

Extracting block of m rows at regular interval from large dataset

I have a small problem. I have a dataset with 8208 rows of data in a single column. I want to take a block of rows at a regular interval and add each block as a column of a new data frame.

So, for example:

newdf has column 1 to column 23.

Column 1 is composed of rows 289:528 from the original dataset, and column 2 is composed of rows 625:864 from the original dataset.

And so on. The "block" size is 239 rows, the jump between blocks is every 336 rows.

I can do this manually, but it just becomes tedious. I have to repeat this entire procedure for another 11 sets of data so obviously a more automated approach would be preferable.

Upvotes: 0

Views: 1965

Answers (4)

IRTFM

Reputation: 263481

Why not just:

as.data.frame(matrix(orig, nrow=528)[289:528, ])

Since 8208 is not an exact multiple of the row count, we need to determine the number of columns:

> 8208/528
[1] 15.54545 # so either 15 or 16
> 8208-15*528
[1] 288  # all in the to-be-discarded section

as.data.frame(matrix(orig, nrow=528, ncol=15)[289:528, ])

Or:

as.data.frame(matrix(orig, nrow=528, ncol=8208 %/% 528)[289:528, ])
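The calls above treat the data as one block per 528 rows. If the block starts are instead 336 rows apart, as the question states, the same reshape idea might look like the sketch below. It assumes a block length of 240 (the length of 289:528), uses a made-up stand-in for orig, and the names first, period, block and n_blocks are illustrative only.

```r
# Assumed stand-in for the original single column of 8208 values
orig <- seq_len(8208)

first  <- 289   # first row of the first block (from the question)
period <- 336   # distance between block starts (from the question)
block  <- 240   # length(289:528); must not exceed period for this reshape

# Number of complete blocks that fit before the data run out
n_blocks <- (length(orig) - first - block + 1) %/% period + 1

# Drop the rows before the first block, then take n_blocks periods' worth.
# Indexing past the end of a vector yields NA, which pads a short last period.
shifted <- orig[first:length(orig)]
m <- matrix(shifted[seq_len(n_blocks * period)], nrow = period)

# Each column now starts at a block start; keep the first `block` rows
newdf <- as.data.frame(m[seq_len(block), ])
dim(newdf)
```

Because the stand-in data are just the row numbers, the first column of newdf can be compared directly with 289:528 and the second with 625:864.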

Upvotes: 1

Gavin Simpson

Reputation: 174948

Update

Note the OP states the block size is 239 elements, but it is clear from the example rows indicated that the block size is 240:

> length(289:528)
[1] 240

I'll leave the example below at a block length of 239, but adjust if it is really 240.


It isn't clear from the Question, but assuming that you have something like this

df <- data.frame(A = runif(8208))

a data frame with 8208 rows.

First compute the indices of the elements of A that you need to keep. This is done via

want <- sapply(seq(289, nrow(df)-239, by = 336),
               function(x) x + (seq_len(239) - 1))

Then we can use the fact that R fills matrices by columns and convert the required elements of A to a matrix with 239 rows

mat <- matrix(df$A[want], nrow = 239)

This works

> all.equal(mat[,1], df$A[289:527])
[1] TRUE

but do note that I have taken a block length of 239 here (289:527), not the indices the OP quotes, as those give a block size of 240 (see Update above).

If you want this as a data frame, just add

df2 <- as.data.frame(mat)
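If the block length really is 240, the adjustment could be sketched as follows. The variables first, block_len and step are names introduced here for illustration, and the data are random stand-ins rather than the OP's values.

```r
set.seed(1)
df <- data.frame(A = runif(8208))   # stand-in data, as in the example above

first     <- 289   # first row of the first block
block_len <- 240   # length(289:528)
step      <- 336   # distance between block starts

# Block starts, stopping while a full block still fits inside the data
starts <- seq(first, nrow(df) - block_len + 1, by = step)

# One column of row indices per block, then fill the matrix column by column
want <- sapply(starts, function(x) x + (seq_len(block_len) - 1))
mat  <- matrix(df$A[want], nrow = block_len)

# The first two columns should match the rows quoted in the question
all.equal(mat[, 1], df$A[289:528])
all.equal(mat[, 2], df$A[625:864])
```

Keeping the three numbers in variables means only those lines change if the block length or spacing turns out to be different.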

Upvotes: 2

Andrie

Reputation: 179558

The trick here is to create an index of integers that refer to the row numbers you want to keep. This is simple enough with some use of rep, sequences and R's recycling rule.

Let me demonstrate using iris. Say you want to skip 25 rows, then return 3 rows:

skip <- 25
take <- 3

total <- nrow(iris)
reps <- total %/% (skip + take)
index <- rep(0:(reps-1), each=take) * (skip + take) + (1:take) + skip

The index now is:

index
 [1]  26  27  28  54  55  56  82  83  84 110 111 112 138 139 140

And the rows of iris:

iris[index, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
26           5.0         3.0          1.6         0.2     setosa
27           5.0         3.4          1.6         0.4     setosa
28           5.2         3.5          1.5         0.2     setosa
54           5.5         2.3          4.0         1.3 versicolor
55           6.5         2.8          4.6         1.5 versicolor
56           5.7         2.8          4.5         1.3 versicolor
82           5.5         2.4          3.7         1.0 versicolor
83           5.8         2.7          3.9         1.2 versicolor
84           6.0         2.7          5.1         1.6 versicolor
110          7.2         3.6          6.1         2.5  virginica
111          6.5         3.2          5.1         2.0  virginica
112          6.4         2.7          5.3         1.9  virginica
138          6.4         3.1          5.5         1.8  virginica
139          6.0         3.0          4.8         1.8  virginica
140          6.9         3.1          5.4         2.1  virginica
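In the question the layout is slightly different: 288 rows come before the first block, while block starts are 336 rows apart. A sketch of the same rep-and-recycling index for that case, assuming a block of 240 rows (the length of 289:528), could look like this. The names offset and period are introduced here and are not part of the example above.

```r
offset <- 288    # rows before the first block
take   <- 240    # rows per block, i.e. length(289:528)
period <- 336    # distance between block starts
total  <- 8208   # rows in the original data

# Number of complete blocks that fit in the data
reps <- (total - offset - take) %/% period + 1

# Block start offsets, each repeated `take` times, plus the recycled 1:take
index <- rep(0:(reps - 1), each = take) * period + (1:take) + offset

range(index)
```

Reshaping the selected values with matrix(x[index], nrow = take), where x is the data column, would then put one block in each column.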

Upvotes: 2

Señor O

Reputation: 17432

Try this:

1) Create a list of indices

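# start at row 289 (the question's first block) and stop while a full 240-row block still fits
lapply(seq(289, 8208 - 239, 336), function(X) X:(X+239)) -> Indices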

2) Select Data

Columns <- lapply(Indices, function(X) OldDF[X,])

3) Combine selected data in columns

NewDF <- as.data.frame(do.call(cbind, Columns))
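Since the question mentions another 11 datasets, the three steps could be wrapped in a function and applied to a list of data frames. This is only a sketch: blocks_to_columns and datasets are hypothetical names, the data are random stand-ins, and the defaults assume a first row of 289, a block of 240 rows and a spacing of 336.

```r
# Hypothetical stand-ins for 12 single-column datasets of 8208 rows each
set.seed(42)
datasets <- replicate(12, data.frame(value = runif(8208)), simplify = FALSE)
names(datasets) <- paste0("set", 1:12)

# Hypothetical helper combining steps 1 to 3
blocks_to_columns <- function(OldDF, first = 289, block_len = 240, step = 336) {
  # 1) indices of each block, keeping only blocks that fit completely
  starts  <- seq(first, nrow(OldDF) - block_len + 1, by = step)
  Indices <- lapply(starts, function(X) X:(X + block_len - 1))
  # 2) select the data from the first (only) column
  Columns <- lapply(Indices, function(X) OldDF[X, 1])
  # 3) combine the blocks as columns of a data frame
  as.data.frame(do.call(cbind, Columns))
}

# One new data frame per dataset, returned as a named list
results <- lapply(datasets, blocks_to_columns)
sapply(results, dim)
```

Keeping the results in a named list avoids creating 12 separately named objects by hand.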

Upvotes: 1
