Reputation: 1
I have a small problem. I have a dataset with 8208 rows of data. It's a single column of data, I want to take every n rows as a block and add this to a new data frame.
So, for example:
newdf has column 1 to column 23.
column 1 is composed of rows 289:528 from the original dataset column 2 is composed of rows 625:864 from the original dataset
And so on. The "block" size is 239 rows, the jump between blocks is every 336 rows.
I can do this manually, but it just becomes tedious. I have to repeat this entire procedure for another 11 sets of data so obviously a more automated approach would be preferable.
Upvotes: 0
Views: 1965
Reputation: 263481
Why not just:
as.dataframe(matrix(orig, nrow=528 )[289:528 ,])
Since the 8028 is not an exactl multiple of the row count we need to determine the columns:
> 8208/528
[1] 15.54545 # so either 15 or 16
> 8208-15*528
[1] 288 # all in the to-be-discarded section
as.dataframe(matrix(orig, nrow=528, col=15 )[289:528 ,])
Or:
as.dataframe(matrix(orig, nrow=528, col=8208 %/% 528)[289:528 ,])
Upvotes: 1
Reputation: 174948
Note the OP states the block size is 239 elements but it is clear from the examples rows indicated that the block size is 240
> length(289:528)
[1] 240
I'll leave the example below at a block length of 239, but adjust if it is really 240.
It isn't clear from the Question, but assuming that you have something like this
df <- data.frame(A = runif(8208))
a data frame with 8208 rows.
First compute the indices of the elements of A
that you need to keep. This is done via
want <- sapply(seq(289, nrow(df)-239, by = 336),
function(x) x + (seq_len(239) - 1))
Then we can use the fact that R fills matrices by columns and convert the required elements of A
to a matrix with 239 rows
mat <- matrix(df$A[want], nrow = 239)
This works
> all.equal(mat[,1], df$A[289:527])
[1] TRUE
but do note that I have taken a block length of 239 here (289:527
) not the indices the OP quotes as that is a block size of 240 (see Update above)
If you want this is a data frame, just add
df2 <- as.data.frame(mat)
Upvotes: 2
Reputation: 179558
The trick here is to create an index of integers that refer to the row numbers you want to keep. This is simple enough with some use of rep
, sequences and R's recycling rule.
Let me demonstrate using iris
. Say you want to skip 25 rows, then return 3 rows:
skip <- 25
take <- 3
total <- nrow(iris)
reps <- total %/% (skip + take)
index <- rep(0:(reps-1), each=take) * (skip + take) + (1:take) + skip
The index now is:
index
[1] 26 27 28 54 55 56 82 83 84 110 111 112 138 139 140
And the rows of iris
:
iris[index, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
Upvotes: 2
Reputation: 17432
Try this:
1) Create a list of indices
lapply(seq(1, 8208, 336), function(X) X:(X+239)) -> Indices
2) Select Data
Columns <- lapply(Indices, function(X) OldDF[X,])
3) Combine selected data in columns
NewDF <- do.call(cbind, Columns)
Upvotes: 1