Reputation: 845
My coworkers and I take turns entering data: one time I do it, the next time someone else does, and we always enter 50 observations at a time (into an Excel sheet). So I can be pretty sure that I entered cases 101 to 150 and 301 to 350. We then read the data into R to work with it. How can I select only the cases I entered?
I know that I can do that by copying from the Excel sheet, but I wonder whether it is doable in R.
I checked several documents about subsetting data in R and also tried things like
data<-data[101:150 & 301:350,]
but it didn't work. I would appreciate it if someone could point me to a more comprehensive guide that answers this question.
Upvotes: 1
Views: 1435
Reputation: 226162
The answer to the specific example you gave is
data[c(101:150,301:350),]
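As a rough illustration of why concatenating the ranges with c() works while joining them with & does not, here is a small sketch. The data frame and its columns id and value are made up for the example; only the indexing expressions come from the question and answer.

```r
# Hypothetical data frame standing in for the sheet read from Excel
data <- data.frame(id = 1:400, value = rnorm(400))

# `&` is an elementwise logical AND: every nonzero number counts as TRUE,
# so this yields 50 TRUE values, which are recycled over all rows
all_rows <- data[101:150 & 301:350, ]
nrow(all_rows)

# c() concatenates the two ranges into a single integer index vector
mine <- data[c(101:150, 301:350), ]
nrow(mine)
range(mine$id)
```

In other words, the & version silently selects every row instead of raising an error, which is why it appears to "not work" rather than fail outright.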
Can you be more specific about which cases you want? Is it the first 50 of each 100, or the first 50 of each 300, or ... ? To get the indices for the first n of each m cases you could use something like
c(outer(0:4,seq(1,100,by=10),"+"))
(here n=5, m=10); outer is a generalized outer product. An alternative (and possibly more intuitive) solution would use rep, e.g.
rep(0:4,10) + rep(seq(1,100,by=10),each=5)
Because R automatically recycles vectors where necessary you could actually shorten this to:
0:4 + rep(seq(1,100,by=10),each=5)
but I would recommend the slightly longer formulation as more understandable.
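The rep-based formulation could be wrapped in a small helper so that n, m and the number of rows become parameters. This is only a sketch: the function name first_n_of_each_m and its total argument are hypothetical, not part of base R.

```r
# Hypothetical helper: indices of the first n cases in each block of m,
# for a data set with `total` rows
first_n_of_each_m <- function(n, m, total) {
  starts <- seq(1, total, by = m)  # first index of each block
  # offsets 0..(n-1) repeated per block, added to each block's start
  idx <- rep(seq_len(n) - 1, times = length(starts)) + rep(starts, each = n)
  idx[idx <= total]                # drop any indices past the last row
}

idx <- first_n_of_each_m(n = 5, m = 10, total = 100)
head(idx, 10)
```

The result can then be used directly as a row index, e.g. data[idx, ], assuming data has at least total rows.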
Upvotes: 4