barerd

Reputation: 845

Selecting first n observations of each group

My coworkers and I take turns entering data: I do it one time, someone else does it the next, and we always enter 50 observations at a time (into an Excel sheet). So I can be pretty sure that I entered cases 101 to 150 and 301 to 350. We then read the data into R to work with it. How can I select only the cases I entered?

I know that I can do that by copying from the Excel sheet; however, I wonder whether it is doable in R.

I checked several documents about subsetting data with R, also tried things like

data<-data[101:150 & 301:350,]

but that didn't work. I would appreciate it if someone could point me to a more comprehensive guide that answers this question.

Upvotes: 1

Views: 1435

Answers (1)

Ben Bolker

Reputation: 226162

The answer to the specific example you gave is

data[c(101:150,301:350),]
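As a rough illustration of why the attempt with `&` in the question does not subset anything: `&` is an element-wise logical AND, so it produces 50 TRUE values rather than row numbers, and a logical index is recycled over all rows. The sketch below uses a made-up 400-row data frame (the `id` and `value` columns are assumptions, not the asker's data) and contrasts that with concatenating the two ranges using c().

```r
# Toy data frame standing in for the real data (400 hypothetical rows).
data <- data.frame(id = 1:400, value = seq(0.5, 200, by = 0.5))

# `&` is an element-wise logical AND: all values are non-zero,
# so the result is 50 TRUE values, not a set of row numbers.
idx_wrong <- 101:150 & 301:350

# A logical index is recycled along the rows, so no rows are dropped.
nrow(data[idx_wrong, ])

# c() concatenates the two ranges into a single integer index vector.
mine <- data[c(101:150, 301:350), ]
nrow(mine)
range(mine$id)
```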

Can you be more specific about which cases you want? Is it the first 50 of each 100, or the first 50 of each 300, or ... ? To get the indices for the first n of each m cases you could use something like

c(outer(0:4,seq(1,100,by=10),"+"))

(here n=5, m=10, over 100 cases); outer is a generalized outer product, so this adds each offset 0:4 to each block start 1, 11, 21, ... An alternative (and possibly more intuitive) solution would use rep, e.g.

rep(0:4,10) + rep(seq(1,100,by=10),each=5)

Because R automatically recycles vectors where necessary, you could actually shorten this to:

0:4 + rep(seq(1,100,by=10),each=5)

but I would recommend the slightly longer formulation as more understandable.
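If the pattern is needed repeatedly, the rep formulation could be wrapped in a small helper. This is only a sketch: the function name `first_n_of_each_m` and its `total` and `offset` arguments are hypothetical and not part of base R, and `offset` assumes the blocks of interest start after a fixed number of rows.

```r
# Hypothetical helper: row indices for the first n cases in each block of m,
# over `total` rows, skipping the first `offset` rows.
first_n_of_each_m <- function(n, m, total, offset = 0) {
  starts <- seq(offset + 1, total, by = m)   # first row of each block
  idx <- rep(starts, each = n) + rep(0:(n - 1), times = length(starts))
  idx[idx <= total]                          # drop any overshoot in the last block
}

# Same indices as the n = 5, m = 10 examples above.
first_n_of_each_m(5, 10, 100)

# The pattern described in the question: 50 cases out of every 200,
# starting at case 101, in a data set assumed to have 400 rows.
first_n_of_each_m(50, 200, 400, offset = 100)
```

The result can be used directly as a row index, e.g. data[first_n_of_each_m(50, 200, nrow(data), offset = 100), ].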

Upvotes: 4
