phaser

Reputation: 625

deleting first row based on column variable

How do I delete the first row of each group, where a group is a run of rows sharing the same value in one column? For example, here is some data:

m <- c("a","a","a","a","a","b","b","b","b","b") 
n <- c('x','y','x','y','x','y',"x","y",'x',"y") 
o <- c(1:10)

z <- data.frame(m,n,o)

I want to delete the first row for a and the first row for b in column m. My real data frame is very large, so I need to do this automatically wherever the value in m changes (from a to b, and so on) rather than by hand.

Here is what I want the data frame to look like.

  m n  o
1 a y  2
2 a x  3
3 a y  4
4 a x  5
5 b x  7
6 b y  8
7 b x  9
8 b y 10

Thanks.

Upvotes: 1

Views: 134

Answers (3)

thelatemail

Reputation: 93813

Just use duplicated:

z[duplicated(z$m),]

#   m n  o
#2  a y  2
#3  a x  3
#4  a y  4
#5  a x  5
#7  b x  7
#8  b y  8
#9  b x  9
#10 b y 10

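Why does this work? duplicated returns FALSE for the first occurrence of each value and TRUE for every later occurrence, so indexing with it keeps everything except the first row of each value. Consider: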

duplicated("a")
#[1] FALSE
duplicated(c("a","a"))
#[1] FALSE  TRUE
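One caveat, shown here as a minimal sketch: duplicated looks at values, not at positions where the value changes. If a value can reappear later in the column (an assumption that does not hold in the question's sample data), the two interpretations differ. The data frame z2 and the helper vector starts_run below are hypothetical names used only for illustration.

```r
# Hypothetical data in which "a" appears in two separate runs
z2 <- data.frame(m = c("a", "a", "b", "b", "a", "a"),
                 o = 1:6)

# Value-based: drops only the first occurrence of each distinct value
by_value <- z2[duplicated(z2$m), ]

# Run-based: drops the first row of every run of identical consecutive values.
# A row starts a run if it is the first row or differs from the row above it.
starts_run <- c(TRUE, z2$m[-1] != z2$m[-nrow(z2)])
by_run <- z2[!starts_run, ]

by_value$o  # 2 4 5 6
by_run$o    # 2 4 6
```

When every value forms a single contiguous block, as in the question, both versions return the same rows.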

Upvotes: 6

HubertL

Reputation: 19544

Using group_by and row_number from the dplyr package:

library(dplyr)

z %>% 
  group_by(m) %>%
  filter(row_number() != 1)
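A note on the argument to row_number, as a small sketch: called with no argument it numbers rows by position within each group, while row_number(o) ranks rows by the value of o. The two agree only when o is already increasing within each group, as it is in the question. The data frame z3 below is a hypothetical example with unsorted o.

```r
library(dplyr)

# Hypothetical data where o is not sorted within each group
z3 <- data.frame(m = c("a", "a", "a", "b", "b", "b"),
                 o = c(3, 1, 2, 6, 5, 4))

# Drops the first row of each group by position
by_position <- z3 %>%
  group_by(m) %>%
  filter(row_number() != 1) %>%
  ungroup()

# Drops the row with the smallest o in each group instead
by_rank <- z3 %>%
  group_by(m) %>%
  filter(row_number(o) != 1) %>%
  ungroup()

by_position$o  # 1 2 5 4
by_rank$o      # 3 2 6 5
```

slice(-1) after group_by(m) is another way to drop the first row by position.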

Upvotes: 1

Sathish

Reputation: 12703

data.table is often preferred for large datasets in R because of its speed. setDT converts the data frame z to a data.table by reference, that is, without making a copy. Then group by m and drop the first row of each group with .SD[-1].

library('data.table')
setDT(z)[, .SD[-1], by = "m"]
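Because setDT works by reference, z itself becomes a data.table after that call. If the original data frame should stay untouched, a copy can be made with as.data.table instead. The sketch below also uses the .I row-index idiom, which is commonly suggested as a lighter alternative to subsetting .SD per group on large data; whether it is actually faster here is an assumption to benchmark, not a measured result. The names dt, keep, and res are illustrative.

```r
library(data.table)

z <- data.frame(m = rep(c("a", "b"), each = 5),
                o = 1:10)

# as.data.table() copies, so z remains a plain data.frame
dt <- as.data.table(z)

# .I holds the row numbers of each group; drop the first one per group
keep <- dt[, .I[-1], by = m]$V1

# Subset the table once using the collected row numbers
res <- dt[keep]

res$o  # 2 3 4 5 7 8 9 10
```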

Upvotes: 4
