Reputation: 137

compare the first and last observation in each group

I have a dataset like this:

df <- data.frame(group = c(rep(1,3),rep(2,2), rep(3,3),rep(4,3),rep(5, 2)), score = c(30, 10, 22, 44, 50, 5, 20, 1,35, 2, 60, 14,5))

   group score
1      1    30
2      1    10
3      1    22
4      2    44
5      2    50
6      3     5
7      3    20
8      3     1
9      4    35
10     4     2
11     4    60
12     5    14
13     5     5

I wish to compare the first score and last score in each group, if the last score is smaller than the first score, then output the group number. The expected output should be like:

group 1 3 5

does anyone have idea how to realized this?

Upvotes: 0

Answers (5)

linp

Reputation: 1517

I'm plyr package fun..

library(plyr)
df1<-ddply(df,.(group),summarise,shown=score[length(group)]<score[1])
subset(df1,shown)

group shown
1     TRUE
3     TRUE
5     TRUE

Upvotes: 0

talat

Reputation: 70256

One more base R option:

with(df, unique(df$group[as.logical(ave(score, group, FUN = function(x) head(x,1) > tail(x, 1)))]))
#[1] 1 3 5

Or using dplyr:

library(dplyr)
group_by(df, group) %>% filter(first(score) > last(score)) %>% do(head(.,1)) %>% 
 select(group)

#  group
#1     1
#2     3
#3     5

Upvotes: 1

David Arenburg

Reputation: 92282

Here's data.table approach

library(data.table)
setDT(df)[, score[1] > score[.N], by = group][V1 == TRUE]

##    group   V1
## 1:     1 TRUE
## 2:     3 TRUE
## 3:     5 TRUE

setDT(df)[, group[score[1] > score[.N]], by = group]

##    group V1
## 1:     1  1
## 2:     3  3
## 3:     5  5

setDT(df)[, .BY[score[1] > score[.N]], by = group]

As per @beginneR's comment, if you don't like V1 you could do

df2 <- as.data.table(df)[, .BY[score[1] > score[.N]], by = group][, V1 := NULL]
df2

##    group
## 1:     1
## 2:     3
## 3:     5

Upvotes: 3

Jota

Reputation: 17611

This answer assumes that every group has at least 2 observations:

newdf <- merge(rbind(df[diff(df$group) == 1 ,] , df[dim(df)[1], ]), 
           df[!duplicated(df$group), ],
           by="group")

newdf[which(newdf$score.x < newdf$score.y), 'group']
#[1] 1 3 5

df[diff(df$group) == 1 ,] identifies the last observation of each group, except for the last group, which is why I rbind the last entry (i.e. df[dim(df)[1], ]). Then, the first observation of each group is given by df[!duplicated(df$group), ]. We merge these on the group column, then identify which ones meet the criteria.

Another option for the merge step:

merge(df[which(!duplicated(df$group))+(rle(df$group)$lengths-1),],
  df[!duplicated(df$group), ],
  by="group")

Upvotes: 1

nico

Reputation: 51640

This should do the job:

# First split the data frame by group
# This returns a list
df.split <- split(df, factor(df$group))

# Now use sapply on the list to check first and last of each group
# We return the group or NA using ifelse
res <- sapply(df.split, 
       function(x){ifelse(x$score[1] > x$score[nrow(x)], x$group[1], NA)})

# Finally, filter away the NAs
res <- res[!is.na(res)]

Upvotes: 2

compare the first and last observation in each group

Answers (5)

Related Questions