Reputation: 137
I have a dataset like this:
df <- data.frame(group = c(rep(1,3),rep(2,2), rep(3,3),rep(4,3),rep(5, 2)), score = c(30, 10, 22, 44, 50, 5, 20, 1,35, 2, 60, 14,5))
group score
1 1 30
2 1 10
3 1 22
4 2 44
5 2 50
6 3 5
7 3 20
8 3 1
9 4 35
10 4 2
11 4 60
12 5 14
13 5 5
I wish to compare the first score and last score in each group, if the last score is smaller than the first score, then output the group number. The expected output should be like:
group 1 3 5
does anyone have idea how to realized this?
Upvotes: 0
Views: 372
Reputation: 1517
I'm plyr package fun..
library(plyr)
df1<-ddply(df,.(group),summarise,shown=score[length(group)]<score[1])
subset(df1,shown)
group shown
1 TRUE
3 TRUE
5 TRUE
Upvotes: 0
Reputation: 70256
One more base R option:
with(df, unique(df$group[as.logical(ave(score, group, FUN = function(x) head(x,1) > tail(x, 1)))]))
#[1] 1 3 5
Or using dplyr
:
library(dplyr)
group_by(df, group) %>% filter(first(score) > last(score)) %>% do(head(.,1)) %>%
select(group)
# group
#1 1
#2 3
#3 5
Upvotes: 1
Reputation: 92282
Here's data.table
approach
library(data.table)
setDT(df)[, score[1] > score[.N], by = group][V1 == TRUE]
## group V1
## 1: 1 TRUE
## 2: 3 TRUE
## 3: 5 TRUE
Or
setDT(df)[, group[score[1] > score[.N]], by = group]
## group V1
## 1: 1 1
## 2: 3 3
## 3: 5 5
Or
setDT(df)[, .BY[score[1] > score[.N]], by = group]
As per @beginneR's comment, if you don't like V1
you could do
df2 <- as.data.table(df)[, .BY[score[1] > score[.N]], by = group][, V1 := NULL]
df2
## group
## 1: 1
## 2: 3
## 3: 5
Upvotes: 3
Reputation: 17611
This answer assumes that every group has at least 2 observations:
newdf <- merge(rbind(df[diff(df$group) == 1 ,] , df[dim(df)[1], ]),
df[!duplicated(df$group), ],
by="group")
newdf[which(newdf$score.x < newdf$score.y), 'group']
#[1] 1 3 5
df[diff(df$group) == 1 ,]
identifies the last observation of each group, except for the last group, which is why I rbind
the last entry (i.e. df[dim(df)[1], ]
). Then, the first observation of each group is given by df[!duplicated(df$group), ]
. We merge these on the group column, then identify which ones meet the criteria.
Another option for the merge
step:
merge(df[which(!duplicated(df$group))+(rle(df$group)$lengths-1),],
df[!duplicated(df$group), ],
by="group")
Upvotes: 1
Reputation: 51640
This should do the job:
# First split the data frame by group
# This returns a list
df.split <- split(df, factor(df$group))
# Now use sapply on the list to check first and last of each group
# We return the group or NA using ifelse
res <- sapply(df.split,
function(x){ifelse(x$score[1] > x$score[nrow(x)], x$group[1], NA)})
# Finally, filter away the NAs
res <- res[!is.na(res)]
Upvotes: 2