SkyWalker
SkyWalker

Reputation: 14309

R aggregation with duplicates?

I have a data frame of the following form:

Year  Value
===========
1999  A
2000  A
2001  B
2002  B
2003  A
2004  A

I need to get the following output:

"Min Year"  "Max Year" Value
============================
1999        2000       A 
2001        2002       B
2003        2004       A

Thus, I have grouping duplicates and using aggregate out-of-the-box won't work:

> df
  Year Value
1 1999     A
2 2000     A
3 2001     B
4 2002     B
5 2003     A
6 2004     A
> aggregate(Year ~ Value, df, min)
  Value Year
1     A 1999
2     B 2001
> aggregate(Year ~ Value, df, max)
  Value Year
1     A 2004
2     B 2002

What function would help accomplish this without using loops?

Upvotes: 0

Views: 75

Answers (2)

David Arenburg
David Arenburg

Reputation: 92282

Or using rle (only once!)

reps <- rle(as.character(df$Value))
temp <- data.frame(t(sapply(c(1, cumsum(reps$lengths)), function(x) range(df$Year[(x - 1) : x])))[-1, ], reps$values)
colnames(temp) <- c("Min Year", "Max Year", "Value")

#   Min Year Max Year Value
# 1     1999     2000     A
# 2     2001     2002     B
# 3     2003     2004     A

Upvotes: 2

akrun
akrun

Reputation: 886938

Create an index

indx <- setNames(cumsum(c(T,df$Value[-1]!=df$Value[-length(df$Value)])),df$Value)

?aggregate on that index

res <- aggregate(Year~indx, df,FUN=function(x) c(Min=min(x), Max=max(x)))
res$indx <- names(res$indx)
 res[,2:3] <- as.data.frame(res[,2])

change the colnames accordingly

 colnames(res) <- c("Value", "Min Year", "Max Year")
 res
 #  Value Min Year Max Year
 # 1     A     1999     2000
 # 2     B     2001     2002
 # 3     A     2003     2004

Upvotes: 2

Related Questions