Reputation: 10123
I have a data frame like so:
df <- structure(list(year = c(1990, 1990, 1990, 1990, 1990, 1990, 1990,
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1991, 1991, 1991,
1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991,
1991), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
value = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L)), .Names = c("year", "group", "value"
), row.names = c(NA, -30L), class = "data.frame")
> df
year group value
1 1990 A 1
2 1990 A 2
3 1990 A 3
4 1990 A 4
5 1990 A 5
6 1990 A 6
7 1990 B 7
8 1990 B 8
9 1990 B 9
10 1990 B 10
11 1990 B 11
12 1990 B 12
13 1990 B 13
14 1990 B 14
15 1990 B 15
16 1991 A 5
17 1991 A 6
18 1991 A 7
19 1991 A 8
20 1991 A 9
21 1991 A 10
22 1991 A 11
23 1991 A 12
24 1991 A 13
25 1991 A 14
26 1991 B 15
27 1991 B 16
28 1991 B 17
29 1991 B 18
30 1991 B 19
I need to apply a function for each year (I intend to do that with plyr and summarise), but only on the factor level with the most rows (A or B). Is there a way to automatically select this level (A or B) for each year?
df2 <- ddply(df, .(year), summarise, result="some operation on longest level")
desired output:
> df2
year group value result
1 1990 B 7 5
2 1990 B 8 4
3 1990 B 9 5
4 1990 B 10 3
5 1990 B 11 3
6 1990 B 12 8
7 1990 B 13 11
8 1990 B 14 7
9 1990 B 15 2
10 1991 A 5 10
11 1991 A 6 13
12 1991 A 7 9
13 1991 A 8 7
14 1991 A 9 6
15 1991 A 10 1
16 1991 A 11 15
17 1991 A 12 5
18 1991 A 13 5
19 1991 A 14 2
Upvotes: 3
Views: 411
Reputation: 70256
This might be another approach, with dplyr:
library(dplyr)
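# count rows per year/group, then keep the largest group within each year
# (%>% replaces the old %.% operator, which current dplyr no longer provides)
df <- df %>% group_by(year, group) %>% mutate(count = n()) %>% ungroup()
df <- df %>% group_by(year) %>% filter(count %in% max(count)) %>% mutate(result = sqrt(value))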
df$count <- NULL
Since I am not sure what function you want to apply to get result, I used sqrt(value), as in @rbatt's answer.
Upvotes: 3
Reputation: 206197
Sorry, I don't use plyr myself, but here's how I might do it with base functions. Perhaps that will inspire a plyr solution for you.
#find largest groups for each year
maxgroups <- tapply(df$group, df$year, function(x) which.max(table(x)))
#create group names
maxpairs <- paste(names(maxgroups),levels(df$group)[maxgroups], sep=".")
#helper function
ifnotin<-function(val,set,ifnotin) {out<-val; out[!val%in%set]<-ifnotin; droplevels(out)}
#new factor indicating best group
tgroups <- ifnotin(interaction(df$year, df$group), maxpairs, NA)
#now transform the best groups by adding year to result (or whatever transformation you need to do)
transform(df, value=ifelse(!is.na(tgroups), value+year, value))
I wasn't sure whether your transformation needs to know which group/year it is for. If you just need to know whether a row is in a group that needs transformation, you could skip tgroups and just use
needstransform <- interaction(df$year, df$group) %in% maxpairs
but tgroups has NA values that would be good for summaries such as tapply(df$value, droplevels(tgroups), mean).
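As a rough sketch of the needstransform shortcut: the logical vector can be used to keep only the rows of the largest group in each year, which is the shape of the desired output in the question. The compact rebuild of df, the name df2, and the sqrt transformation are assumptions added for illustration (the question does not say which operation is needed).

```r
# Rebuild the question's data compactly (same 30 rows as the dput output)
df <- data.frame(
  year  = rep(c(1990, 1991), each = 15),
  group = factor(rep(c("A", "B", "A", "B"), times = c(6, 9, 10, 5))),
  value = c(1:15, 5:19)
)

# Index of the most frequent group level within each year
maxgroups <- tapply(df$group, df$year, function(x) which.max(table(x)))
# "year.group" labels of the winning combinations
maxpairs <- paste(names(maxgroups), levels(df$group)[maxgroups], sep = ".")

# TRUE for rows that belong to the largest group of their year
needstransform <- interaction(df$year, df$group) %in% maxpairs

# Keep only those rows and apply a placeholder operation (assumption: sqrt)
df2 <- df[needstransform, ]
df2$result <- sqrt(df2$value)
```

With the data from the question this keeps the 9 B rows for 1990 and the 10 A rows for 1991. Note that which.max returns the first level on a tie, so a tied year would keep only one of the tied groups.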
Upvotes: 1
Reputation: 4807
This is what I came up with:
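library(plyr)
df2 <- ddply(
  df,
  .(year),
  summarise,
  result = sqrt(
    # table(group) counts rows within the current year only;
    # table(df$group) would count over the whole data frame
    value[group == names(which.max(table(group)))]
  )
)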
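If the group and value columns should be kept alongside result, as in the desired output, one possible variant is to pass ddply an anonymous function instead of summarise. This is only a sketch: the compact rebuild of df, the helper name top, and the sqrt operation are assumptions for illustration.

```r
library(plyr)

# Rebuild the question's data compactly (same 30 rows as the dput output)
df <- data.frame(
  year  = rep(c(1990, 1991), each = 15),
  group = factor(rep(c("A", "B", "A", "B"), times = c(6, 9, 10, 5))),
  value = c(1:15, 5:19)
)

df2 <- ddply(df, .(year), function(d) {
  # Level with the most rows in this year's subset (first level wins ties)
  top <- names(which.max(table(d$group)))
  # Keep only that level's rows and add a placeholder result (assumption: sqrt)
  transform(d[d$group == top, ], result = sqrt(value))
})
```

Here df2 has the columns year, group, value and result, with only the B rows for 1990 and only the A rows for 1991.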
Upvotes: 0
Reputation: 44525
I don't think this is a very good answer because it's super obfuscated (and it doesn't use your desired plyr approach), but maybe it will stimulate someone else's thinking:
Basically, you just need to know which values of group you want to look at for each year. Let's say you figure that out and store those values (in the same order as the splits of the original data by year) in a variable called m; then you can mapply some function that subsets each split (of the data by year) by group and then does whatever other calculations you want.
do.call(rbind, mapply(function(x, y) {
  tmp <- x[x$group == y, ]
  #fun(tmp) # apply your function to the relevant subset
}, split(df, df$year), m, SIMPLIFY = FALSE))
I thought of three different ways you could generate m. Here they are:
m <- with(df, levels(group)[apply(table(group, year), 2, which.max)])
m <- levels(df$group)[sapply(split(df, df$year), function(x) which.max(sapply(split(x, x$group), nrow)))]
m <- with(df, levels(group)[apply(tapply(year, list(group, year), length),2,which.max)])
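Putting the two pieces together, a minimal end-to-end sketch might look like the following. It uses the first way of building m; the compact rebuild of df, the name out, and the stand-in fun (which just adds sqrt(value) as a result column) are assumptions for illustration, not part of the question.

```r
# Rebuild the question's data compactly (same 30 rows as the dput output)
df <- data.frame(
  year  = rep(c(1990, 1991), each = 15),
  group = factor(rep(c("A", "B", "A", "B"), times = c(6, 9, 10, 5))),
  value = c(1:15, 5:19)
)

# One winning group level per year, in the same order as split(df, df$year)
m <- with(df, levels(group)[apply(table(group, year), 2, which.max)])

# Hypothetical stand-in for the real per-group operation
fun <- function(d) transform(d, result = sqrt(value))

# Subset each year's split to its winning group, apply fun, and stack the pieces
out <- do.call(rbind, mapply(function(x, y) fun(x[x$group == y, ]),
                             split(df, df$year), m, SIMPLIFY = FALSE))
```

For this data m is B for 1990 and A for 1991, so out has 19 rows. The approach relies on split(df, df$year) and the columns of table(group, year) being in the same (sorted) year order.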
Upvotes: 0