Reputation: 109
What I want to do: arrange data by name and class, sort by year, calculate the mean of the values and keep the valueMessage
.
Sample data:
name <- c("a", "a", "b", "b")
class <- c("c1", "c1", "c3", "c3")
year <- c("2010", "2010", "2008", "2008")
value <- c(100, 33, 100, 90)
valueMessage <-c(NA, "meh", NA, NA)
df <- data.frame(name, class, year, value, valueMessage)
df
dat <- aggregate(df$value, list(year = df$year, name = df$name, class=df$class, valueMessage=df$valueMessage), mean)
dat <- dat[with(dat, order(class, name, year)), ]
Returns this:
year name class valueMessage x
1 2010 a c1 meh 33
But I would like it to keep the NA and return something like this:
year name class valueMessage x
1 2010 a c1 meh, NA 66.5
2 2008 b c3 NA 95
Upvotes: 6
Views: 7162
Reputation: 59435
I like to use sqldf
, because SQL is such a nice, simple and intuitive way to do this (without need of knowing thousands of R functions and their specifics and gotchas):
require(sqldf)
sqldf('
select year, name, class, avg(value),
group_concat(distinct case when valueMessage is NULL
then "NA"
else valueMessage
end) as valueMessages
from df
group by class, name, year
')
On your modified example it will produce this output:
year name class avg(value) valueMessages
1 2010 a c1 66.5 NA,meh
2 2008 b c3 95.0 NA
Upvotes: 4
Reputation: 193517
The thing is that you're trying to do two different aggregations: one for your "value" column, and one for your "valueMessage" column.
If that's the case, you would have to do each separately and merge
them, or look at using a package, like "data.table" to help out.
With "data.table", you can do the following:
library(data.table)
DT <- data.table(df)
DT[, list(value = mean(value),
valueMessage = list(unique(valueMessage))),
by = list(year, name, class)]
# year name class value valueMessage
# 1: 2010 a c1 66.5 NA,meh
# 2: 2008 b c3 95.0 NA
Note that the "valueMessage" column is a list
. You could use paste
if you wanted it to be a character vector instead. In other words:
DT[, list(value = mean(value),
valueMessage = paste(unique(valueMessage), collapse = ",")),
by = list(year, name, class)]
Here, I've assumed you're only interested in unique "valueMessage" values.
If you wanted to go with base R's aggregate
, you could try something like the following:
M1 <- aggregate(value ~ year + name + class, df, mean)
M2 <- aggregate(as.character(valueMessage) ~ year + name + class,
df, unique, na.action = na.pass)
merge(M1, M2)
I've used the "formula" method because I like the output better. as.character(valueMessage)
was required because that column is presently a factor
. The output of the valueMessage
column would be, again, a list
, but you can use an anonymous function and paste
if you prefer a character vector of length
1 instead.
Upvotes: 4
Reputation: 56004
A quick workaround, try this:
#convert NAs to "U"
x <- ifelse(is.na(df$valueMessage),"U",df$valueMessage)
df$valueMessage <- x
#aggregate
dat <- aggregate(df$value, list(year = df$year, name = df$name, class=df$class, valueMessage=df$valueMessage), mean)
dat <- dat[with(dat, order(class, name, year)), ]
#convert "U" back to NA
dat$valueMessage <- ifelse(dat$valueMessage=="U",NA,dat$valueMessage)
There must be more elegant way of doing this.
Upvotes: 0