I am trying to aggregate a SparkR DataFrame to get two summary variables. The code I am trying to use is:
temp1_aggregate<- temp1 %>%
groupBy("Week", "Store", "Brand", "Conversion_Factor", "Manufacturer", "Type") %>%
agg(Value=mean("Value"), Volume=mean("Volume"))
I have also tried SparkR::summarize() instead of agg():
temp1_aggregate<- temp1 %>%
groupBy("Week", "Store", "Brand", "Conversion_Factor", "Manufacturer", "Type") %>%
SparkR::summarize(Value=mean("Value", na.rm=TRUE),Volume=mean("Volume", na.rm=TRUE))
Value and Volume are both numeric (double) columns.
Both of these result in the same error:
Error in agg(x, ...) : agg can only support Column or character
In addition: Warning message:
In mean.default("Value", na.rm = TRUE) :
argument is not numeric or logical: returning NA
I am quite confused by this, because Value and Volume are both columns and both numeric (I checked, though I can't share the data as it is proprietary).
I assume the syntax is wrong in some way (I translated the code from dplyr to SparkR because it has to run on Spark DataFrames), but I can't work out where.
Please can anyone advise on how to get this to work?
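There is no SparkR mean implementation for character arguments; it only accepts Column objects. As you can deduce from the warning message, the mean("Volume") call is dispatched to base::mean, which returns NA. agg then receives NA instead of a Column, which is why it fails with "agg can only support Column or character".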
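A minimal base-R sketch of that fallback, runnable without Spark: it calls base::mean directly on the string "Volume" (standing in for the SparkR call that falls through to it) and captures the warning instead of printing it. It assumes nothing about SparkR internals beyond what the warning in the question already shows.

```r
# A character string is not a Column, so the call ends up in base::mean
# (mean.default), which warns and returns NA instead of failing outright.
caught <- NULL
res <- withCallingHandlers(
  base::mean("Volume", na.rm = TRUE),
  warning = function(w) {
    caught <<- conditionMessage(w)  # keep the warning text for inspection
    invokeRestart("muffleWarning")  # suppress the printed warning
  }
)

print(res)     # a plain numeric NA, not a Column object
print(caught)  # the same warning text seen in the question
```

The returned value is an ordinary numeric NA, which is the kind of argument agg then rejects.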
To make it work, you have to pass explicit columns:
agg(Value = mean(column("Value")), Volume = mean(column("Volume")))
You can also replace mean with avg (SparkR::avg):
agg(Value = avg(column("Value")), Volume = avg(column("Volume")))
which doesn't shadow any built-in function and gives a more meaningful error if you pass a plain character string:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘avg’ for signature ‘"character"’
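Why avg fails loudly while mean fails quietly can be sketched with the methods package. FakeColumn and fakeAvg below are hypothetical stand-ins, not SparkR's classes; the only assumption, taken from the error above, is that the generic has no method for character. An S4 generic created from scratch has no default method, so dispatch on a string errors instead of falling back to a base function.

```r
library(methods)

# Hypothetical stand-in for SparkR's Column class (not the real SparkR class).
setClass("FakeColumn", slots = c(name = "character"))

# No existing function named fakeAvg, so this generic has no default method:
# only signatures registered with setMethod() can be dispatched.
setGeneric("fakeAvg", function(x) standardGeneric("fakeAvg"))
setMethod("fakeAvg", "FakeColumn", function(x) paste0("avg(", x@name, ")"))

ok <- fakeAvg(new("FakeColumn", name = "Value"))
print(ok)

# A plain string fails at dispatch time rather than silently returning NA.
err <- tryCatch(fakeAvg("Value"), error = function(e) conditionMessage(e))
print(err)
```

The exact wording of the dispatch error varies slightly between R versions, but it starts with "unable to find an inherited method".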