Reputation: 489
I'm trying to tidy a dataset, using dplyr. My variables contain percentages and straightforward values (in this case, page views and bounce rates). I've tried to summarize them this way:
require(dplyr)
df<-df%>%
group_by(pagename)%>%
summarise(pageviews=sum(pageviews), bounceRate= weighted.mean(bounceRate,pageviews))
But this returns:
Error: 'x' and 'w' must have the same length
My dataset does not have any NAs in either the page views or the bounce rates.
I'm not sure what I'm doing wrong. Maybe summarise() doesn't work with weighted.mean()?
EDIT
I've added some data:
Source: local data frame [4 x 3]

  pagename bounceRate pageviews
     (chr)      (dbl)     (dbl)
1     url1   72.22222      1176
2     url2   46.42857       733
3     url2   76.92308       457
4     url3   62.06897       601
Upvotes: 28
Views: 39410
Reputation: 206546
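The summarize() command evaluates its arguments in the order they appear, and each new value replaces the column for the expressions that follow. Because you overwrite pageviews with its sum first, weighted.mean() receives that single summed value as the weights instead of the original per-row values. It's safer to use different names: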
df %>%
group_by(pagename)%>%
summarise(pageviews_sum = sum(pageviews),
bounceRate_mean = weighted.mean(bounceRate,pageviews))
And if you really want, you can rename afterward:
df %>%
group_by(pagename) %>%
summarise(pageviews_sum = sum(pageviews),
bounceRate_mean = weighted.mean(bounceRate,pageviews)) %>%
rename(pageviews = pageviews_sum, bounceRate = bounceRate_mean)
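To check this against the rows posted in the question, here is a minimal sketch. It assumes dplyr is installed and rebuilds the sample data by hand; url2_expected is just an illustrative name for a hand-computed comparison value, not something from the original post.

```r
library(dplyr)

# Sample rows copied from the question
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601),
  stringsAsFactors = FALSE
)

# New output names, so the original pageviews column is still
# available as the weights inside weighted.mean()
result <- df %>%
  group_by(pagename) %>%
  summarise(pageviews_sum   = sum(pageviews),
            bounceRate_mean = weighted.mean(bounceRate, pageviews))

# Hand-computed weighted mean for the two url2 rows, for comparison
url2_expected <- (46.42857 * 733 + 76.92308 * 457) / (733 + 457)

print(result)
```

The url2 row of result should match url2_expected, since that is the only page with more than one row to combine.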
Upvotes: 41
Reputation: 489
I've found the solution.
Since pageviews=sum(pageviews) is evaluated before bounceRate= weighted.mean(bounceRate,pageviews), the length of pageviews is reduced and therefore shorter than bounceRate, which triggers the error.
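The length mismatch can be reproduced outside dplyr. This is only an illustrative sketch in base R using the two url2 rows from the question; pageviews_summed and msg are names made up for the example.

```r
# The two url2 rows from the question
bounceRate <- c(46.42857, 76.92308)
pageviews  <- c(733, 457)

# After summing, the weights collapse to a single value
pageviews_summed <- sum(pageviews)

# weighted.mean() requires x and w to have the same length, so this call
# fails; tryCatch() captures the error message instead of stopping
msg <- tryCatch(
  weighted.mean(bounceRate, pageviews_summed),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Here bounceRate still has two values while the weights have one, which is the situation summarise() ends up in for any page with more than one row.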
The solution is simple: just switch them, so that weighted.mean() runs while pageviews still holds the per-row values.
require(dplyr)
df<-df%>%
group_by(pagename)%>%
summarise(bounceRate= weighted.mean(bounceRate,pageviews),pageviews=sum(pageviews))
Upvotes: 8