Reputation: 403
I have a data frame with 4 columns of data: ID, url, title and pageviews, like this:
    ID          url  title pageviews
1 /12/      /url-1/ Page 1      1123
2 /13/      /url-2/ Page 2      4432
3 /13/ /url-2/?test Page 2         6
4 /14/      /url-4/ Page 4      4242
I spent a long time searching for how to merge the rows (pages) that have the same ID and sum their pageviews. I ended up with this code, using dplyr:
df_merged <- df %>% group_by(ID) %>% summarise_at(c("pageviews"), sum)
However, it creates another data frame with only ID and pageviews. I want a complete data frame that still includes url and title. Something like this:
    ID     url  title pageviews
1 /12/ /url-1/ Page 1      1123
2 /13/ /url-2/ Page 2      4438
3 /14/ /url-4/ Page 4      4242
How can I achieve this result?
This is my data frame:
df <- data.frame(ID = c("/12/", "/13/", "/13/", "/14/"),
url = c("/url-1/", "/url-2/", "/url-2/?test", "/url-4/"),
title = c("Page 1", "Page 2", "Page 2", "Page 4"),
pageviews = c(1123, 4432, 6, 4242))
Upvotes: 1
Views: 1221
Reputation: 4824
One way to do it is like this:
library(dplyr)

df_merged <-
  df %>%
  group_by(ID, title) %>%
  summarise(url = first(url),                  # keep the first url seen in each group
            total_pageviews = sum(pageviews))  # add up pageviews within the group
You need to think about how R should know that the desired output for url in the case of Page 2 is /url-2/ and not /url-2/?test or something else. Here, I just arbitrarily decided that the value to be put there is the first() value that occurs in the group.
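If picking the first row feels too arbitrary, another rule is to keep the url and title from the row with the most pageviews in each group. The sketch below is just one possible choice, not the only sensible one. It assumes dplyr 1.0.0 or later (for the .groups argument) and groups by ID alone, so it also keeps the original pageviews column name from the question.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c("/12/", "/13/", "/13/", "/14/"),
  url = c("/url-1/", "/url-2/", "/url-2/?test", "/url-4/"),
  title = c("Page 1", "Page 2", "Page 2", "Page 4"),
  pageviews = c(1123, 4432, 6, 4242)
)

df_merged <- df %>%
  group_by(ID) %>%
  summarise(
    # url and title are taken from the row with the highest pageviews;
    # these must come before pageviews is overwritten by its sum
    url = url[which.max(pageviews)],
    title = title[which.max(pageviews)],
    pageviews = sum(pageviews),
    .groups = "drop"
  )
```

The order of the arguments matters: summarise() evaluates them in sequence, so pageviews has to be summed last, after it has been used to choose the url and title.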
Upvotes: 1