jean

Reputation: 403

Merging data frame rows in R with dplyr

I have a data frame with 4 columns of data: url ID, url, title and pageviews, like this:

    ID          url  title pageviews
1 /12/      /url-1/ Page 1      1123
2 /13/      /url-2/ Page 2      4432
3 /13/ /url-2/?test Page 2         6
4 /14/      /url-4/ Page 4      4242

I spent a long time searching for how to merge the rows (pages) that have the same ID and sum their pageviews. I ended up with this code, using dplyr:

df_merged <- df %>% group_by(ID) %>% summarise_at(c("pageviews"), sum)

However, this returns a data frame with only ID and pageviews. I want the complete data frame, with url and title as well. Something like this:

    ID          url  title pageviews
1 /12/      /url-1/ Page 1      1123
2 /13/      /url-2/ Page 2      4438
3 /14/      /url-4/ Page 4      4242

How can I achieve this result?

This is my data frame:

df <- data.frame(ID = c("/12/", "/13/", "/13/", "/14/"), 
             url = c("/url-1/", "/url-2/", "/url-2/?test", "/url-4/"),
             title = c("Page 1", "Page 2", "Page 2", "Page 4"),
             pageviews = c(1123, 4432, 6, 4242))

Upvotes: 1

Views: 1221

Answers (1)

Curt F.

Reputation: 4824

One way to do it is like this:

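library(dplyr)

df_merged <- 
       df %>% 
       group_by(ID, title) %>%                    # one output row per ID/title pair
       summarise(url = first(url),                # keep the first url seen in each group
                 total_pageviews = sum(pageviews) # add up the pageviews in each group
                )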

R has no way of knowing that the desired url for Page 2 is /url-2/ and not /url-2/?test or something else, so you have to tell it which value to keep. Here, I arbitrarily decided to use first(), which keeps the first value that occurs in each group.
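If the first row is not a reliable choice, one alternative is to keep the url that received the most pageviews in each group. This is only a sketch under that assumption (the question does not say which url should win). It also names the summed column pageviews to match the desired output, and uses the .groups argument, which needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(ID = c("/12/", "/13/", "/13/", "/14/"),
                 url = c("/url-1/", "/url-2/", "/url-2/?test", "/url-4/"),
                 title = c("Page 1", "Page 2", "Page 2", "Page 4"),
                 pageviews = c(1123, 4432, 6, 4242),
                 stringsAsFactors = FALSE)

df_merged <- df %>%
  group_by(ID, title) %>%
  # Assumption: the url with the most pageviews is the one to keep.
  # url must be computed before pageviews is overwritten by its sum.
  summarise(url = url[which.max(pageviews)],
            pageviews = sum(pageviews),
            .groups = "drop") %>%
  select(ID, url, title, pageviews)  # restore the original column order
```

With the sample data, Page 2 keeps /url-2/ because it has 4432 pageviews against 6 for /url-2/?test. The order of the arguments inside summarise() matters here: if pageviews were summed first, which.max() would see the single summed value instead of the per-row counts.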

Upvotes: 1
