Tyler
Tyler

Reputation: 1

Summarize results by column in R

I am attempting to summarize the results of an Excel (.xlsx) file in R-Studio. I need to get from this:

*Date      Site          URL*
Jan 1      Amazon        amazon.com
Jan 1      Blackberry
Jan 2      Google        google.com
Jan 2      G-Mail        google.com
Jan 3      Microsoft     microsoft.com
Jan 1      Ohio
Jan 3      X-Box         microsoft.com
Jan 3      YouTube       youtube.com
Jan 3      Yahoo         yahoo.com

to this:

*Date      Site             URL*
Jan 1      Amazon           amazon.com
Jan 1      Blackberry
Jan 2      Google|G-Mail    google.com
Jan 3      Microsoft|X-Box  microsoft.com
Jan 1      Ohio
Jan 3      YouTube          youtube.com
Jan 3      Yahoo            yahoo.com

Essentially, I need to summarize all other columns by the "URL" value, but it is crucial to leave results with a blank value in the "URL" column unaffected by the summarization.

What I have now is:

URLSummarize <-(URLSummarize %>%
  group_by(`URL`) %>%
  summarize(across(everything(), ~ if (nzchar(URL[1])) paste(unique(.), collapse = "|") else .)) %>%
  ungroup())

dput(URLSummarize)

However, this is making those values that are blank in the "URL" column to be condensed so that it looks like this via a dput(URLSummarize):

Date        Site    URL
2022-01-01  Amazon  amazon.com  
2022-01-01  Blackberry  NA  
2022-01-02  Google  google.com  
2022-01-02  G-Mail  google.com  
2022-01-03  Microsoft   microsoft.com   
2022-01-01  Ohio    NA  
2022-01-03  X-Box   microsoft.com   
2022-01-03  YouTube youtube.com 
2022-01-03  Yahoo   yahoo.com

but I am still getting this when I write it to an xlsx file:

*Date      Site             URL*
Jan 1      Amazon           amazon.com
Jan 1      Blackberry|Ohio
Jan 2      Google|G-Mail    google.com
Jan 3      Microsoft|X-Box  microsoft.com
Jan 3      YouTube          youtube.com
Jan 3      Yahoo            yahoo.com

Upvotes: 0

Views: 76

Answers (2)

r2evans
r2evans

Reputation: 160407

You can use an embedded if. (I've edited this to reference your newer data, and because we cannot always see it in your data, I made the blanks an actual one-space string.)

base R

isblank <- !nzchar(trimws(dat$URL))
isblank
# [1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

dat[!isblank,] <- data.frame(
  lapply(dat, function(z) ave(z[!isblank], dat$URL[!isblank],
                              FUN = function(z) paste(unique(z), collapse = "|"))
  )
)
dat[!duplicated(dat),]
#    Date            Site           URL
# 1 Jan 1          Amazon    amazon.com
# 2 Jan 1      Blackberry              
# 3 Jan 2   Google|G-Mail    google.com
# 5 Jan 3 Microsoft|X-Box microsoft.com
# 6 Jan 1            Ohio              
# 8 Jan 3         YouTube   youtube.com
# 9 Jan 3           Yahoo     yahoo.com

(Don't let the row names fool you, seven rows.)

dplyr

library(dplyr)
dat %>%
  group_by(URL) %>%
  summarize(across(everything(), ~ if (nzchar(trimws(URL[1]))) paste(unique(.), collapse = "|") else .)) %>%
  ungroup()
# # A tibble: 7 x 3
#   URL             Date  Site           
#   <chr>           <chr> <chr>          
# 1 " "             Jan 1 Blackberry     
# 2 " "             Jan 1 Ohio           
# 3 "amazon.com"    Jan 1 Amazon         
# 4 "google.com"    Jan 2 Google|G-Mail  
# 5 "microsoft.com" Jan 3 Microsoft|X-Box
# 6 "yahoo.com"     Jan 3 Yahoo          
# 7 "youtube.com"   Jan 3 YouTube        

Or if it's just one column, you can simplify it a little:

dat %>%
  group_by(URL) %>%
  summarize(Site = if (nzchar(trimws(URL[1]))) paste(unique(Site), collapse = "|") else Site) %>%
  ungroup()

data.table

library(data.table)
DT <- as.data.table(dat) # setDT(dat) is generally preferred
DT[, lapply(.SD, function(z) if (nzchar(trimws(URL[1]))) paste(unique(z), collapse = "|") else z), by = .(URL)]
#              URL   Date            Site
#           <char> <char>          <char>
# 1:    amazon.com  Jan 1          Amazon
# 2:                Jan 1      Blackberry
# 3:                Jan 1            Ohio
# 4:    google.com  Jan 2   Google|G-Mail
# 5: microsoft.com  Jan 3 Microsoft|X-Box
# 6:   youtube.com  Jan 3         YouTube
# 7:     yahoo.com  Jan 3           Yahoo

Data

dat <- structure(list(Date = c("Jan 1", "Jan 1", "Jan 2", "Jan 2", "Jan 3", "Jan 1", "Jan 3", "Jan 3", "Jan 3"), Site = c("Amazon", "Blackberry", "Google", "G-Mail", "Microsoft", "Ohio", "X-Box", "YouTube", "Yahoo"), URL = c("amazon.com", " ", "google.com", "google.com", "microsoft.com", " ", "microsoft.com", "youtube.com", "yahoo.com")), row.names = c(NA, -9L), class = "data.frame")

Upvotes: 1

zx8754
zx8754

Reputation: 56004

Add unique dummy missing values, then group by paste using aggregate and toString, then convert missing values into blanks "":

# add a unique dummy "missing" URL
ix <- which(dat$URL == "")
dat[ ix, "URL" ] <- paste0("missing", ix)

# group by paste
res <- aggregate(Site ~ URL, dat, toString)

# convert missing to blanks
res[ grepl("missing", res$URL), "URL" ] <- ""

res
#             URL             Site
# 1    amazon.com           Amazon
# 2    google.com   Google, G-Mail
# 3 microsoft.com Microsoft, X-Box
# 4                     Blackberry
# 5                           Ohio
# 6     yahoo.com            Yahoo
# 7   youtube.com          YouTube

Upvotes: 1

Related Questions