Wilcar

Reputation: 2513

Summarise a dataframe structure. Rows are variables

I have this data:

name <- c("john", "john", "Ted", "Ted", "Wiliam", NA, "Brice", "Will", "Will")
city <- c(NA, "London", "Paris", "Roma", "Roma", NA, "London", "Mumbay", "London")
age <- c(45, 67, 45, NA, NA, 34, 67, 15, NA)
dataset <- data.frame(name, city, age)

The result:

    name   city age
1   john   <NA>  45
2   john London  67
3    Ted  Paris  45
4    Ted   Roma  NA
5 Wiliam   Roma  NA
6   <NA>   <NA>  34
7  Brice London  67
8   Will Mumbay  15
9   Will London  NA

I want a data frame (for publishing with the gt package) with one row per variable:

variable_name  unique_values_sample
name           john, Ted, Will, ...
city           London, Roma, Paris, ...
age            45, 67, 15, ...

In my real dataset I have thousands of unique values for some variables, so I want to display only a few unique values as examples, followed by "...".

Upvotes: 0

Views: 52

Answers (4)

mt1022

Reputation: 17299

You can do it with a custom summarizing function:

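summarize_var <- function(x){
    # keep distinct non-missing values, in order of first appearance
    x <- unique(x[!is.na(x)])
    if(length(x) <= 3){
        toString(x)
    }else{
        # show the first three values, then mark the truncation
        paste(toString(x[1:3]), '...', sep = ',')
    }
}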

A test:

dataset <- structure(list(name = c("john", "john", "Ted", "Ted", "Wiliam", 
    NA, "Brice", "Will", "Will"), city = c(NA, "London", "Paris", 
        "Roma", "Roma", NA, "London", "Mumbay", "London"), age = c(45L, 
            67L, 45L, NA, NA, 34L, 67L, 15L, NA)), row.names = c(NA, -9L),
    class = "data.frame", index = integer(0))

x <- sapply(dataset, summarize_var)
data.frame(variable_name = names(x),  unique_values_sample = x)

#      variable_name    unique_values_sample
# name          name   john, Ted, Wiliam,...
# city          city London, Paris, Roma,...
# age            age          45, 67, 34,...
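
If the number of values to show should be adjustable, summarize_var() above can be generalised. This is only a minimal sketch: the name `summarize_var_n`, its arguments `n` and `ellipsis`, and the `demo` data frame are made up for illustration. It uses head() so that short columns are not padded with NA, and row.names = NULL so the result has plain integer row names.

```r
# Hypothetical variant of summarize_var(); `n` and `ellipsis` are
# illustrative argument names, not part of the answer above.
summarize_var_n <- function(x, n = 3, ellipsis = "...") {
  ux <- unique(x[!is.na(x)])        # distinct non-missing values, in order of appearance
  out <- toString(head(ux, n))      # head() does not pad with NA when length(ux) < n
  if (length(ux) > n) out <- paste(out, ellipsis, sep = ", ")
  out
}

# Small made-up data frame, just for the demonstration
demo <- data.frame(name = c("john", "john", "Ted", "Wiliam", NA, "Will"),
                   age  = c(45, 67, 45, NA, NA, 67))

x <- vapply(demo, summarize_var_n, character(1), n = 2)
result <- data.frame(variable_name = names(x), unique_values_sample = x,
                     row.names = NULL)
result
# result$unique_values_sample is c("john, Ted, ...", "45, 67")
```

vapply() is used instead of sapply() so the result is guaranteed to be one string per column.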

Upvotes: 1

Onyambu

Reputation: 79288

In base R you can do:

aggregate(. ~ ind, unique(stack(dataset)),
          function(x) sprintf("%s, ...", toString(x[1:3])))

   ind                   values
1 name   john, Ted, Wiliam, ...
2 city London, Paris, Roma, ...
3  age          45, 67, 34, ...
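
One caveat with the call above: `x[1:3]` pads with NA when a variable has fewer than three distinct values, and the "..." is appended unconditionally. A minimal sketch of a guard follows, assuming a hypothetical helper `sample_values()` and a small made-up data frame `df`; it is one possible adjustment, not part of the original answer.

```r
# Small made-up example: `age` has only two distinct non-missing values
df <- data.frame(name = c("john", "Ted", "Wiliam", "Will", NA),
                 age  = c(45, 67, 45, NA, NA))

# Hypothetical helper: add "..." only when some values are actually left out
sample_values <- function(x, n = 3) {
  if (length(x) > n) paste0(toString(head(x, n)), ", ...") else toString(x)
}

# stack() puts all columns into one character column `values` plus a factor `ind`;
# the formula method of aggregate() drops rows with NA by default
res <- aggregate(values ~ ind, unique(stack(df)), sample_values)
res
# values for name: "john, Ted, Wiliam, ..."
# values for age:  "45, 67"
```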

Upvotes: 1

Ronak Shah

Reputation: 389055

library(tidyverse)

dataset %>%
  # first three distinct non-missing values per column, followed by '...'
  summarise(across(everything(), ~ toString(c(unique(na.omit(.x))[1:3], '...')))) %>%
  pivot_longer(cols = everything(), 
               names_to = 'variable_name', values_to = 'unique_values_sample')

#  variable_name unique_values_sample    
#  <chr>         <chr>                   
#1 name          john, Ted, Wiliam, ...  
#2 city          London, Paris, Roma, ...
#3 age           45, 67, 34, ...         

Upvotes: 2

Waldi

Reputation: 41230

You could use lapply and map_dfr:

library(purrr)

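maxelt <- 2  # number of unique values shown per variable
# sort() drops NA and orders the values
lapply(dataset, function(x) sort(unique(x))) %>%
  map_dfr(~ data.frame(unique_values_sample = paste0(
              paste(.x[1:maxelt], collapse = ','),
              ifelse(length(.x) > maxelt, ',...', ''))),
          .id = 'variable_name')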

# Output for the first six rows of the example data:
  variable_name unique_values_sample
1          name         john,Ted,...
2          city     London,Paris,...
3           age            34,45,...

Upvotes: 1
