Reputation: 2513
I have this data:
name <- c("john", "john", "Ted", "Ted", "Wiliam", NA, "Brice", "Will", "Will")
city <- c(NA, "London", "Paris", "Roma", "Roma", NA, "London", "Mumbay", "London")
age <- c(45, 67, 45, NA, NA, 34, 67, 15, NA)
dataset <- data.frame(name, city, age)
The result:
    name   city age
1   john   <NA>  45
2   john London  67
3    Ted  Paris  45
4    Ted   Roma  NA
5 Wiliam   Roma  NA
6   <NA>   <NA>  34
7  Brice London  67
8   Will Mumbay  15
9   Will London  NA
I want a data frame (to publish with the gt package) that looks like this:
variable_name  unique_values_sample
name           john, Ted, Will, ...
city           London, Roma, Paris, ...
age            45, 67, 15, ...
In my real dataset some variables have thousands of unique values, so I want to display only a few of them as examples, followed by "...".
Upvotes: 0
Views: 52
Reputation: 17299
You can do it with a custom summarizing function:
summarize_var <- function(x){
  # keep distinct, non-missing values
  x <- unique(x[!is.na(x)])
  if(length(x) <= 3){
    toString(x)
  }else{
    # show the first three values, then an ellipsis
    paste(toString(x[1:3]), '...', sep = ',')
  }
}
A test:
dataset <- structure(list(name = c("john", "john", "Ted", "Ted", "Wiliam",
NA, "Brice", "Will", "Will"), city = c(NA, "London", "Paris",
"Roma", "Roma", NA, "London", "Mumbay", "London"), age = c(45L,
67L, 45L, NA, NA, 34L, 67L, 15L, NA)), row.names = c(NA, -9L),
class = "data.frame", index = integer(0))
x <- sapply(dataset, summarize_var)
data.frame(variable_name = names(x), unique_values_sample = x)
#      variable_name    unique_values_sample
# name          name   john, Ted, Wiliam,...
# city          city London, Paris, Roma,...
# age            age          45, 67, 34,...
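In the output above, the row names repeat the variable_name column, and the limit of three values is hard-coded. Here is a minimal sketch of a variant that addresses both, assuming a configurable limit is useful. `sample_unique` and `n_max` are names invented for this illustration and are not part of the function above.

```r
# Hypothetical helper: like summarize_var, but with a configurable limit
sample_unique <- function(x, n_max = 3) {
  u <- unique(x[!is.na(x)])          # distinct, non-missing values
  out <- toString(head(u, n_max))    # head() never pads with NA
  if (length(u) > n_max) paste0(out, ", ...") else out
}

dataset <- data.frame(
  name = c("john", "john", "Ted", "Ted", "Wiliam", NA, "Brice", "Will", "Will"),
  city = c(NA, "London", "Paris", "Roma", "Roma", NA, "London", "Mumbay", "London"),
  age  = c(45, 67, 45, NA, NA, 34, 67, 15, NA)
)

# vapply() guarantees one string per column; n_max is passed on to the helper
x <- vapply(dataset, sample_unique, character(1), n_max = 2)

# unname() keeps the column names out of the row names
res <- data.frame(variable_name = names(x), unique_values_sample = unname(x))
res
```

The resulting data frame has plain integer row names, so it can be passed on to a table package without a duplicated label column.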
Upvotes: 1
Reputation: 79288
In base R you can do:
aggregate(. ~ ind, unique(stack(dataset)),
          function(x) sprintf("%s, ...", toString(x[1:3])))
   ind                   values
1 name   john, Ted, Wiliam, ...
2 city London, Paris, Roma, ...
3  age          45, 67, 34, ...
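Note that `x[1:3]` pads with NA when a variable has fewer than three distinct values, and the ", ..." suffix is appended unconditionally. A possible adjustment uses `head()` and adds the suffix only when values were left out. It is sketched here on a small made-up data frame; `df` and its `flag` column are assumptions for illustration.

```r
# Small made-up data: 'flag' has only two distinct values
df <- data.frame(
  city = c("London", "Paris", "Roma", "Mumbay", NA),
  flag = c("yes", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Indexing past the end of a vector pads with NA
toString(c("yes", "no")[1:3])   # "yes, no, NA"

# head() stops at the end of the vector; the suffix is added only if needed
res <- aggregate(
  values ~ ind, unique(stack(df)),
  function(x) paste0(toString(head(x, 3)), if (length(x) > 3) ", ...")
)
res
```

As in the call above, the formula interface of `aggregate()` drops rows with missing values before the function is applied.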
Upvotes: 1
Reputation: 389055
library(tidyverse)
dataset %>%
  # across() without .cols is deprecated since dplyr 1.1.0, so select columns explicitly
  summarise(across(everything(),
                   ~ toString(c(unique(na.omit(.x))[1:3], '...')))) %>%
  pivot_longer(cols = everything(),
               names_to = 'variable_name', values_to = 'unique_values_sample')
#  variable_name unique_values_sample
#  <chr>         <chr>
#1 name          john, Ted, Wiliam, ...
#2 city          London, Paris, Roma, ...
#3 age           45, 67, 34, ...
Upvotes: 2
Reputation: 41230
You could use lapply and map_dfr:
library(purrr)
maxelt <- 2  # number of values to show per variable
lapply(dataset, function(x) sort(unique(x))) %>%  # sort() also drops NA
  map_dfr(~ data.frame(unique_values_sample = paste0(
    paste(head(.x, maxelt), collapse = ','),
    ifelse(length(.x) > maxelt, ',...', ''))),
    .id = 'variable_name')
Output for the first six rows of the data:
  variable_name unique_values_sample
1          name         john,Ted,...
2          city     London,Paris,...
3           age            34,45,...
Upvotes: 1