Reputation: 71
I have a named list of data frames that all contain the same columns, but in some of these data frames some of the columns are empty (all NA). What I'm hoping to return is the name of each data frame in the list, along with the name(s) of its empty column(s).
The reprex below mirrors the process I am using on the full problem:
library(tidyverse)
data("diamonds")
data1 <- diamonds
data1$color <- NA
data1$price <- NA
data2 <- diamonds
data2$carat <- NA
data1$Type <- "data1"
data2$Type <- "data2"
dataFull <- data1 %>%
  bind_rows(data2)
dataSplit <- split(dataFull, f = dataFull$Type)
for(i in dataSplit){
which(sapply(dataSplit[[i]], function(x) all(is.na(x))))
}
My hope is to return something like:
data1: price, color
data2: carat
I've tried the very basic for loop included above; loops are admittedly not my strong suit.
Upvotes: 0
Views: 140
Reputation: 887691
Using select with where() to keep only the columns that are entirely NA:
library(dplyr)
library(purrr)
map(dataSplit, ~ .x %>%
  select(where(~ all(is.na(.x)))) %>%
  names)
$data1
[1] "color" "price"
$data2
[1] "carat"
Or in base R:
lapply(dataSplit, \(x) names(x)[!colSums(!is.na(x))])
$data1
[1] "color" "price"
$data2
[1] "carat"
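To see why the base R one-liner works, it may help to unpack it step by step. The sketch below uses a made-up data frame (`toy` and the intermediate variable names are hypothetical, not part of the question's data):

```r
# Hypothetical toy data frame: column b is entirely NA, column a only partly
toy <- data.frame(a = c(1, NA, 3), b = NA, c = c("x", "y", "z"))

nonMissing <- colSums(!is.na(toy))  # number of non-NA values in each column
isEmpty <- !nonMissing              # a count of 0 is treated as FALSE, so ! yields TRUE
emptyNames <- names(toy)[isEmpty]   # keep only the names of the all-NA columns
print(emptyNames)
#> [1] "b"
```

A column with at least one non-NA value has a positive count, so negating it gives FALSE and its name is dropped.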
Upvotes: 0
Reputation: 1614
library(tidyverse)
data("diamonds")
data1 <- diamonds
data1$color <- NA
data1$price <- NA
data2 <- diamonds
data2$carat <- NA
data1$Type <- "data1"
data2$Type <- "data2"
dataFull <- data1 %>%
  bind_rows(data2)
dataSplit <- split(dataFull, f = dataFull$Type)
lapply(dataSplit, function(x) {
cn <- colnames(x)
isempty <- apply(x, 2, function(col) is.na(col) |> all())
cn[ isempty ]
})
$data1
[1] "color" "price"
$data2
[1] "carat"
Upvotes: 2
Reputation: 174393
Your sapply idea was right, but you need to subset the names of each data frame with the output. Also, since you are loading the tidyverse, you may as well use map instead of a loop for brevity:
map(dataSplit, ~ names(.x)[sapply(.x, \(x) all(is.na(x)))])
#> $data1
#> [1] "color" "price"
#>
#> $data2
#> [1] "carat"
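As for why the original loop did not work: `for (i in dataSplit)` sets `i` to each data frame itself rather than to its name or position, so `dataSplit[[i]]` fails, and values computed inside a for loop are neither printed nor stored unless you do so explicitly. If you want the exact "name: columns" layout from the question, a minimal base R sketch could look like the following. It uses a small made-up list in place of the diamonds-based `dataSplit` so that it runs on its own, and `emptyCols` and `report` are illustrative names:

```r
# Toy stand-in for dataSplit (hypothetical data, not the diamonds set)
dataSplit <- list(
  data1 = data.frame(carat = c(0.2, 0.3), color = NA, price = NA),
  data2 = data.frame(carat = NA, color = c("E", "I"), price = c(326, 334))
)

# For each data frame, keep the names of the columns that are entirely NA
emptyCols <- lapply(dataSplit, function(df) {
  names(df)[vapply(df, function(col) all(is.na(col)), logical(1))]
})

# Build one "name: col1, col2" line per data frame and print it
report <- paste0(names(emptyCols), ": ", vapply(emptyCols, toString, character(1)))
writeLines(report)
#> data1: color, price
#> data2: carat
```

The columns are listed in the order they appear in each data frame.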
Upvotes: 4