Reputation: 1271
With str(data) I get the head of the levels (1-2 values):
fac1: Factor w/ 2 levels ... :
fac2: Factor w/ 5 levels ... :
fac3: Factor w/ 20 levels ... :
val: num ...
With dplyr::glimpse(data) I get more values, but no information about the number or values of the factor levels. Is there an automatic way to get the level information for all factor variables in a data.frame? A short form with more info for
levels(data$fac1)
levels(data$fac2)
levels(data$fac3)
or, more precisely, an elegant version of something like
for (n in names(data)) {
  if (is.factor(data[[n]])) {
    print(n)
    print(levels(data[[n]]))
  }
}
thx Christof
Upvotes: 35
Views: 139501
Reputation: 1
library(dplyr) #for all the following
df$factor %>% unique() %>% str()
shows the number of levels of a specific variable and the first few level names.
count(df,variable)
returns a table with the levels of a specific variable and their frequencies. The number of rows tells you how many levels occur in the data for this variable.
count(df, across(everything()))
returns a table of all combinations of levels that co-occur across the variables, together with the frequency of each combination.
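As a quick illustration of the count() approach, here is a minimal sketch using the built-in iris data set (chosen only for illustration, it is not part of the answer above):

```r
library(dplyr)

# iris ships with R; Species is its only factor column.
species_counts <- count(iris, Species)

# One row per level that occurs in the data, with its frequency in column n.
print(species_counts)

# The number of rows is the number of levels present in the data.
n_levels_present <- nrow(species_counts)
```

As far as I know, count() drops levels that never occur by default, so nlevels(iris$Species) is the safer check when unused levels matter.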
Upvotes: 0
Reputation: 4168
As a long data frame (tibble):
df %>% gather(name, value) %>% count(name, value)
This converts all the columns into name-value pairs, and then counts the unique levels.
Subset column types with something like:
df %>% select_if(is.character) %>% ...
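The tidyr documentation marks gather() as superseded by pivot_longer(), so the same idea might be sketched as follows. The toy data frame and the conversion to character before reshaping are my own assumptions, added so that columns with different level sets can be stacked without surprises:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: two factor columns with different levels.
df <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z"))
)

level_counts <- df %>%
  select(where(is.factor)) %>%                 # keep only factor columns
  mutate(across(everything(), as.character)) %>% # give all columns one common type
  pivot_longer(everything(), names_to = "name", values_to = "value") %>%
  count(name, value)                           # one row per column/level pair

print(level_counts)
```

Each row of level_counts is one level of one column, with n giving how often it occurs.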
Via https://stackoverflow.com/a/47122651/3217870
Upvotes: 0
Reputation: 2950
Or using purrr:
data %>% purrr::map(levels)
Or to first factorize everything:
data %>% dplyr::mutate_all(as.factor) %>% purrr::map(levels)
And answering the question about how to get the lengths:
data %>% purrr::map(levels) %>% purrr::map(length)
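If the data frame also contains non-factor columns, map(levels) returns NULL for them. A minimal sketch that restricts the result to factor columns with keep() and returns the level counts as a named integer vector (the example data is made up for illustration):

```r
library(purrr)

# Hypothetical example data: two factors and one numeric column.
data <- data.frame(
  fac1 = factor(c("a", "b", "a", "b")),
  fac2 = factor(c("x", "y", "z", "x")),
  val  = c(1.5, 2.5, 3.5, 4.5)
)

# Keep only the columns that are factors.
factor_cols <- keep(data, is.factor)

# Named list of level vectors, one entry per factor column.
level_list <- map(factor_cols, levels)

# Named integer vector with the number of levels per factor column.
level_counts <- map_int(factor_cols, nlevels)

print(level_list)
print(level_counts)
```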
Upvotes: 6
Reputation: 1
An alternative way to get the number of levels of each column in a data.frame:
data_levels_length <- sapply(seq_len(ncol(data)), function(x) {
  # data[[x]] also works for tibbles; non-factor columns give 0
  length(levels(data[[x]]))
})
Upvotes: 0
Reputation: 2361
If you want to display the factor levels only for those columns that are declared as factors, you can use:
lapply(df[sapply(df, is.factor)], levels)
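A small self-contained sketch of this base R approach, using a made-up data frame and vapply() instead of sapply() so the result types are fixed (both choices are mine, not the answer's):

```r
# Hypothetical example data: two factors and one numeric column.
df <- data.frame(
  fac1 = factor(c("yes", "no", "yes")),
  fac2 = factor(c("low", "mid", "high")),
  val  = c(0.1, 0.2, 0.3)
)

# Logical vector marking the factor columns.
is_fac <- vapply(df, is.factor, logical(1))

# Named list with the levels of every factor column.
factor_levels <- lapply(df[is_fac], levels)

# Named integer vector with the number of levels per factor column.
n_levels <- vapply(df[is_fac], nlevels, integer(1))

str(factor_levels)
print(n_levels)
```

The numeric column val is skipped entirely, so the output contains only fac1 and fac2.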
Upvotes: 2
Reputation: 121
If your problem is specifically to output a list of all levels for a factor, then I have found a simple solution using unique():
unique(df$x)
For instance, for the infamous iris dataset:
unique(iris$Species)
Upvotes: 10
Reputation: 61
A simpler method is to use the sqldf package and use a select distinct statement. This makes it easier to automatically get the names of factor levels and then specify as levels to other columns/variables.
A generic code snippet is:
library(sqldf)
array_name = sqldf("select DISTINCT *colname1* as '*column_title*' from *table_name*")
Sample code using iris dataset:
df1 = iris
factor1 <- sqldf("select distinct Species as 'flower_type' from df1")
factor1 ## to print the names of factors
Output:
flower_type
1 setosa
2 versicolor
3 virginica
Upvotes: 4
Reputation: 887118
Here are some options. We loop through the 'data' with sapply and get the levels of each column (assuming that all the columns are factor class)
sapply(data, levels)
Or if we need to pipe (%>%) it, this can be done as
library(dplyr)
data %>%
  sapply(levels)
Or another option is summarise_each from dplyr, where we specify the levels within the funs.
data %>%
  summarise_each(funs(list(levels(.))))
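In newer dplyr releases summarise_each() and funs() are deprecated. A rough equivalent using summarise() with across(), assuming dplyr 1.0 or later and a made-up example data frame, might look like this:

```r
library(dplyr)

# Hypothetical example data: two factors and one numeric column.
data <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z")),
  val  = c(1, 2, 3)
)

# One row; each factor column becomes a list column holding its levels.
level_summary <- data %>%
  summarise(across(where(is.factor), ~ list(levels(.x))))

# Pull the level vector of a single column back out of the list column.
fac2_levels <- level_summary$fac2[[1]]

print(fac2_levels)
```

Unlike the sapply(data, levels) variant, where(is.factor) skips non-factor columns, so the assumption that every column is a factor is no longer needed.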
Upvotes: 35