Reputation: 47
I am cleaning some data and like to use the count() function in dplyr to look at the unique values of every variable.
Is there a way to do this automatically? Right now I am using this method:
df %>% count(variable1)
df %>% count(variable2)
df %>% count(variable3)
...
I would like something that returns all of them without me having to repeat the line of code and type in each variable. I thought about trying to have R recognize all the column names and automatically fill them in but I'm not sure where to start. If I just add variables together, say
df %>% count(variable1, variable2)
I get counts by both of those variables when I want individual tables for each variable.
Upvotes: 3
Views: 1531
Reputation: 35554
Assume that you want to count am, gear, and carb from mtcars. You can apply the function table() to each variable with map(), which returns a list object.
library(dplyr)
library(purrr)
mtcars %>%
select(am, gear, carb) %>%
map(table)
# $am
# 0 1
# 19 13
#
# $gear
# 3 4 5
# 15 12 5
#
# $carb
# 1 2 3 4 6 8
# 7 10 3 10 1 1
Base R version:
lapply(mtcars[c("am", "gear", "carb")], table)
In addition, you can use summary(), which counts the levels of factor variables.
mtcars %>%
  select(am, gear, carb) %>%
  # Convert every selected column to a factor so summary() reports level counts
  mutate(across(everything(), as.factor)) %>%
  summary()
#  am      gear    carb
#  0:19    3:15    1: 7
#  1:13    4:12    2:10
#          5: 5    3: 3
#                  4:10
#                  6: 1
#                  8: 1
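If you would rather keep using count() itself, so that each result is a tibble rather than a table object, one possible sketch is to loop over the column names. The vector vars and the list counts are names made up for this example; with your own data you would replace mtcars with df and could set vars to names(df) to cover every column.

```r
library(dplyr)
library(purrr)

# Columns to tabulate (an assumption for this example);
# use names(mtcars) instead to cover every column.
vars <- c("am", "gear", "carb")

# Build one count() tibble per column and keep them in a named list.
# .data[[v]] lets count() look up a column from its name stored as a string.
counts <- vars %>%
  set_names() %>%
  map(function(v) count(mtcars, .data[[v]]))

# Each element is an ordinary tibble with the value and its count n.
counts$gear
```

Because the list is named after the columns, a single table can be pulled out with counts$am, counts$carb, and so on.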
Upvotes: 2
Reputation: 405
A simple solution would be to use sapply or lapply with table:
sapply(df,table)
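This returns a count table for each column of df. Note that sapply may simplify the result to a matrix when every table has the same length, whereas lapply always returns a list. You can always pass in a subsetted data frame to get counts for only your variables of interest.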
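As a small illustration of passing in a subset, here is a sketch on mtcars rather than the asker's df. The names vars, tabs, and simplified are invented for the example.

```r
# Hypothetical subset of columns; replace with your own variable names.
vars <- c("am", "gear", "carb")

# lapply() always returns a list with one table per column.
tabs <- lapply(mtcars[vars], table)
tabs$carb

# sapply() tries to simplify: am and vs both have two distinct values,
# so the two tables are collapsed into a single matrix.
simplified <- sapply(mtcars[c("am", "vs")], table)
simplified
```

If you need a predictable structure to loop over later, lapply is probably the safer choice of the two.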
Upvotes: 1
Reputation: 39595
It looks like you can use a tidyverse approach to solve your issue. You want to get the counts for each variable in your dataset (next time, please add a sample of df). You can get something close to what you want by reshaping the data to long format. I will show you an example with the mtcars data, choosing variables with only a few distinct values so that they can be summarised with counts. Here is the code:
library(tidyverse)
#Data
data("mtcars")
The next code selects some categorical variables and reshapes them to long format. It then uses summarise() and n() (used for counting) with group_by() to compute the counts:
#Code
mtcars %>% select(cyl,vs,am,gear,carb) %>%
#Format to long
pivot_longer(cols = everything()) %>%
#Group and summarise
group_by(name,value) %>%
summarise(N=n())
Output:
# A tibble: 16 x 3
# Groups:   name [5]
   name  value     N
   <chr> <dbl> <int>
 1 am        0    19
 2 am        1    13
 3 carb      1     7
 4 carb      2    10
 5 carb      3     3
 6 carb      4    10
 7 carb      6     1
 8 carb      8     1
 9 cyl       4    11
10 cyl       6     7
11 cyl       8    14
12 gear      3    15
13 gear      4    12
14 gear      5     5
15 vs        0    18
16 vs        1    14
As you can see, all the variables are shown with their respective values and counts.
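Since the question is about count(), the group_by() and summarise() steps can probably be collapsed into a single count() call on the long data. This is a sketch of that variant, not a change to the approach above; long_counts is a name chosen for the example.

```r
library(dplyr)
library(tidyr)

long_counts <- mtcars %>%
  select(cyl, vs, am, gear, carb) %>%
  # Stack all selected columns into name/value pairs
  pivot_longer(cols = everything()) %>%
  # count() groups by name and value, tallies the rows, and ungroups;
  # name = "N" sets the name of the count column
  count(name, value, name = "N")

long_counts
```

One caveat: pivot_longer() needs the stacked columns to share a type. All of the mtcars columns are numeric, but with mixed types you would likely have to convert them first, for example with mutate(across(everything(), as.character)).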
Upvotes: 2