Reputation: 321
I have three columns in a data frame: age, gender and income.
I want to loop through these columns and create a plot for each one.
I know that in Stata you can loop through variables and run commands on each of them. However, the code below does not seem to work. Is there an equivalent way to do this in R?
groups <- c(df$age, df$gender, df$income)
for (i in groups){
  df %>% group_by(i) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = i)) +
    geom_col()
}
Upvotes: 1
Views: 2495
Reputation: 9858
You can also use the tidyverse. Loop through a vector of grouping-variable names with map. On each iteration, !!sym(variable) turns the variable name (a string) into a symbol, which group_by then evaluates as a column. Alternatively, you can use across(all_of()), which takes strings directly as column names. The rest of the pipeline is essentially the same as yours.
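The key difference from the loop in the question is what groups contains. A small base-R check with made-up data shows that c(df$age, df$gender, df$income) concatenates the column values into one vector, so that loop runs once per value rather than once per column. A vector of column names, looked up with df[[name]], is what this answer loops over.

```r
# Made-up toy data with the three columns from the question
df <- data.frame(age = c("26-30", "31-35", "26-30"),
                 gender = c("M", "F", "F"),
                 income = c("High", "Low", "Low"))

# c(df$age, ...) concatenates the column values into a single character vector
values <- c(df$age, df$gender, df$income)
length(values)  # 9 values, not 3 columns

# A vector of column names is what the loop should iterate over
groups <- c("age", "gender", "income")
for (g in groups) {
  # df[[g]] looks the column up by its name
  print(table(df[[g]]))
}
```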
library(dplyr)
library(purrr)
library(ggplot2)
groups <- c('age', 'gender', 'income')
## with !!sym(.x): turn each string into a column symbol
map(groups, ~
  df %>%
    group_by(!!sym(.x)) %>%
    summarise(n = n()) %>%
    mutate(prop = n / sum(n)) %>%
    ggplot(aes(y = prop, x = !!sym(.x))) +
    geom_col()
)
## with across(all_of()): group_by accepts the string directly;
## .data[[.x]] does the same job inside aes()
map(groups, ~
  df %>%
    group_by(across(all_of(.x))) %>%
    summarise(n = n()) %>%
    mutate(prop = n / sum(n)) %>%
    ggplot(aes(y = prop, x = .data[[.x]])) +
    geom_col()
)
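The two map() variants above differ only in how the string becomes a column reference. A minimal sketch with made-up data, assuming dplyr 1.0 or later, checks that both give the same grouped counts as writing the bare column name (col_name is an illustrative variable, not from the answer):

```r
library(dplyr)

# Made-up example data
df <- data.frame(age = c("26-30", "31-35", "26-30", "41-45"),
                 gender = c("M", "F", "F", "M"))

col_name <- "age"  # column name held as a string

# Three ways to group by the same column
by_bare   <- df %>% group_by(age) %>% summarise(n = n())
by_sym    <- df %>% group_by(!!sym(col_name)) %>% summarise(n = n())
by_across <- df %>% group_by(across(all_of(col_name))) %>% summarise(n = n())
```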
If you want to use a for loop:
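groups <- c('age', 'gender', 'income')
for (i in groups){
  # ggplot objects are not auto-printed inside a for loop, so print() explicitly
  p <- df %>%
    group_by(!!sym(i)) %>%
    summarise(n = n()) %>%
    mutate(prop = n / sum(n)) %>%
    ggplot(aes(y = prop, x = !!sym(i))) +
    geom_col()
  print(p)
}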
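If you would rather keep the plots than draw them as the loop runs, one option is to store them in a named list. This is a sketch with made-up data; plots and summaries are illustrative names, not part of the answer above.

```r
library(dplyr)
library(ggplot2)

# Made-up example data
df <- data.frame(age = c("26-30", "31-35", "26-30", "41-45"),
                 gender = c("M", "F", "F", "M"),
                 income = c("High", "Low", "Low", "Medium"))
groups <- c("age", "gender", "income")

plots <- list()
summaries <- list()
for (i in groups) {
  # Count rows per level of column i and convert counts to proportions
  summaries[[i]] <- df %>%
    group_by(!!sym(i)) %>%
    summarise(n = n()) %>%
    mutate(prop = n / sum(n))
  # Build (but do not draw) the bar chart for column i
  plots[[i]] <- ggplot(summaries[[i]], aes(x = .data[[i]], y = prop)) +
    geom_col()
}
```

Each plot can then be drawn on demand, for example with print(plots[["gender"]]).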
Upvotes: 4
Reputation: 3994
You can use lapply (here prop is just random example data):
library(ggplot2)
df <- data.frame(age = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
                 gender = sample(c("M", "F"), 20, replace = TRUE),
                 income = sample(c("High", "Medium", "Low"), 20, replace = TRUE),
                 prop = runif(20))
# one plot per grouping column; .data[[col]] looks the column up by name
lapply(names(df)[1:3], function(col)
  ggplot(df, aes(x = .data[[col]], y = prop)) + geom_col())
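The lapply answer plots a prop column that already exists in the data. If you instead want lapply to compute the per-level proportions the question asks for, a base-R sketch is below; the data are made up and level_props is a hypothetical helper, not part of the answer.

```r
library(ggplot2)

# Made-up example data without a precomputed prop column
set.seed(42)
df <- data.frame(age = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
                 gender = sample(c("M", "F"), 20, replace = TRUE),
                 income = sample(c("High", "Medium", "Low"), 20, replace = TRUE))

# Hypothetical helper: share of rows in each level of one column (base R)
level_props <- function(x) {
  out <- as.data.frame(prop.table(table(x)), stringsAsFactors = FALSE)
  names(out) <- c("level", "prop")
  out
}

# setNames() keeps the column names on the resulting list
props <- lapply(setNames(names(df), names(df)), function(col) level_props(df[[col]]))

# One bar chart per column, labelled with the column name
plots <- lapply(names(props), function(col)
  ggplot(props[[col]], aes(x = level, y = prop)) + geom_col() + labs(x = col))
```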
Upvotes: 2