Reputation: 61
I have a numeric data frame, and I'm trying to change each value in it to a string and then summarize it, proportionally, by that string.
Example table:
var1 var2 var3 var4
2 3 5 8
3 6 8 7
5 3 7 4
If a number is less than 4, I would like to replace it with "no", and if a number is 4 or greater, I would like to replace it with "yes"
I then would like to summarize the proportion of "yes" and "no" values for each column.
I've tried using dplyr, and it seems easy to do for a single column, but I'm having trouble applying it across all columns. I tried an approach like this, but didn't have success.
df %>%
select(var1:var4) %>%
mutate_each(funs(replace(., . < 4, "no")) %>%
mutate_each(funs(replace., . => 4, "yes")) %>%
summarise(count = n())
Thanks for any help.
Upvotes: 1
Views: 252
Reputation: 78917
Here is a similar answer to akrun's (which is the original) using ifelse
and group_by
and summarise
:
df %>%
mutate(across(var1:var4, ~ifelse(. < 4, 'no', 'yes'))) %>%
pivot_longer(
cols = starts_with("var")) %>%
group_by(name, value) %>%
summarise(count = n()) %>%
mutate(prop = count/sum(count)*100)
Output:
name value count prop
<chr> <chr> <int> <dbl>
1 var1 no 2 66.7
2 var1 yes 1 33.3
3 var2 no 2 66.7
4 var2 yes 1 33.3
5 var3 yes 3 100
6 var4 yes 3 100
Upvotes: 2
Reputation: 886938
We can loop across
the columns, convert the values to 'no', 'yes', reshape to 'long' format with pivot_longer
, get the count
and divide by the sum
of the count after grouping by 'name' to return the proportions
library(dplyr)
library(tidyr)
df %>%
mutate(across(everything(),
~ case_when(. < 4 ~ 'no', TRUE ~ 'yes'))) %>%
pivot_longer(everything()) %>%
count(name, value) %>%
group_by(name) %>%
mutate(prop = 100 * n/sum(n)) %>%
ungroup
-output
# A tibble: 6 x 4
name value n prop
<chr> <chr> <int> <dbl>
1 var1 no 2 66.7
2 var1 yes 1 33.3
3 var2 no 2 66.7
4 var2 yes 1 33.3
5 var3 yes 3 100
6 var4 yes 3 100
Or using base R
100 * proportions(table(names(df)[col(df)], c('no', 'yes')[(df >= 4) + 1]), 1)
no yes
var1 66.66667 33.33333
var2 66.66667 33.33333
var3 0.00000 100.00000
var4 0.00000 100.00000
df <- structure(list(var1 = c(2L, 3L, 5L), var2 = c(3L, 6L, 3L), var3 = c(5L,
8L, 7L), var4 = c(8L, 7L, 4L)), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 2
Reputation: 1466
library(tidyverse)
df <- data.frame(
var1 = c(2L, 3L, 5L),
var2 = c(3L, 6L, 3L),
var3 = c(5L, 8L, 7L),
var4 = c(8L, 7L, 4L)
)
df %>%
mutate(across(.fns = ~ . >= 4)) %>%
summarise(across(.fns = ~ sum(.)/length(.) ))
#> var1 var2 var3 var4
#> 1 0.3333333 0.3333333 1 1
Upvotes: 3