cigarettes_after_text
cigarettes_after_text

Reputation: 61

Replace all values in a data frame, conditionally

I have a numeric data frame, and I'm trying to change each value in it to a string and then summarize it, proportionally, by that string.

Example table:

var1 var2 var3 var4
 2    3    5    8
 3    6    8    7
 5    3    7    4

If a number is less than 4, I would like to replace it with "no", and if a number is 4 or greater, I would like to replace it with "yes"

I then would like to summarize the proportion of "yes" and "no" values for each column.

I've tried using dplyr, and it seems easy to do for a single column, but I'm having trouble applying it across all columns. I tried an approach like this, but didn't have success.

df %>%
select(var1:var4) %>%
mutate_each(funs(replace(., . < 4, "no")) %>%
mutate_each(funs(replace., . => 4, "yes")) %>%
summarise(count = n())

Thanks for any help.

Upvotes: 1

Views: 252

Answers (3)

TarJae
TarJae

Reputation: 78917

Here is a similar answer to akrun's (which is the original) using ifelse and group_by and summarise:

df %>% 
  mutate(across(var1:var4, ~ifelse(. < 4, 'no', 'yes'))) %>% 
  pivot_longer(
    cols = starts_with("var")) %>% 
  group_by(name, value) %>% 
  summarise(count = n()) %>% 
  mutate(prop = count/sum(count)*100)

Output:

  name  value count  prop
  <chr> <chr> <int> <dbl>
1 var1  no        2  66.7
2 var1  yes       1  33.3
3 var2  no        2  66.7
4 var2  yes       1  33.3
5 var3  yes       3 100  
6 var4  yes       3 100  

Upvotes: 2

akrun
akrun

Reputation: 886938

We can loop across the columns, convert the values to 'no', 'yes', reshape to 'long' format with pivot_longer, get the count and divide by the sum of the count after grouping by 'name' to return the proportions

library(dplyr)
library(tidyr)
df %>%
      mutate(across(everything(), 
          ~ case_when(. < 4 ~ 'no', TRUE ~ 'yes'))) %>% 
      pivot_longer(everything()) %>% 
      count(name, value) %>%
      group_by(name) %>%
      mutate(prop = 100 * n/sum(n)) %>%
      ungroup

-output

# A tibble: 6 x 4
  name  value     n  prop
  <chr> <chr> <int> <dbl>
1 var1  no        2  66.7
2 var1  yes       1  33.3
3 var2  no        2  66.7
4 var2  yes       1  33.3
5 var3  yes       3 100  
6 var4  yes       3 100  

Or using base R

100 * proportions(table(names(df)[col(df)], c('no', 'yes')[(df >= 4) + 1]), 1)
      
              no       yes
  var1  66.66667  33.33333
  var2  66.66667  33.33333
  var3   0.00000 100.00000
  var4   0.00000 100.00000

data

df <- structure(list(var1 = c(2L, 3L, 5L), var2 = c(3L, 6L, 3L), var3 = c(5L, 
8L, 7L), var4 = c(8L, 7L, 4L)), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 2

crestor
crestor

Reputation: 1466

library(tidyverse)

df <- data.frame(
  var1 = c(2L, 3L, 5L),
  var2 = c(3L, 6L, 3L),
  var3 = c(5L, 8L, 7L),
  var4 = c(8L, 7L, 4L)
)

df %>%
  mutate(across(.fns = ~ . >= 4)) %>%
  summarise(across(.fns = ~ sum(.)/length(.) ))
#>        var1      var2 var3 var4
#> 1 0.3333333 0.3333333    1    1

Upvotes: 3

Related Questions