user8858830

Using recode and case_when together

Table1$subject contains the values "Biology", "Chemistry", and "Physics". For Table2, I want to recode this column, replacing every instance of Biology or Chemistry with 1 and every instance of Physics with 0.

I tried the following code, since I believe this is achievable using the recode and case_when commands:

    Table2 <- recode(Table1, case_when(
    .$subject <= "biology" ~ 1,
    .$subject <= "chemistry" ~ 1,
    .$subject <= "physics" ~ 0))

Currently, I get an error message saying "case_when must be a two-sided formula, not a logical". I'm new to R so I'm not quite sure what I'm doing wrong. Really grateful if anyone has any ideas!

Upvotes: 4

Views: 5754

Answers (2)

leerssej

Reputation: 14958

This reminded me of when I first started working with R, too, and I walked over and asked the data scientists this very same question.

They shared with me a different approach that is generally preferable in these situations. I have looked back many times and appreciated learning it early on.

The database normalization approach (unless someone out there can help us with a better name) involves mapping your code values in a separate dataframe. You then join that lookup table to the dataframe you want to encode.

This keeps the code responsible for manipulations and the dataframes responsible for holding values and data. It can speed up much of your work by saving you from hand-coding lookup logic, and in the longer term it makes things much easier for whoever debugs, modifies, or redevelops the code.

The normalized data management approach then would look like:

    library(dplyr)
    library(tibble)

    # your code mapping
    df_map <- tribble(~subject,    ~subj_cd,
                      "chemistry", 1,
                      "biology",   1,
                      "physics",   0)

    # a dummy raw dataframe that you might be wanting to encode
    df_raw <- tibble(stud_id = 2678:2877,
                     subject = sample(c("chemistry",
                                        "biology",
                                        "physics",
                                        "astronomy"), 200, replace = TRUE))

    # encoding the data: subjects missing from the map (astronomy) get NA
    df_coded <-
        df_raw %>%
        left_join(df_map, by = "subject")

    > df_coded
    # A tibble: 200 x 3
       stud_id   subject subj_cd
         <int>     <chr>   <dbl>
     1    2678   physics       0
     2    2679   physics       0
     3    2680   biology       1
     4    2681 astronomy      NA
     5    2682 chemistry       1
     6    2683 chemistry       1
     7    2684   physics       0
     8    2685 chemistry       1
     9    2686 chemistry       1
    10    2687 astronomy      NA
    # ... with 190 more rows
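
One thing the join approach makes easy is checking which values your map does not cover. Here is a minimal sketch, assuming the same `df_map` as above and a small made-up dataframe (`df_small` and `unmapped` are names invented for this illustration):

```r
library(dplyr)
library(tibble)

# Code map from the answer above
df_map <- tribble(~subject,    ~subj_cd,
                  "chemistry", 1,
                  "biology",   1,
                  "physics",   0)

# Small, deterministic stand-in for df_raw (hypothetical values)
df_small <- tibble(stud_id = 1:4,
                   subject = c("physics", "biology", "astronomy", "chemistry"))

# anti_join keeps the rows of df_small that have no match in df_map
unmapped <- anti_join(df_small, df_map, by = "subject")
unmapped$subject
```

Any subject that shows up in `unmapped` would come through the `left_join` as `NA`, so this is a cheap check to run before trusting the encoded column.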

If you find yourself needing a quick and easy way to build longer code maps (or, especially, to share them with other folks), then you will probably find Jenny Bryan's googlesheets package very helpful (she's a member of team tidyverse). The package also comes with a really helpful vignette.

Upvotes: 3

austensen

Reputation: 3007

Both recode and case_when operate on vectors, not data frames. So to create a new data frame you need to first call mutate, and then within mutate use either recode or case_when to create a new column (or overwrite an existing one).

(Also, as of the latest dplyr release you no longer need to use the .$ when using case_when)
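
To see the "vectors, not data frames" point in isolation, here is a small sketch that calls both functions on a bare character vector, with no data frame involved (the variable names `via_case_when` and `via_recode` are just for this illustration):

```r
library(dplyr)

subject <- c("chemistry", "biology", "physics")

# case_when takes logical conditions on the left of ~ and values on the right;
# it returns one value per element of the input
via_case_when <- case_when(
  subject == "chemistry" ~ 1,
  subject == "biology"   ~ 1,
  subject == "physics"   ~ 0
)

# recode takes the vector first, then old = new pairs
via_recode <- recode(subject, "chemistry" = 1, "biology" = 1, "physics" = 0)

via_case_when
via_recode
```

Both calls hand back a plain numeric vector, which is why they need to sit inside `mutate` when the goal is a new column. Note that the conditions use `==`, not `<=` as in the question.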


    library(tibble)
    library(dplyr)

    df <- tribble(
      ~subject,
      "chemistry",
      "biology",
      "physics"
    )

    # case_when: one condition ~ value pair per subject
    df %>%
      mutate(subject2 = case_when(
        subject == "chemistry" ~ 1,
        subject == "biology" ~ 1,
        subject == "physics" ~ 0
      ))

    #> # A tibble: 3 x 2
    #>     subject subject2
    #>       <chr>    <dbl>
    #> 1 chemistry        1
    #> 2   biology        1
    #> 3   physics        0

    # recode: old value = new value
    df %>%
      mutate(subject2 = recode(
        subject,
        "chemistry" = 1,
        "biology" = 1,
        "physics" = 0
      ))

    #> # A tibble: 3 x 2
    #>     subject subject2
    #>       <chr>    <dbl>
    #> 1 chemistry        1
    #> 2   biology        1
    #> 3   physics        0
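
One more thing to watch for: the question describes the values as capitalized ("Biology", "Chemistry", "Physics"), and string comparison in R is case-sensitive, so the lowercase conditions above would not match them. A possible variant, assuming the real column is capitalized (`df_caps` and `df_flag` are made-up names for this sketch), lowercases the column inside the condition and groups the two subjects with `%in%`:

```r
library(dplyr)
library(tibble)

# Hypothetical data using the capitalized values from the question,
# plus one subject that is not covered by any condition
df_caps <- tibble(subject = c("Biology", "Chemistry", "Physics", "Astronomy"))

df_flag <- df_caps %>%
  mutate(subject2 = case_when(
    tolower(subject) %in% c("biology", "chemistry") ~ 1,
    tolower(subject) == "physics"                   ~ 0
  ))

df_flag$subject2
```

Rows that match none of the conditions (here "Astronomy") come back as `NA`, which mirrors what the join approach in the other answer does for unmapped subjects.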

Upvotes: 5
