how to do multiple gsubs and mutate

I have a huge data frame like this

      scan_id sample
1  s10w_00001      1
2  s10w_00002      2
3  s10w_00003      3
4  s10w_00004      4
5  s11d_00001      5
6  s11d_00002      6
7  s11d_00003      7
8  s11d_00004      8
9  s11w_00001      9
10 s11w_00002     10
11 s11w_00003     11

I want to add another column called size. However, that column should correspond with the scan_id column, where all the scan_id with 00001 at the end should be big, 00002=medium, 00003=small 00004=extra small.

the data frame should look something like this

      scan_id sample sixe
1  s10w_00001      1  big
2  s10w_00002      2 medium
3  s10w_00003      3 small
4  s10w_00004      4 extra small
5  s11d_00001      5 big
6  s11d_00002      6 medium
7  s11d_00003      7 small
8  s11d_00004      8 extra small
9  s11w_00001      9 big 
10 s11w_00002     10 medium 
11 s11w_00003     11 small

how do i go about doing that?

Upvotes: 2

Views: 249

Answers (2)

cephalopod
cephalopod

Reputation: 1906

Does this work for you:

library(tidyverse)

df %>%
  separate(col = scan_id, into = c("scan", "id"), sep = "_") %>%
  mutate(size = case_when(id == "00001" ~ "big",
                          id == "00002" ~ "medium",
                          TRUE          ~ "small")) %>%
  unite(col = "scan_id", c("scan", "id"), sep = "_")

Upvotes: 2

JBGruber
JBGruber

Reputation: 12420

Here is a solution. Note that when scan_id is neither 00001 nor 00002, size will be "small":

library(dplyr)
df_clean <- df %>% 
  mutate(size = ifelse(grepl("00001", scan_id), "big", "small")) %>% 
  mutate(size = ifelse(grepl("00002", scan_id), "medium", size)) %>%   

mutate(size = ifelse(grepl("00003", scan_id), "small", size))
> df_clean
      scan_id sample        size
1  s10w_00001      1         big
2  s10w_00002      2      medium
3  s10w_00003      3       small
4  s10w_00004      4 extra small
5  s11d_00001      5         big
6  s11d_00002      6      medium
7  s11d_00003      7       small
8  s11d_00004      8 extra small
9  s11w_00001      9         big
10 s11w_00002     10      medium
11 s11w_00003     11       small

Data

You should usually provide data with dput which transforms your data.frame into text that is easy to read in. Here is the data I used:

df <- read.table(text =
  "scan_id sample
  1  s10w_00001      1
  2  s10w_00002      2
  3  s10w_00003      3
  4  s10w_00004      4
  5  s11d_00001      5
  6  s11d_00002      6
  7  s11d_00003      7
  8  s11d_00004      8
  9  s11w_00001      9
  10 s11w_00002     10
  11 s11w_00003     11", header = TRUE)

Upvotes: 0

Related Questions