I have a data.frame called color with sample names. I want to assign colors to it according to whether each name ends in .U1 or .U2.
color
samples
1 30HB.U2
2 41ML.U2
3 22WS.U1
4 29MK.U1
5 29MK.U2
6 40WA.U1
7 30HB.U1
8 13BS.U1
9 50DM.U1
10 53BD.U1
11 36ER.U1
12 05AP.U1
13 06WT.U1
14 07RW.U1
15 07RW.U2
16 17SK.U1
17 26FB.U1
18 28HM.U1
19 31KE.U1
20 32FG.U1
21 34WF.U1
22 37SD.U1
23 41ML.U1
24 45GL.U2
25 47OT.U1
26 49RJ.U1
27 54SL.U1
28 54SL.U2
29 69HL.U1
30 69HL.U2
[...]
color <- color %>%
mutate(col = case_when(
samples == color$samples[grepl(color$samples,pattern = '.U1') == TRUE] ~ 'red',
samples == color$samples[grepl(color$samples,pattern = '.U2') == TRUE] ~ 'blue'))
Not every color assignment worked; many rows ended up as NA:
color
samples col
1 30HB.U2 blue
2 41ML.U2 blue
3 22WS.U1 <NA>
4 29MK.U1 <NA>
14 07RW.U1 <NA>
15 07RW.U2 <NA>
16 17SK.U1 <NA>
24 45GL.U2 <NA>
25 47OT.U1 <NA>
26 49RJ.U1 <NA>
27 54SL.U1 <NA>
28 54SL.U2 <NA>
29 69HL.U1 <NA>
30 69HL.U2 <NA>
31 74SA.U1 <NA>
[...]
50 05AP.U2 <NA>
51 36ER.U2 <NA>
52 40WA.U2 <NA>
53 35AD.U2 <NA>
54 47OT.U2 <NA>
55 28HM.U2 <NA>
56 38AR.U2 <NA>
57 66DG.U2 <NA>
58 35AD.U1 <NA>
59 57MT.U2 blue
60 39DA.U2 blue
61 37SD.U2 blue
62 49RJ.U2 blue
Why does it not work? I find it strange that the first and last assignments work. Thank you for any suggestions.
You could simply use substring and factor labels.
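# substring(samples, 6) keeps "U1"/"U2" (assumes a 4-character prefix); levels sort as U1, U2 -> red, blue
color <- transform(color, col = factor(substring(samples, 6), labels = c("red", "blue")))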
color
# samples col
# 1 30HB.U2 blue
# 2 41ML.U2 blue
# 3 22WS.U1 red
# 4 29MK.U1 red
# 5 29MK.U2 blue
# 6 40WA.U1 red
# 7 30HB.U1 red
# 8 13BS.U1 red
# 9 50DM.U1 red
# 10 53BD.U1 red
# 11 36ER.U1 red
# 12 05AP.U1 red
# 13 06WT.U1 red
# 14 07RW.U1 red
# 15 07RW.U2 blue
# 16 17SK.U1 red
# 17 26FB.U1 red
# 18 28HM.U1 red
# 19 31KE.U1 red
# 20 32FG.U1 red
# 21 34WF.U1 red
# 22 37SD.U1 red
# 23 41ML.U1 red
# 24 45GL.U2 blue
# 25 47OT.U1 red
# 26 49RJ.U1 red
# 27 54SL.U1 red
# 28 54SL.U2 blue
# 29 69HL.U1 red
# 30 69HL.U2 blue
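Note that substring(samples, 6) relies on every name having exactly four characters before the dot, and on the factor levels sorting as U1, U2. A minimal sketch that avoids both assumptions takes whatever follows the last dot with sub() and looks the color up in a named vector; the four sample names are just a subset of the question's data:

```r
# A few of the sample names from the question
color <- data.frame(samples = c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1"),
                    stringsAsFactors = FALSE)

# Keep only the text after the last dot, e.g. "30HB.U2" -> "U2"
suffix <- sub(".*\\.", "", color$samples)

# Look each suffix up in a named vector; an unknown suffix gives NA
color$col <- unname(c(U1 = "red", U2 = "blue")[suffix])
color
```

Any suffix other than U1 or U2 ends up as NA, which mirrors what case_when() does when no condition matches.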
Data:
color <- structure(list(samples = c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1",
"29MK.U2", "40WA.U1", "30HB.U1", "13BS.U1", "50DM.U1", "53BD.U1",
"36ER.U1", "05AP.U1", "06WT.U1", "07RW.U1", "07RW.U2", "17SK.U1",
"26FB.U1", "28HM.U1", "31KE.U1", "32FG.U1", "34WF.U1", "37SD.U1",
"41ML.U1", "45GL.U2", "47OT.U1", "49RJ.U1", "54SL.U1", "54SL.U2",
"69HL.U1", "69HL.U2")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30"))
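The mutate(col = case_when(samples == ...)) structure compares samples element by element. The right-hand side, color$samples[grepl(...)], is not a logical vector but the subset of sample names that match the pattern, and it is shorter than samples. R therefore recycles it, so each value of samples is compared only with whichever matching name happens to sit at the same position. A row gets a color only when those two happen to coincide, which is why only a handful of rows were colored and the rest came out NA.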
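The recycling is easy to see on a few of the question's sample names. This sketch builds the same kind of subset the question uses on the right-hand side of == and compares it with the full vector:

```r
samples <- c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1", "29MK.U2")

# Same kind of subset as in the question: the names that match ".U2"
u2 <- samples[grepl(".U2", samples)]
u2
# "30HB.U2" "41ML.U2" "29MK.U2"

# == pairs position 1 with 1, 2 with 2, ..., recycling u2 to length 5;
# R warns because 5 is not a multiple of 3
matched <- samples == u2
matched
# TRUE TRUE FALSE FALSE FALSE  (the fifth name is a U2 sample but is missed)

# %in% instead asks whether each value occurs anywhere in u2
samples %in% u2
# TRUE TRUE FALSE FALSE TRUE
```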
Here is a way to do it using your grepl expression. Replace == with %in%, since you want to check whether each value of samples is one of the set it is compared against.
color <- color %>%
mutate(col = case_when(
samples %in% color$samples[grepl(color$samples,pattern = '.U1') == TRUE] ~ 'red',
samples %in% color$samples[grepl(color$samples,pattern = '.U2') == TRUE] ~ 'blue'))
Here is a simpler way to use grepl.
color <- color %>%
mutate(col = case_when(
grepl(".U1", samples) ~ 'red',
grepl(".U2", samples) ~ 'blue'))
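One caveat that applies to all of the patterns here: in a regular expression the dot matches any character, so ".U1" also matches a name that merely contains U1 after some other character. That does not happen with the names shown, but a stricter pattern escapes the dot and anchors it to the end of the string. A small sketch, where "XU1.U2" is a made-up name chosen only to show the difference:

```r
samples <- c("30HB.U2", "22WS.U1", "XU1.U2")  # "XU1.U2" is a made-up name

# Unescaped dot: "." matches any character, so "XU1" counts as a match
grepl(".U1", samples)
# FALSE TRUE TRUE

# Escaped dot plus "$" anchor: only names ending in ".U1" match
grepl("\\.U1$", samples)
# FALSE TRUE FALSE

# fixed = TRUE treats ".U1" literally (but does not anchor it to the end)
grepl(".U1", samples, fixed = TRUE)
# FALSE TRUE FALSE
```

The pattern "\\.U1$" (and likewise "\\.U2$") can be substituted directly into the grepl() and str_detect() calls in this answer.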
You could also use str_detect from stringr.
library(dplyr)    # for %>%, mutate() and case_when()
library(stringr)  # for str_detect()
color <- color %>%
  mutate(col = case_when(str_detect(samples, ".U1") ~ 'red',
                         str_detect(samples, ".U2") ~ 'blue'))
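If you would rather avoid both regular expressions and the dplyr/stringr dependencies, base R's endsWith() (available since R 3.3.0) tests a literal suffix. A minimal sketch on a few of the sample names, assuming a name ending in neither suffix should get NA, as it would with case_when():

```r
color <- data.frame(samples = c("30HB.U2", "41ML.U2", "22WS.U1", "29MK.U1"),
                    stringsAsFactors = FALSE)

# endsWith() compares the literal suffix, so the dot needs no escaping
color$col <- ifelse(endsWith(color$samples, ".U1"), "red",
                    ifelse(endsWith(color$samples, ".U2"), "blue", NA))
color
```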