I have a data frame in which I'd like to compare a data point, Genotype, with two references, S288C and SK1. This comparison will be done across many rows (100+) of the data frame. Here are the first few lines of my data frame:
Assay Genotype S288C SK1
1 CCT6-002 G A G
2 CCT6-007 G A G
3 CCT6-013 C T C
4 CCT6-015 G A G
5 CCT6-016 G G T
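To make the example reproducible, here is a minimal sketch that rebuilds the five rows printed above as an R data frame. The name dat is an assumption (it matches the name used in the answer), and only the rows shown above are included. Setting stringsAsFactors = FALSE keeps the allele columns as character vectors, which matters for the comparisons in the answer.

```r
# Rebuild the five example rows from the question.
# stringsAsFactors = FALSE keeps the allele columns as plain character
# vectors (already the default from R 4.0.0 onward).
dat <- data.frame(
  Assay    = c("CCT6-002", "CCT6-007", "CCT6-013", "CCT6-015", "CCT6-016"),
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE
)

# Inspect the column types: all four should be chr.
str(dat)
```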
As a final product, I'd like a character string of 1's (S288C) and 0's (SK1), depending on which reference each data point matches. Thus, for the example above, I'd like an output of 00001, since all rows except the last match SK1.
A nested ifelse should do it (see help(ifelse) for usage):
ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
With this test data:
> dat
Genotype S288C SK1
[1,] "G" "A" "G"
[2,] "G" "A" "G"
[3,] "C" "T" "C"
[4,] "G" "A" "G"
[5,] "G" "G" "T"
[6,] "G" "A" "A"
We get:
> ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
[1] 0 0 0 0 1 NA
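The question asks for a single character string rather than a numeric vector. A minimal sketch, assuming the five example rows from the question, is to collapse the result of the same nested ifelse with paste(..., collapse = ""). The variable names calls and result are illustrative only.

```r
# The five example rows from the question, as character columns.
dat <- data.frame(
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE
)

# 1 = matches S288C, 0 = matches SK1, NA = matches neither.
calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))

# Collapse the per-row codes into one string.
result <- paste(calls, collapse = "")
result
# [1] "00001"
```

Note that paste() turns an NA into the two characters "NA", so a row that matches neither reference would insert "NA" into the string; you may want to replace NA values with a placeholder character before collapsing.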
(Note: if you have trouble using this, make sure the columns are vectors and are not treated by R as factors. A simple for loop should do it: for (i in 1:ncol(dat)) {dat[,i] = as.vector(dat[,i])}.)
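The factor issue matters because comparing two factors whose level sets differ with == raises an error ("level sets of factors are different"). As a sketch of an alternative to the loop, assuming a hypothetical dat whose allele columns were read in as factors, you can convert only the factor columns in one step with lapply(). Passing stringsAsFactors = FALSE to read.table() or read.csv() avoids creating the factors in the first place.

```r
# Hypothetical data frame whose allele columns ended up as factors.
dat <- data.frame(
  Genotype = factor(c("G", "G", "C", "G", "G")),
  S288C    = factor(c("A", "A", "T", "A", "G")),
  SK1      = factor(c("G", "G", "C", "G", "T"))
)

# Find the factor columns, then convert just those to character.
# Assigning into dat[is_fac] keeps dat a data.frame.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.character)

# All columns should now be chr.
str(dat)
```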