Sam Globus

Reputation: 585

Comparing two columns in a data frame across many rows

I have a data frame in which I'd like to compare the Genotype column against two reference columns, S288C and SK1. The comparison needs to run across many rows (100+) of the data frame. Here are the first few lines:

     Assay Genotype S288C SK1
1 CCT6-002        G     A   G
2 CCT6-007        G     A   G
3 CCT6-013        C     T   C
4 CCT6-015        G     A   G
5 CCT6-016        G     G   T

As a final product, I'd like a character string of 1's (S288C) and 0's (SK1) depending on which of the references the data point matches. Thus in the example above I'd like an output of 00001 since all except the last match SK1.

Upvotes: 15

Views: 76238

Answers (1)

user554546

A nested ifelse should do it (take a look at help(ifelse) for usage):

ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))

With this test data:

> dat
     Genotype S288C SK1
[1,] "G"      "A"   "G"
[2,] "G"      "A"   "G"
[3,] "C"      "T"   "C"
[4,] "G"      "A"   "G"
[5,] "G"      "G"   "T"
[6,] "G"      "A"   "A"

We get:

> ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))
[1]  0  0  0  0  1 NA
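
To get the single string the question asks for, the vector can be collapsed with paste. This is a minimal sketch, assuming the data frame is called dat and its columns are character vectors; the names codes and result are made up for illustration:

```r
# Rebuild the five rows shown in the question.
# stringsAsFactors = FALSE keeps the columns as character vectors.
dat <- data.frame(
  Assay    = c("CCT6-002", "CCT6-007", "CCT6-013", "CCT6-015", "CCT6-016"),
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE
)

# 1 if Genotype matches S288C, 0 if it matches SK1, NA if it matches neither
codes <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))

# Collapse the per-row codes into one string
result <- paste(codes, collapse = "")
print(result)  # "00001"
```

Note that a row matching neither reference would show up as the literal text NA inside the string, so you may want to check anyNA(codes) before collapsing.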

(Note: If this doesn't work for you, check that the columns are character vectors and not factors. Comparing two factors with == raises an error when their level sets differ. A simple for loop converts them: for (i in 1:ncol(dat)){dat[,i]=as.vector(dat[,i])}).
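
As an alternative to the loop, here is a sketch that converts only the factor columns and leaves any other columns alone. The small data frame is hypothetical test data, built with stringsAsFactors = TRUE to mimic the default of data.frame() in R versions before 4.0; is_fac and codes are illustrative names:

```r
# Hypothetical data whose columns are factors
dat <- data.frame(
  Genotype = c("G", "C", "G"),
  S288C    = c("A", "T", "G"),
  SK1      = c("G", "C", "T"),
  stringsAsFactors = TRUE
)

# Find the factor columns and convert just those to character
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.character)

# The comparison now works on plain character vectors
codes <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))
print(codes)  # 0 0 1
```

If you read the data from a file, passing stringsAsFactors = FALSE to read.table or read.csv should avoid the conversion step altogether.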

Upvotes: 20
