MarcoD

Reputation: 137

Match columns and lists

Sorry for the title, hope it is not too misleading. I have the following dataframe df1:

 id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or

Then I have "df2", which comes from another dataset of a different length than the first one. Some of the values in df1$id1 are included in df2$id2 and vice versa:

 id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns

What I am trying to do is to count how many "id1" have a common value (i.e. "ns") with id2 in any of the clas columns (i.e. "ns").

So I have tried this:

 x<-as.numeric(levels(factor(df2$id2)))
 clas<-ls()
 for(i in 1:x){
   for(j in 1:length(df1$id1)){
     if(df1$id1==i){clas[[i]]=append(clas[[i]],c(df1$clas1[j],df1$clas2[j],df1$clas3[j]))}
   }
 }

What I am trying to do here is to create a list holding all the clas1, clas2 and clas3 values for each repeated id1, so that I can later check whether the value in clas0 appears somewhere in that list. However, I keep getting the following warning:

    In if (id1$id1 == i) { ... :
 the condition has length > 1 and only the first element will be used

I am stuck. Could someone point me in the right direction? Many thanks Marco

Upvotes: 0

Views: 40

Answers (1)

Roland

Reputation: 132676

What I am trying to do is to count how many "id1" have a common value (i.e. "ns") with id2 in any of the clas columns (i.e. "ns").
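
On the warning first: `if` expects a single TRUE or FALSE, but `df1$id1 == i` compares the whole column at once and returns one logical per row. `1:x` has a similar problem, since `x` is a vector of ids rather than a single number. A minimal sketch of the list the loop seems to be aiming for, assuming the goal is one vector of distinct classes per id (`clas_cols` and `clas_by_id` are names made up for illustration):

```r
df1 <- data.frame(
  id1   = c(512, 512, 512, 845, 1265, 1265, 9453, 9453, 95635),
  clas1 = c("ns", "ns", "abx", "or", "dd", "ns", "col", "abx", "ns"),
  clas2 = c("abx", "or", "dm", NA, "ivf", "ivf", "ns", "ns", "abx"),
  clas3 = c(NA, NA, "sup", NA, NA, "pts", "ivf", "or", "or"),
  stringsAsFactors = FALSE
)

# The comparison is vectorised: one logical per row, not a single TRUE/FALSE,
# which is why if() complains
length(df1$id1 == 512)

# Split the class columns by id, then flatten each group to its distinct,
# non-missing values (clas_by_id is a hypothetical name)
clas_cols  <- df1[, c("clas1", "clas2", "clas3")]
clas_by_id <- lapply(split(clas_cols, df1$id1), function(d) {
  v <- unique(unlist(d, use.names = FALSE))
  v[!is.na(v)]
})

# Classes recorded for id 512; the list names are the ids as strings
clas_by_id[["512"]]
```

With such a list, `"ns" %in% clas_by_id[["512"]]` would answer the per-id question, but for the count itself the merge below is probably simpler.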

df1 <- read.table(text="id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or", header=TRUE)

df2 <- read.table(text=" id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns", header=TRUE)

df <- merge(df1, df2, by.x="id1", by.y="id2")
sum(apply(df$clas0 == df[, c("clas1", "clas2", "clas3")], 1, any, na.rm = TRUE))
#[1] 5
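
Note that 5 is the number of matching rows in the merged data, so an id that matches in several rows is counted several times. If the number of distinct ids is what is wanted, a possible follow-up looks like the sketch below. It assumes the same `df1` and `df2` as above, repeated so the snippet runs on its own; `hit` and `n_ids` are illustrative names.

```r
df1 <- read.table(text = "id1 clas1 clas2 clas3
512   ns  abx NA
512   ns  or  NA
512   abx dm  sup
845   or  NA  NA
1265  dd  ivf NA
1265  ns  ivf pts
9453  col ns  ivf
9453  abx ns  or
95635 ns  abx or", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "id2 clas0
102    ns
512    ns
915    ns
1265   ns
9453   ns
10485  ns
95639  ns
100348 ns", header = TRUE, stringsAsFactors = FALSE)

# Keep only the rows of df1 whose id also occurs in df2
df <- merge(df1, df2, by.x = "id1", by.y = "id2")

# TRUE for each merged row where clas0 appears in any of the clas columns;
# NA cells are ignored rather than counted as matches
m   <- as.matrix(df[, c("clas1", "clas2", "clas3")])
hit <- rowSums(m == df$clas0, na.rm = TRUE) > 0

sum(hit)                               # matching rows: 5, as above
n_ids <- length(unique(df$id1[hit]))   # distinct ids with at least one match
n_ids                                  # 3 for this data (512, 1265, 9453)
```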

Upvotes: 1
