user2494353

Reputation: 35

New column in R based on a number of conditions

I'm trying to make 3 master columns from 5 different columns. I'm looking at 2 different medical tests (TestA and TestB). I looked at a few other answers on making new columns, but couldn't find any that cover multiple conditions combined with categorical values.

Currently I have the following columns: TestA2009 TestA2010 TestA2011 TestB2010 TestB2011

The three columns I ultimately want are:
1. Those who have taken TestA (any year) but have never had TestB
2. Those who have taken TestB (any year) but have never had TestA
3. Those who have taken TestA (any year), and TestB (any year)

Values for TestA include things like NA, Positive, Negative, Not Reported, etc.
Values for TestB include things like NA, Reactive, Unsatisfactory, etc.

NA meaning that they did not have the test.

Hope this question is clear. Thanks so much - I'm quite new at R, and could use all the help I can get!!

EDIT: Thanks everyone for your suggestions. I also tried this method myself. I switched all "NA"s to "0" and all other values to "1". Does it make sense?

TestA <-ifelse(TestA2009==1 | TestA2010==1 | TestA2011==1, "TESTa", "NOtesta")
TestB <-ifelse(TestB2010==1 | TestB2011==1, "TESTb", "NOtestb")

TestAonly <-(TestA==TESTa & TestB=="NOtestb")
TestAandTestB <-(TestA==TESTa & TestB=="TESTb")

Upvotes: 0

Views: 186

Answers (2)

dardisco

Reputation: 5274

A reproducible example:

vals1 <- c(NA, "pos", "neg", "nr")
set.seed(1)
df1 <- data.frame(
    id = seq(1:10),
    a09 = sample(vals1,10,replace=TRUE),
    a10 = sample(vals1,10,replace=TRUE),
    a11 = sample(vals1,10,replace=TRUE),
    b10 = sample(vals1,10,replace=TRUE),
    b11 = sample(vals1,10,replace=TRUE)
    )
### modify to give at least one case meeting each of your criteria
df1[10,c(5,6)] <- NA # 2x NAs for b's
df1[1,c(2,3,4)] <- NA # 3x NAs for a's
df1[2,2:6] <- NA # all NAs

giving:

   id  a09  a10  a11  b10  b11
1   1 <NA> <NA> <NA>  pos   nr
2   2 <NA> <NA> <NA> <NA> <NA>
3   3  neg  neg  neg  pos   nr
4   4   nr  pos <NA> <NA>  neg
5   5 <NA>   nr  pos   nr  neg
6   6   nr  pos  pos  neg   nr
7   7   nr  neg <NA>   nr <NA>
8   8  neg   nr  pos <NA>  pos
9   9  neg  pos   nr  neg  neg
10 10 <NA>   nr  pos <NA> <NA>

Now we chain multiple logical operators to get the ids in question. This isn't as elegant as @Carl's suggestion, but may be more intuitive at first glance. Note the grouping brackets, i.e. a and (b or c):

### test a not b, id=10 
df1$id[ is.na(df1$b10) & is.na(df1$b11) & 
  ( !is.na(df1$a09) | !is.na(df1$a10) | !is.na(df1$a11) ) ]

### test b not a, id=1
df1$id[ is.na(df1$a09) & is.na(df1$a10) & is.na(df1$a11) &
  ( !is.na(df1$b10) | !is.na(df1$b11) ) ]

The last example uses the fact that R converts TRUE to 1 when a logical value is passed to a function expecting numeric input. Here we check whether all 5 values in the row are NA, then get the other rows using negation (! means NOT). Note that this returns everyone with at least one test of either kind, so it also includes the a-only and b-only cases above:

### at least one of a or b, id= all except no. 2
df1$id[!rowSums(is.na(df1[ ,2:6]))==5]
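
To pick out only the people with both tests, one option is to count the non-NA values for each test separately, since a single check over all five columns cannot tell "both" apart from "either". A minimal sketch, assuming a small hand-made data frame `toy` (hypothetical, not the df1 above) with the same column layout:

```r
# Hypothetical toy data: id, three "a" columns, two "b" columns
toy <- data.frame(
  id  = 1:4,
  a09 = c(NA, NA, "neg", NA),
  a10 = c(NA, NA, "neg", "nr"),
  a11 = c(NA, NA, "neg", "pos"),
  b10 = c("pos", NA, "pos", NA),
  b11 = c("nr", NA, "nr", NA),
  stringsAsFactors = FALSE
)

# Count non-NA entries per test; > 0 means the test was taken in at least one year
has.a <- rowSums(!is.na(toy[, 2:4])) > 0
has.b <- rowSums(!is.na(toy[, 5:6])) > 0

both.ids <- toy$id[has.a & has.b]   # both tests
any.ids  <- toy$id[has.a | has.b]   # at least one test
```

Here `has.a` and `has.b` are illustrative names; the same two vectors can be combined with `&` and `!` to recover the a-only and b-only groups as well.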

Quick intros to logical operators: here and here.

Update:

I'm not sure why you got rid of the NAs as all the above suggestions do work with them. First, staying with NA and following your style of expression:

TestA <-ifelse( !is.na(df1$a09) | !is.na(df1$a10) | !is.na(df1$a11), "TESTa","NOtesta")
TestB <-ifelse( !is.na(df1$b10) | !is.na(df1$b11), "TESTb", "NOtestb")

TestAonly <- (TestA=="TESTa" & TestB=="NOtestb")
TestAandTestB <- (TestA=="TESTa" & TestB=="TESTb")

Note that you need quotes around e.g. TESTa, otherwise R will look for it as a variable rather than a string literal. Also, you might consider adopting a simpler naming convention for variables, e.g. dot.separator.

The result will be a logical vector of the same length as nrow(df1).
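
To end up with the three columns the question asks for, logical vectors like these can be attached to the data frame directly. A rough sketch, assuming a hypothetical data frame `tests` that uses the column names from the question; the new column names (`a.only`, `b.only`, `a.and.b`) are made up for illustration:

```r
# Hypothetical data: one row per person, NA means the test was not taken
tests <- data.frame(
  TestA2009 = c("Positive", NA, NA, NA),
  TestA2010 = c(NA, NA, "Negative", NA),
  TestA2011 = c(NA, NA, NA, NA),
  TestB2010 = c(NA, "Reactive", "Reactive", NA),
  TestB2011 = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE if the person had the test in any year
had.a <- !is.na(tests$TestA2009) | !is.na(tests$TestA2010) | !is.na(tests$TestA2011)
had.b <- !is.na(tests$TestB2010) | !is.na(tests$TestB2011)

# The three requested columns, stored as logicals
tests$a.only  <- had.a & !had.b
tests$b.only  <- had.b & !had.a
tests$a.and.b <- had.a & had.b
```

Keeping the columns as TRUE/FALSE rather than strings means they can be used directly for subsetting, e.g. `tests[tests$a.only, ]`.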

If you're sticking with 1 or 0 use something like the following:

TestB <-ifelse( df1$b10==1 | df1$b11==1, "TESTb", "NOtestb" )

Upvotes: 0

Carl Witthoft

Reputation: 21532

Should be pretty much like this. Call your array mydata, then in very simple steps,

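# Columns 1-3 are assumed to be the TestA years, columns 4-5 the TestB years
notA <- is.na(mydata[,1])*is.na(mydata[,2])*is.na(mydata[,3])  # 1 if never had TestA
notB <- is.na(mydata[,4])*is.na(mydata[,5])                    # 1 if never had TestB
# Parenthesise the negations: !notA*notB would be parsed as !(notA*notB)
AandNotB <- (!notA)*notB
BandNotA <- notA*(!notB)
AandB <- (!notA)*(!notB)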

mydata<-cbind(mydata,AandNotB,BandNotA,AandB)

I'm going on the assumption that any value other than NA is a positive case.
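
One thing to watch with this arithmetic style: in R, unary `!` has lower precedence than `*`, so an expression like `!notA*notB` is evaluated as `!(notA*notB)`. A small sketch with made-up indicator values (not taken from the question's data) that shows the difference:

```r
# Hypothetical 0/1 indicators for four people
notA <- c(1, 0, 0, 1)   # 1 = never had test A
notB <- c(0, 1, 0, 1)   # 1 = never had test B

# Without parentheses the product is formed first, then negated
unparenthesised <- !notA * notB

# With parentheses: 1 only where A was taken and B was not
parenthesised <- (!notA) * notB

print(unparenthesised)
print(parenthesised)
```

Only the second person has test A without test B, and only the parenthesised version singles that person out.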

Upvotes: 1
