Reputation: 35
I'm trying to make 3 master columns from 5 different columns. I'm looking at 2 different medical tests (TestA and TestB). I looked at a few other answers on making new rows, but couldn't find any that deal with multiple conditions and categorical values.
Currently I have the following columns: TestA2009, TestA2010, TestA2011, TestB2010, TestB2011
The three columns I ultimately want are:
1. Those who have taken TestA (any year) but have never had TestB
2. Those who have taken TestB (any year) but have never had TestA
3. Those who have taken TestA (any year), and TestB (any year)
Values for TestA include things like NA, Positive, Negative, Not Reported, etc.
Values for TestB include things like NA, Reactive, Unsatisfactory, etc.
NA meaning that they did not have the test.
Hope this question is clear. Thanks so much - I'm quite new at R, and could use all the help I can get!!
EDIT: Thanks everyone for your suggestions. I also tried this method myself. I switched all "NA"s to "0" and all other values to "1". Does it make sense?
TestA <-ifelse(TestA2009==1 | TestA2010==1 | TestA2011==1, "TESTa", "NOtesta")
TestB <-ifelse(TestB2010==1 | TestB2011==1, "TESTb", "NOtestb")
TestAonly <-(TestA==TESTa & TestB=="NOtestb")
TestAandTestB <-(TestA==TESTa & TestB=="TESTb")
Upvotes: 0
Views: 186
Reputation: 5274
A reproducible example:
vals1 <- c(NA, "pos", "neg", "nr")
set.seed(1)
df1 <- data.frame(
id = 1:10,
a09 = sample(vals1,10,replace=TRUE),
a10 = sample(vals1,10,replace=TRUE),
a11 = sample(vals1,10,replace=TRUE),
b10 = sample(vals1,10,replace=TRUE),
b11 = sample(vals1,10,replace=TRUE)
)
### modify to give at least one case meeting each of your criteria
df1[10,c(5,6)] <- NA # 2x NAs for b's
df1[1,c(2,3,4)] <- NA # 3x NAs for a's
df1[2,c(2,3,4,5,6)] <- NA # all NAs
giving:
   id  a09  a10  a11  b10  b11
1   1 <NA> <NA> <NA>  pos   nr
2   2 <NA> <NA> <NA> <NA> <NA>
3   3  neg  neg  neg  pos   nr
4   4   nr  pos <NA> <NA>  neg
5   5 <NA>   nr  pos   nr  neg
6   6   nr  pos  pos  neg   nr
7   7   nr  neg <NA>   nr <NA>
8   8  neg   nr  pos <NA>  pos
9   9  neg  pos   nr  neg  neg
10 10 <NA>   nr  pos <NA> <NA>
Now we chain multiple logical operators to get the id in question. This isn't as elegant as @Carl's suggestion, but may be more intuitive at first glance. Note the grouping brackets, i.e. a and (b or c):
### test a not b, id=10
df1$id[ is.na(df1$b10) & is.na(df1$b11) &
( !is.na(df1$a09) | !is.na(df1$a10) | !is.na(df1$a11) ) ]
### test b not a, id=1
df1$id[ is.na(df1$a09) & is.na(df1$a10) & is.na(df1$a11) &
( !is.na(df1$b10) | !is.na(df1$b11) ) ]
The last example uses the fact that R will convert TRUE to 1 when passed to a function expecting numeric, so rowSums() on a logical matrix counts the TRUE values in each row. Here we count the non-NA values among the a columns and among the b columns, and keep the rows where both counts are above zero (! means NOT).
### a and b, id = 3 to 9
df1$id[ rowSums(!is.na(df1[ ,2:4])) > 0 & rowSums(!is.na(df1[ ,5:6])) > 0 ]
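The selections above return ids. Since the question asks for columns, here is a minimal sketch that stores the same three conditions as logical columns. The data frame `df`, the helper vectors `has_a` and `has_b`, and the column names `a_only`, `b_only` and `a_and_b` are all hypothetical, and the data is typed in by hand rather than sampled so the result does not depend on the R version.

```r
# Small hand-built example (hypothetical data, not the sampled df1 above)
df <- data.frame(
  id  = 1:4,
  a09 = c(NA, NA, "neg", NA),
  a10 = c(NA, NA, "neg", "nr"),
  a11 = c(NA, NA, "neg", "pos"),
  b10 = c("pos", NA, "pos", NA),
  b11 = c("nr", NA, "nr", NA),
  stringsAsFactors = FALSE
)

a_cols <- c("a09", "a10", "a11")
b_cols <- c("b10", "b11")

# rowSums() counts the TRUEs per row because TRUE is coerced to 1
has_a <- rowSums(!is.na(df[, a_cols])) > 0
has_b <- rowSums(!is.na(df[, b_cols])) > 0

# One logical column per group asked for in the question
df$a_only  <- has_a & !has_b
df$b_only  <- !has_a & has_b
df$a_and_b <- has_a & has_b
```

Rows with no test at all (id 2 here) are FALSE in all three columns, which may or may not be what you want for your analysis.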
Quick intros to logical operators: here and here.
Update:
I'm not sure why you got rid of the NAs, as all the above suggestions do work with them.
First, staying with NA and following your style of expression:
TestA <-ifelse( !is.na(df1$a09) | !is.na(df1$a10) | !is.na(df1$a11), "TESTa","NOtesta")
TestB <-ifelse( !is.na(df1$b10) | !is.na(df1$b11), "TESTb", "NOtestb")
TestAonly <- (TestA=="TESTa" & TestB=="NOtestb")
TestAandTestB <- (TestA=="TESTa" & TestB=="TESTb")
Note that you need quotes around e.g. "TESTa", otherwise R will look for it as a variable rather than a string literal. Also, you might consider adopting a simpler naming convention/style for variables, e.g. dot.separator.
The result will be a logical vector of length nrow(df1).
If you're sticking with 1 or 0, use something like the following:
TestB <-ifelse( df1$b10==1 | df1$b11==1, "TESTb", "NOtestb" )
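For completeness, a rough sketch of the 0/1 recoding itself, assuming NA means "test not taken" and any other value counts as taken. The data frame `tests` and the name `coded` are made up for illustration. The recoding matters because `NA == 1` is NA rather than FALSE, so comparing columns that still contain NA can leave NA in the result.

```r
# Hypothetical data with NA meaning "test not taken"
tests <- data.frame(
  b10 = c("pos", NA, NA),
  b11 = c(NA, NA, "neg"),
  stringsAsFactors = FALSE
)

# Recode every column: 1 = any recorded result, 0 = NA (not taken)
coded <- as.data.frame(lapply(tests, function(x) as.integer(!is.na(x))))

# With 0/1 columns, "any b test" is simply a row sum above zero
TestB <- ifelse(rowSums(coded) > 0, "TESTb", "NOtestb")

# Why the recoding is needed: comparing NA gives NA, not FALSE
na_compare <- NA == 1
```

Using rowSums() here also scales to more years without lengthening the chain of `|` conditions.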
Upvotes: 0
Reputation: 21532
Should be pretty much like this. Call your array mydata, then in very simple steps,
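# TRUE where every TestA column (1 to 3) is NA
notA <- is.na(mydata[,1]) & is.na(mydata[,2]) & is.na(mydata[,3])
# TRUE where every TestB column (4 to 5) is NA
notB <- is.na(mydata[,4]) & is.na(mydata[,5])
# use & rather than *: !notA*notB would parse as !(notA*notB)
AandNotB <- !notA & notB
BandNotA <- notA & !notB
AandB <- !notA & !notB
mydata <- cbind(mydata, AandNotB, BandNotA, AandB)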
I'm going on the assumption that any value other than NA is a positive case.
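A note on precedence when negating flags like these: in R, `!` binds more loosely than arithmetic operators such as `*`, but more tightly than `&`. The short check below, with made-up flag vectors, shows the difference; the names `loose` and `tight` are only for illustration.

```r
# Hypothetical flags: TRUE means "no test of that type"
notA <- c(TRUE, FALSE, FALSE)
notB <- c(TRUE, TRUE, FALSE)

# `!` has lower precedence than `*`, so this is !(notA * notB)
loose <- !notA * notB

# `!` has higher precedence than `&`, so this is (!notA) & notB
tight <- !notA & notB
```

Only `tight` expresses "has A but not B"; `loose` is TRUE for every row except those with neither test.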
Upvotes: 1