ToNoY

Reputation: 1378

Applying conditional statements to all elements in a dataframe using R

I have a dataframe that looks like this:

> dpd
         md       mean         sd       fsf        dii   n
2      77.5 0.02827206 0.05761423 0.8382353  29.648895 136
3     120.0 0.07058824 0.04696682 0.5882353   8.333333  17
NA       NA         NA         NA        NA         NA  NA
... ...
NA.8     NA         NA         NA        NA         NA  NA
13    650.0 0.00500000         NA 1.0000000 200.000000   1
NA.9     NA         NA         NA        NA         NA  NA
.. ...
NA.12    NA         NA         NA        NA         NA  NA
18    900.0 0.00500000         NA 1.0000000 200.000000   1

I want to write an if-else statement that prints "GOOD" only if all the 'dii' values in the dataframe are >= 20 and all the 'fsf' values are >= 0.8, and prints "You have a problem!" otherwise. So I tried something like this:

if (dpd$fsf[!is.na(dpd$fsf)] > 0.8 & dpd$dii[!is.na(dpd$dii)] >= 20)
print("GOOD") else print("You have problem!")

The dataframe clearly shows that the values in the row labelled 3 violate both conditions, but R only considers the first element, as shown below:

[1] "GOOD"
Warning message:
In if (dpd$fsf[!is.na(dpd$fsf)] > 0.8 & dpd$dii[!is.na(dpd$dii)] >=  :
  the condition has length > 1 and only the first element will be used

How can I improve my conditional statement so that it shows "You have a problem!"? Also, is there any way to print the text "GOOD" in a color of my choice?

Upvotes: 2

Views: 2561

Answers (2)

jlhoward

Reputation: 59345

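Your situation is a bit more complicated because of the NA values in fsf and dii. When the condition vector contains NA and no FALSE, all(...) returns NA, which if() cannot use, so you need to pass na.rm=TRUE in the call to all(...). Using this for dpd: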

dpd
#      id    md       mean         sd       fsf        dii   n
# 1     2  77.5 0.02827206 0.05761423 0.8382353  29.648895 136
# 2     3 120.0 0.07058824 0.04696682 0.5882353   8.333333  17
# 3  <NA>    NA         NA         NA        NA         NA  NA
# 4  NA.8    NA         NA         NA        NA         NA  NA
# 5    13 650.0 0.00500000         NA 1.0000000 200.000000   1
# 6  NA.9    NA         NA         NA        NA         NA  NA
# 7 NA.12    NA         NA         NA        NA         NA  NA
# 8    18 900.0 0.00500000         NA 1.0000000 200.000000   1

with(dpd, if(all(fsf>=0.8 & dii>=20)) print("Good") else print("Problem")) 
# [1] "Problem"  

# remove the "bad" item (2nd row); the check should now print "Good"
dpd.ok <- dpd[-2,]
# but it doesn't: all() returns NA because of the remaining NA rows
with(dpd.ok, if(all(fsf>=0.8 & dii>=20)) print("Good") else print("Problem"))
# Error in if (all(fsf >= 0.8 & dii >= 20)) print("Good") else print("Problem") : 
#   missing value where TRUE/FALSE needed

# setting na.rm=TRUE fixes it
with(dpd.ok, if(all(fsf>=0.8 & dii>=20,na.rm=TRUE)) print("Good") else print("Problem"))
# [1] "Good"
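
The same check can be wrapped in a small function so the thresholds are not repeated. This is only a sketch: the helper name check_dpd and its arguments are made up for illustration, and the data frame below is a cut-down stand-in for dpd that keeps just the fsf and dii columns.

```r
# Small stand-in for dpd, keeping only the two columns that matter here
dpd <- data.frame(
  fsf = c(0.8382353, 0.5882353, NA, 1, NA, 1),
  dii = c(29.648895, 8.333333, NA, 200, NA, 200)
)

# Hypothetical helper: TRUE only if every non-missing row meets both thresholds
check_dpd <- function(df, min_fsf = 0.8, min_dii = 20) {
  ok <- df$fsf >= min_fsf & df$dii >= min_dii
  # na.rm = TRUE drops the NA comparisons before all() collapses the vector;
  # isTRUE() guarantees a single TRUE/FALSE reaches if()
  isTRUE(all(ok, na.rm = TRUE))
}

if (check_dpd(dpd)) print("GOOD") else print("You have a problem!")
# [1] "You have a problem!"

# without the offending 2nd row
if (check_dpd(dpd[-2, ])) print("GOOD") else print("You have a problem!")
# [1] "GOOD"
```

One caveat with na.rm=TRUE: all() of an empty vector is TRUE, so a data frame in which every row is NA would also report "GOOD". If that matters, check for at least one non-missing row first.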

Upvotes: 0

Mark Heckmann

Reputation: 11431

If you want to check whether all logical conditions evaluate to TRUE, you should wrap the function all around them. Otherwise you have a logical vector with several elements inside if, and if will only use the first element of this vector (with a warning; from R 4.2.0 onward it is an error instead).

x <- 1:3
y <- 1:3

x > 2 & y < 3
# [1] FALSE FALSE FALSE

if (x < 2 & y < 3) print("good")
# [1] "good"
# Warning message:
# In if (x < 2 & y < 3) print("good") :
#   the condition has length > 1 and only the first element will be used

Now check whether all elements of the logical vector are TRUE:

all(x > 2 & y < 3)
# [1] FALSE
if (all(x > 2 & y < 3)) print("good")
# prints nothing, because the condition is FALSE
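
For the colored-output part of the question, one option is to combine the all() check with ANSI escape sequences. The sketch below assumes the console interprets ANSI codes (most terminals and recent RStudio versions do; where they are not supported, the raw codes show up as text). The helper name colorize and its small color table are made up for illustration.

```r
# Hypothetical helper: wrap text in an ANSI color escape sequence.
# Assumes the console interprets ANSI codes; otherwise the raw codes are shown.
colorize <- function(text, color = c("green", "red", "yellow", "blue")) {
  color <- match.arg(color)
  codes <- c(red = 31, green = 32, yellow = 33, blue = 34)
  # "\033[<code>m" switches the color on, "\033[0m" resets it
  paste0("\033[", codes[[color]], "m", text, "\033[0m")
}

x <- 1:3
y <- 1:3

# all() collapses the logical vector to a single TRUE/FALSE, so if() is safe
if (all(x > 0 & y < 4)) {
  cat(colorize("GOOD", "green"), "\n", sep = "")
} else {
  cat(colorize("You have a problem!", "red"), "\n", sep = "")
}
```

cat() is used instead of print() because print() shows the escape characters literally rather than sending them to the console. Packages such as crayon provide ready-made functions built on the same idea.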

Upvotes: 3
