statsguyz

Reputation: 459

Conditional If statement over large data frame

I'm looking for the most straightforward way to retrieve information from a data frame in R. The data frame contains several dates: Day 0, Day 1, Day 2, Day 3, Day 4, Day 5, Day 6, Day 7, and Day 8. Each event is listed on a specific date, and we are interested in finding events that occurred between any two consecutive days, as well as between dates separated by a null entry (e.g. in the table below this would include between Day 3 and Day 5 in row 1).

    Person  day0 day1 day2 day3 day4 day5 day6  day7 events
     1      10   12   14   18   NA   22   32   50     20
     2      11   15   19   NA   NA   NA   50   67     35
     3      12   18   21   26   33   42   50   NA     45
     4      15   24   32   NA   43   NA   54   76     40

The full data set has several thousand people.

I attempted to check between the first two days and write the event to a vector:

for(i in 1:length(days$Person)){
if(days$event[i] != NA){
if(days$day0[i] != NA){
if(days$day1[i] != NA){

 if(days$day0[i] < days$events[i] & days$day1[i] > days$events[i]){
     vector[i]<-events[i]
}
}
}

However, I continue to get errors.

Error in if (days$day1[i] != NA) { : missing value where TRUE/FALSE needed

Any help would be much appreciated.

Upvotes: 0

Views: 262

Answers (1)

Artem

Reputation: 3414

  • Data frame subsetting is a better fit here than a for loop with nested if statements.
  • I added an observation to the data frame that meets your filter criteria; otherwise the output for your example would be empty.
  • Adding NA to any number gives NA, so !is.na(events + day0 + day1) is a compact replacement for the three nested NA checks.
  • Use the function is.na to check for NA, since e.g. 10 != NA returns NA rather than TRUE or FALSE.
  • An if condition throws the error you mentioned when it is given NA.
  • When asking, use dput(head(your_data.frame)) to share a sample of your input data together with the desired output; it makes it easier for the community to help.
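To make the NA points above concrete, here is a minimal sketch in base R. The vector x and the variable msg are illustrative names, not part of the original data.

```r
# Illustrative vector with one missing value (hypothetical data).
x <- c(10, NA, 12)

# Comparing against NA never yields TRUE or FALSE, only NA.
cmp_scalar <- 10 != NA      # NA
cmp_vector <- x != NA       # NA NA NA

# is.na() returns a proper logical vector that if() can work with.
missing_flags <- is.na(x)   # FALSE TRUE FALSE

# if() fails when its condition is NA; tryCatch() captures the
# error message instead of stopping the script.
msg <- tryCatch(
  if (x[2] != NA) "not missing",
  error = function(e) conditionMessage(e)
)

# Arithmetic propagates NA, so a single is.na() call can cover
# several values at once.
all_present <- !is.na(10 + 12 + NA)   # FALSE
```

Here msg holds the same "missing value where TRUE/FALSE needed" text that appears in the question.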

Please see the code below:

days <- structure(list(Person = 1:5, day0 = c(10L, 11L, 12L, 15L, 1L), 
    day1 = c(12L, 15L, 18L, 24L, 20L), day2 = c(14L, 19L, 21L, 
    32L, 3L), day3 = c(18L, NA, 26L, NA, 4L), day4 = c(NA, NA, 
    33L, 43L, 5L), day5 = c(22L, NA, 42L, NA, 6L), day6 = c(32L, 
    50L, 50L, 54L, 7L), day7 = c(50L, 67L, NA, 76L, 8L), events = c(20L, 
    35L, 45L, 40L, 10L)), class = "data.frame", row.names = c(NA, 
-5L))
vector <- subset(days, !is.na(events + day0 + day1) & day0 < events & day1 > events)[["events"]]
vector

The output is a vector of the event values that meet your criteria:

# [1] 10
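The code above only checks the day0/day1 pair. One possible way to extend the same idea to every pair of consecutive recorded days, skipping over NA entries, is sketched below. It is an assumption about what the question is after, not part of the original answer: it rebuilds the four-row table from the question, and find_interval and the interval column are hypothetical names.

```r
# The four-row example table from the question.
days <- data.frame(
  Person = 1:4,
  day0 = c(10, 11, 12, 15),
  day1 = c(12, 15, 18, 24),
  day2 = c(14, 19, 21, 32),
  day3 = c(18, NA, 26, NA),
  day4 = c(NA, NA, 33, 43),
  day5 = c(22, NA, 42, NA),
  day6 = c(32, 50, 50, 54),
  day7 = c(50, 67, NA, 76),
  events = c(20, 35, 45, 40)
)

# Columns named day0, day1, ... (assumes this naming pattern).
day_cols <- grep("^day[0-9]+$", names(days), value = TRUE)

# Hypothetical helper: drop missing days, then look for the first pair
# of neighbouring recorded days that strictly brackets the event.
find_interval <- function(day_values, event) {
  ok <- !is.na(day_values)
  if (is.na(event) || sum(ok) < 2) return(NA_character_)
  vals <- day_values[ok]
  labs <- names(day_values)[ok]
  lower <- vals[-length(vals)]   # every recorded day except the last
  upper <- vals[-1]              # every recorded day except the first
  hit <- which(lower < event & event < upper)
  if (length(hit) == 0) return(NA_character_)
  paste(labs[hit[1]], labs[hit[1] + 1], sep = "-")
}

# Apply the helper row by row; vapply() guarantees a character result.
days$interval <- vapply(
  seq_len(nrow(days)),
  function(i) find_interval(unlist(days[i, day_cols]), days$events[i]),
  character(1)
)

days[, c("Person", "events", "interval")]
```

For row 1 this reports day3-day5, which matches the case described in the question. The row-wise vapply is simple but may be slow on very large data; a vectorised comparison over a matrix of the day columns would be a reasonable alternative.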

Upvotes: 1
