user1491868

Reputation: 326

R within() order of operation and logic

I am trying to understand how the within() function in R "works." For example, in the code below I try to make a new variable named "FEELS" based on a condition. The first two uses of the within() function do not work. The third use of the within() function works, but I am not confident I understand the logic of "why" it works. Any help is appreciated.

DF <- data.frame(DATE = seq(as.Date("2015-01-01"), as.Date("2015-12-31"), "month"), TEMP = c(30, 40, 50, 60, 70, 80, 90, 100, 90, 80, 70, 60))

DF <- within(DF, {
  FEELS[30 <= TEMP & TEMP <=  50] <- "Cold"
  FEELS[60 <= TEMP & TEMP <=  70] <- "Good"
  FEELS[80 <= TEMP & TEMP <= 100] <- "Hot"
})

DF <- within(DF, {
  FEELS                           <- "Cold"
  FEELS[60 <= TEMP & TEMP <=  70] <- "Good"
  FEELS[80 <= TEMP & TEMP <= 100] <- "Hot"
})

DF

DF <- within(DF, {
  FEELS                           <- NA
  FEELS[60 <= TEMP & TEMP <=  70] <- "Good"
  FEELS[80 <= TEMP & TEMP <= 100] <- "Hot"
  FEELS[is.na(FEELS)]             <- "Cold"
})

DF

Upvotes: 2

Views: 102

Answers (2)

Frank

Reputation: 66819

When you create an object inside within(DF, {...}), it does not automatically have the same length as the columns of DF. Instead, it is "recycled" at the end of {...} to fill out the column:

within(data.frame(A=1:6), { B = 1; C = 1:2 })
#   A C B
# 1 1 1 1
# 2 2 2 1
# 3 3 1 1
# 4 4 2 1
# 5 5 1 1
# 6 6 2 1

If, before the end of {...}, you want to modify an object as if it were a full column, it must have the correct length:

within(data.frame(A=1:6), {
  D = 1 
  D[ A < 3 ] = 0
  D2 = rep(1, length(A))
  D2[A < 3 ] = 0
})

#   A D2  D
# 1 1  0  0
# 2 2  0  0
# 3 3  1 NA
# 4 4  1 NA
# 5 5  1 NA
# 6 6  1 NA

To understand why D2 gave the expected output while D did not, try examining the objects in steps, using browser() as suggested by @sebastian-c or following the steps as illustrated in his answer.
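As a rough sketch of that step-by-step inspection, the same assignments can be run outside within(), with A defined as a plain vector standing in for the column (an assumption made only for this illustration):

```r
# A stands in for the data frame column; it is a plain vector here.
A <- 1:6

# D starts as a length-one vector.
D <- 1
# The logical index has length 6, so D is extended to length 6.
# Positions 1 and 2 are overwritten; positions 3 to 6 never had a value.
D[A < 3] <- 0
print(D)
print(length(D))

# D2 starts at full length, so every position already holds a value.
D2 <- rep(1, length(A))
D2[A < 3] <- 0
print(D2)
```

Printing D after the subset assignment shows that it already has length 6 with NA in the unassigned positions, so nothing is left for within() to recycle.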

In the OP's case, initializing with rep and then making several substitutions is one option. Another would be to use cut, which is designed for assigning labels to intervals of ordered data.
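A minimal sketch of the cut approach follows. The break points 30, 50, 70 and 100 are an assumption chosen to match the OP's data; unlike the OP's conditions, they leave no gaps, so a value such as 55 would be labelled "Good".

```r
# Only the TEMP column from the question is needed for this illustration.
DF <- data.frame(TEMP = c(30, 40, 50, 60, 70, 80, 90, 100, 90, 80, 70, 60))

DF <- within(DF, {
  # Intervals are [30,50], (50,70], (70,100]; include.lowest = TRUE
  # closes the first interval on the left so that 30 is labelled.
  FEELS <- cut(TEMP,
               breaks = c(30, 50, 70, 100),
               labels = c("Cold", "Good", "Hot"),
               include.lowest = TRUE)
})

print(DF)
```

Note that cut returns a factor; wrap it in as.character() if a character column is preferred.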

Upvotes: 4

sebastian-c

Reputation: 15415

Let's break these down one by one.

1. This one simply results in an error message:

Error in FEELS[30 <= TEMP & TEMP <= 50] <- "Cold" : object 'FEELS' not found

That makes perfect sense. You haven't yet defined FEELS, so subsetting it results in an error.

2. This one's interesting and can be seen more clearly if you do it outside of within():

FEELS <- "cold"
tf <- 60 <= DF$TEMP & DF$TEMP <=  70
tf

[1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

FEELS[tf] <- "Good"
FEELS

 [1] "cold" NA     NA     "Good" "Good" NA     NA     NA     NA     NA     "Good"
[12] "Good"

R starts with a vector of length one containing "cold". Assigning through a logical index of length 12 forces the vector to grow to length 12 and places "Good" wherever the index is TRUE. The first element keeps its original "cold"; the remaining FALSE positions never had a value, so R fills them with NA.

3. The last one is pretty straightforward. You start with a length-one NA vector, which is extended in the same way as the one in 2. You then replace all the NAs that are left, including the first element, with "Cold".
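For comparison, here is a sketch of the second attempt with the default label created at full length, so that no NA patching is needed afterwards. It assumes the DF from the question; using rep() with length(TEMP) is one possible choice, not the only one.

```r
DF <- data.frame(
  DATE = seq(as.Date("2015-01-01"), as.Date("2015-12-31"), "month"),
  TEMP = c(30, 40, 50, 60, 70, 80, 90, 100, 90, 80, 70, 60)
)

DF <- within(DF, {
  # Start with one "Cold" per row instead of a single "Cold".
  FEELS <- rep("Cold", length(TEMP))
  # These assignments now overwrite existing elements rather than
  # extending a length-one vector.
  FEELS[60 <= TEMP & TEMP <=  70] <- "Good"
  FEELS[80 <= TEMP & TEMP <= 100] <- "Hot"
})

print(DF)
```

Because every element already holds "Cold" before the subset assignments, the FALSE positions keep that value instead of becoming NA.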

Upvotes: 4
