llrs

Reputation: 3397

Replacing in data frame by character

If I replace some values with a string using data[data<0] <- "Down" and then data[data>0] <- "Up", every value ends up as "Up". If I reverse the order of the two substitutions, it works the way I want.

data<-runif(30, min=-5, max=5)
data[data<0]<-"Down"
data[data>0]<-"Up"
#[1] "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up"
#[16] "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up" "Up"

But if I do this, I get the correct result:

data<-runif(30, min=-5, max=5)
data[data>0]<-"Up"
data[data<0]<-"Down"
#[1] "Down" "Up"   "Up"   "Down" "Down" "Down" "Down" "Down" "Down" "Down"
#[11] "Down" "Down" "Up"   "Down" "Down" "Down" "Up"   "Up"   "Down" "Up"  
#[21] "Up"   "Down" "Up"   "Up"   "Down" "Down" "Up"   "Up"   "Down" "Down"

The solution is simple: use the second order. But I am curious why this happens. At first I thought it had to do with the conversion to character, but that cannot be the whole story, because then changing the order would either have no effect or affect both cases the same way. Any ideas?

Upvotes: 2

Views: 122

Answers (2)

agstudy

Reputation: 121568

The solution is to avoid conversion to character, and do the 2 tests simultaneously:

ifelse(data>0,"Up","Down")
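As a minimal sketch of why this sidesteps the problem: ifelse() builds a new character vector, so the numeric vector is never overwritten and the test runs on numbers. The name `labels` and the example values are only illustrative, and note that an exact zero would be labelled "Down" by this test.

```r
data <- c(-2.5, 3.1, -0.4, 4.8)   # hypothetical example values

# Both outcomes are decided in one pass over the numeric values.
labels <- ifelse(data > 0, "Up", "Down")

print(labels)             # "Down" "Up" "Down" "Up"
print(is.numeric(data))   # TRUE: the original vector is untouched
```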

EDIT

To explain what happens: strangely, R effectively uses the leading "-" to decide whether a string is less than 0, because once the vector holds strings the comparison is done on strings.

"-a" < 0
[1] TRUE

So when you do it in this order (> then <), the negative values keep their sign as strings and it works.

EDIT: "-" is ordered before "0" in ASCII. Here is a function that accepts an ASCII character and returns its decimal code:

asc <- function(x) { strtoi(charToRaw(x),16L) }
> asc("0")
[1] 48
> asc("-")
[1] 45
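Base R also has utf8ToInt(), which returns the code point directly. The following is a short sketch comparing it with the asc() helper above; the variable names are only illustrative.

```r
asc <- function(x) strtoi(charToRaw(x), 16L)  # helper from this answer

# utf8ToInt() returns the Unicode code point, which equals the ASCII code
# for ASCII characters.
minus_code <- utf8ToInt("-")
zero_code  <- utf8ToInt("0")

print(c(minus_code, zero_code))   # 45 48
print(asc("-") == minus_code)     # TRUE
```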

Upvotes: 1

Mark Graph

Reputation: 5121

A vector in R can only hold values of one type. The vector was originally created with numeric values.

In your first bit of code (data[data<0]<-"Down") your assignment converted the vector mode from numeric to character. The remaining numbers in the vector were changed from numeric mode to character mode. At the end of that assignment, the vector looks something like ...

 [1] "Down"              "4.50482521206141"  "Down"             
 [4] "Down"              "Down"              "Down"             
 [7] "3.81024733651429"  "1.01603321265429"  "Down"             
[10] "3.30486473860219"  "4.82019837480038"  "1.3452106853947"  
[13] "2.02783531043679"  "Down"              "Down"             
[16] "Down"              "Down"              "Down"             
[19] "4.59091864991933"  "Down"              "2.09894138388336" 
[22] "Down"              "0.638334625400603" "1.58013242762536" 
[25] "2.14735288871452"  "Down"              "1.67530178790912" 
[28] "4.91423513041809"  "Down"              "1.71986542874947" 

When it comes to the second comparison, an implicit type conversion occurs. R cannot compare a number with a string directly, so the number 0 is converted to the string "0" (i.e. data[data>0] is evaluated as data[data>"0"]).

That is why it did not work as you expected: all of the strings, both those with the value "Down" and those holding a number as a string, test TRUE for being greater than the string "0".

In the second example, the remaining numeric strings are the negative numbers, which begin with a "-". In character encoding, "-" sorts before "0", so only those strings test as less than "0", while "Up" does not.
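Both orderings can be traced step by step on a small vector. This is a minimal sketch: the vector `x` and its values are made up for illustration, and the collation is set to "C" as an assumption so that strings compare in ASCII order. In other locales string comparison follows the locale's collating sequence, so the results may differ.

```r
# Assumption: force ASCII ordering so the string comparisons are reproducible.
invisible(Sys.setlocale("LC_COLLATE", "C"))

x <- c(-2, 3, -1, 4)   # hypothetical example values

# Order 1: "Down" first, then "Up"
a <- x
a[a < 0] <- "Down"     # a is now character: "Down" "3" "Down" "4"
first_step <- a
a[a > 0] <- "Up"       # 0 is coerced to "0"; every string sorts after "0"

# Order 2: "Up" first, then "Down"
b <- x
b[b > 0] <- "Up"       # b is now character: "-2" "Up" "-1" "Up"
b[b < 0] <- "Down"     # only the strings starting with "-" sort before "0"

print(first_step)
print(a)
print(b)
```

With ASCII ordering, `a` ends up entirely "Up", while `b` keeps the intended mix of "Down" and "Up".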

Upvotes: 2
