Daniel

Reputation: 1272

How to do sequential numeric/integer replacement with string in a data.frame in R?

While learning R, and following up on my previous question and its answer, I am trying to figure out how to do sequential string replacement in a data.frame.

Using the mtcars dataset, I'd like to label the values of mtcars$hp in the ranges (hp < 100, hp >= 100 & hp < 200, hp >= 200) as ("low", "medium", "high"), respectively.

In theory, running the following statements in sequence should do the job:

  1-  mtcars$hp[mtcars$hp < 100] = "low"
  2-  mtcars$hp[mtcars$hp >= 100 & mtcars$hp < 200] = "medium"
  3-  mtcars$hp[mtcars$hp >= 200] = "high"

After running statements 1 and 2, everything looks fine.

> head(mtcars, 20)
                     mpg cyl  disp     hp drat    wt  qsec vs am gear carb newcol
Mazda RX4           21.0   6 160.0 medium 3.90 2.620 16.46  0  1    4    4    low
Mazda RX4 Wag       21.0   6 160.0 medium 3.90 2.875 17.02  0  1    4    4    low
Datsun 710          22.8   4 108.0    low 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive      21.4   6 258.0 medium 3.08 3.215 19.44  1  0    3    1    low
Hornet Sportabout   18.7   8 360.0 medium 3.15 3.440 17.02  0  0    3    2 medium
Valiant             18.1   6 225.0 medium 2.76 3.460 20.22  1  0    3    1    low
Duster 360          14.3   8 360.0    245 3.21 3.570 15.84  0  0    3    4 medium
Merc 240D           24.4   4 146.7    low 3.69 3.190 20.00  1  0    4    2   high
Merc 230            22.8   4 140.8    low 3.92 3.150 22.90  1  0    4    2   high
Merc 280            19.2   6 167.6 medium 3.92 3.440 18.30  1  0    4    4    low
Merc 280C           17.8   6 167.6 medium 3.92 3.440 18.90  1  0    4    4    low
Merc 450SE          16.4   8 275.8 medium 3.07 4.070 17.40  0  0    3    3 medium
Merc 450SL          17.3   8 275.8 medium 3.07 3.730 17.60  0  0    3    3 medium
Merc 450SLC         15.2   8 275.8 medium 3.07 3.780 18.00  0  0    3    3 medium
Cadillac Fleetwood  10.4   8 472.0    205 2.93 5.250 17.98  0  0    3    4 medium
Lincoln Continental 10.4   8 460.0    215 3.00 5.424 17.82  0  0    3    4 medium
Chrysler Imperial   14.7   8 440.0    230 3.23 5.345 17.42  0  0    3    4 medium
Fiat 128            32.4   4  78.7    low 4.08 2.200 19.47  1  1    4    1   high
Honda Civic         30.4   4  75.7    low 4.93 1.615 18.52  1  1    4    2   high
Toyota Corolla      33.9   4  71.1    low 4.22 1.835 19.90  1  1    4    1   high

However, as soon as I run statement 3, mtcars$hp[mtcars$hp >= 200] = "high", after the first two, every hp value turns into "high"!

> mtcars$hp[mtcars$hp >= 200] = "high"
> head(mtcars, 20)
                     mpg cyl  disp   hp drat    wt  qsec vs am gear carb newcol
Mazda RX4           21.0   6 160.0 high 3.90 2.620 16.46  0  1    4    4    low
Mazda RX4 Wag       21.0   6 160.0 high 3.90 2.875 17.02  0  1    4    4    low
Datsun 710          22.8   4 108.0 high 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive      21.4   6 258.0 high 3.08 3.215 19.44  1  0    3    1    low
Hornet Sportabout   18.7   8 360.0 high 3.15 3.440 17.02  0  0    3    2 medium
Valiant             18.1   6 225.0 high 2.76 3.460 20.22  1  0    3    1    low
Duster 360          14.3   8 360.0 high 3.21 3.570 15.84  0  0    3    4 medium
Merc 240D           24.4   4 146.7 high 3.69 3.190 20.00  1  0    4    2   high
Merc 230            22.8   4 140.8 high 3.92 3.150 22.90  1  0    4    2   high
Merc 280            19.2   6 167.6 high 3.92 3.440 18.30  1  0    4    4    low
Merc 280C           17.8   6 167.6 high 3.92 3.440 18.90  1  0    4    4    low
Merc 450SE          16.4   8 275.8 high 3.07 4.070 17.40  0  0    3    3 medium
Merc 450SL          17.3   8 275.8 high 3.07 3.730 17.60  0  0    3    3 medium
Merc 450SLC         15.2   8 275.8 high 3.07 3.780 18.00  0  0    3    3 medium
Cadillac Fleetwood  10.4   8 472.0 high 2.93 5.250 17.98  0  0    3    4 medium
Lincoln Continental 10.4   8 460.0 high 3.00 5.424 17.82  0  0    3    4 medium
Chrysler Imperial   14.7   8 440.0 high 3.23 5.345 17.42  0  0    3    4 medium
Fiat 128            32.4   4  78.7 high 4.08 2.200 19.47  1  1    4    1   high
Honda Civic         30.4   4  75.7 high 4.93 1.615 18.52  1  1    4    2   high
Toyota Corolla      33.9   4  71.1 high 4.22 1.835 19.90  1  1    4    1   high 

Any idea why this happens and what I am doing wrong?
Thanks!

Upvotes: 1

Views: 52

Answers (1)

A. Webb

Reputation: 26446

You should use a temporary vector:

res <- character(length(mtcars$hp))
res[mtcars$hp <100] <- "low"
res[mtcars$hp >=100 & mtcars$hp <200] <- "medium"
res[mtcars$hp >= 200] <- "high"
mtcars$hp <- res

Otherwise, you will have altered the basis of comparison with the first assignment, because assigning a string coerces the whole column to character:

df <- mtcars
class(df$hp)
#> [1] "numeric"
df$hp[df$hp <100] <- "low"
class(df$hp)
#> [1] "character"

and subsequent comparisons will all be string-based, not numeric!
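To see why the third assignment catches every row, here is a small sketch of what the comparison looks like once the column is character. It assumes a typical locale in which letters sort after digits (true for the C locale and the usual English locales); the copy `df` is only there so that mtcars itself is left alone.

```r
df <- mtcars
df$hp[df$hp < 100] <- "low"   # hp is silently coerced to character here

class(df$hp)                  # no longer numeric

# Comparing a character value with a number coerces the number to "200",
# so the comparison is done on text, not on magnitude.
"low" >= 200                  # text comparison of "low" with "200"
"medium" >= 200               # same for "medium"
"110" >= 200                  # numeric-looking strings are compared as text too
```

Under that locale assumption the labels "low" and "medium" sort after "200", so `mtcars$hp >= 200` is true for rows that were already labeled and they get overwritten with "high".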

Alternatively, you can do this all at once with the right tool, cut:

cut(mtcars$hp, breaks = c(-Inf, 100, 200, Inf), labels = c("low", "medium", "high"), right = FALSE)
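As a rough usage sketch, the result of cut can be stored in a new column so the numeric hp stays available for later comparisons. The column name `hp_level` and the copy `df` are made up for this example and are not part of the question.

```r
df <- mtcars   # work on a copy so mtcars itself stays numeric

# hp_level is a hypothetical column name chosen for this example
df$hp_level <- cut(df$hp,
                   breaks = c(-Inf, 100, 200, Inf),
                   labels = c("low", "medium", "high"),
                   right  = FALSE)   # intervals are [a, b), so 100 is "medium"

class(df$hp_level)                   # cut() returns a factor
table(df$hp_level)                   # number of cars per label

# Convert only if plain strings are needed instead of a factor
df$hp_level <- as.character(df$hp_level)
```

With right = FALSE each interval is closed on the left, which matches the conditions in the question (hp >= 100 is "medium", hp >= 200 is "high").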

Upvotes: 1
