Sylababa

Reputation: 65

Change multiple values with ifelse in tidyverse

I have a dataset with 20,000 observations and 5 variables. For some specific observations I want to change the value of a single variable. I know that I can do this one row at a time like this:

test_data <- test_data %>%
  mutate(change_variable = ifelse(n == "1000", "changevalue", changevariable))

My problem is that I need to change 500 observations this way. Is there a way to automate this instead of writing 500 lines of code? It is always the same variable that changes, and I have the correct value for it in a data frame, keyed by the matching "n" value.

I hope someone can help me with this.

Kind Regards, Tom

Upvotes: 1

Views: 595

Answers (3)

SteveM

Reputation: 2301

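You can reference a test vector inside a base R ifelse() call. Because ifelse() is vectorized, each element of the column is tested against the element at the same row position in the test vector. E.g.

Generate a test vector for cars$cyl (where cars is a copy of mtcars) and test it against each cars$cyl entry. Assign the result to cars$test to check it. Note that testvec is drawn with sample() without a seed, so the output will differ between runs.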

cars <- mtcars
testvec <- sample(c(4, 6, 8), 32, replace = TRUE)
cars$test <- ifelse(cars$cyl == testvec, 'match', 'no match')
cars <- cbind(cars, testvec)
head(cars, 10)
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb     test testvec
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 no match       8
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 no match       8
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    match       4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 no match       4
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 no match       4
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 no match       4
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4    match       8
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 no match       6
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 no match       8
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 no match       8
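
The element-wise behavior is easier to check by hand on a tiny, fixed input. This is only a minimal sketch; the vectors x and ref are made-up illustration values, not data from the question:

```r
# Hypothetical toy vectors of equal length (illustration only)
x   <- c(4, 6, 8, 6)
ref <- c(4, 8, 8, 4)

# Position i of x is compared only with position i of ref
res <- ifelse(x == ref, "match", "no match")
res
# res is c("match", "no match", "match", "no match")
```

The same idea carries over to a data frame column, provided the test vector has one entry per row and is in the same row order.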

Upvotes: 0

r2evans

Reputation: 160417

I think this could be a "join" (merge) operation.

library(dplyr)
set.seed(2)
mt <- sample_n(mtcars, 6)
mt
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Toyota Corona      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
# Valiant            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Ferrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
# Merc 240D          24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Chrysler Imperial  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
newgears <- data.frame(gear = c(2, 3, 4), newgear = c(22, 33, 44))
newgears
#   gear newgear
# 1    2      22
# 2    3      33
# 3    4      44

The premise is that you have one frame that has a mapping from the original values (gear) to a new value (newgear). Not all existing gear values need to be present in newgears (we handle that), nor is there a problem if there are extra gear values in this new frame, as they will be ignored.

With this,

left_join(mt, newgears, by = "gear")
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb newgear
# 1 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1      33
# 2 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4      33
# 3 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1      33
# 4 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6      NA
# 5 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2      44
# 6 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4      33

Notice that one gear value (5) was not mapped to a newgear. This is expected and normal; we just need to account for it. Here we coalesce newgear first and then gear, which uses newgear unless it is NA, in which case it falls back to gear.

left_join(mt, newgears, by = "gear") %>%
  mutate(gear = coalesce(newgear, gear)) %>%
  select(-newgear)
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1 21.5   4 120.1  97 3.70 2.465 20.01  1  0   33    1
# 2 10.4   8 472.0 205 2.93 5.250 17.98  0  0   33    4
# 3 18.1   6 225.0 105 2.76 3.460 20.22  1  0   33    1
# 4 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
# 5 24.4   4 146.7  62 3.69 3.190 20.00  1  0   44    2
# 6 14.7   8 440.0 230 3.23 5.345 17.42  0  0   33    4

I believe a mapping frame (newgears here) is easier to maintain and visualize, and it is also easier to code against and reuse in many places.
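
Applied to the question, the same pattern might look like the sketch below. The frames test_data and fixes, the column new_value, and all the values are invented for illustration; only the column names n and change_variable echo the question.

```r
library(dplyr)

# Hypothetical stand-in for the 20,000-row data
test_data <- data.frame(
  n = c("998", "999", "1000", "1001"),
  change_variable = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup frame: one row per observation to correct
fixes <- data.frame(
  n = c("999", "1000"),
  new_value = c("B", "C"),
  stringsAsFactors = FALSE
)

result <- test_data %>%
  left_join(fixes, by = "n") %>%                 # adds new_value, NA where no fix exists
  mutate(change_variable = coalesce(new_value, change_variable)) %>%
  select(-new_value)                             # drop the helper column

result$change_variable
# "a" "B" "C" "d"
```

If fixes contained the same n more than once, the join would duplicate those rows, so it is worth checking that n is unique in the lookup frame first.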

Upvotes: 1

akrun

Reputation: 887048

If only specific observations need to change, build the logical condition with row_number() and %in%. If the new values ("changevalue") apply to the first 500 observations, store them in a column first:

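library(dplyr)
# Create the column as NA first; assigning 500 values to a column that does
# not exist yet would either recycle them across all rows or raise an error
test_data$changevalue <- NA
test_data$changevalue[1:500] <- vector_of_values
test_data <- test_data %>%
  mutate(change_variable = ifelse(
    row_number() %in% 1:500, changevalue, changevariable))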

Or this can be done with coalesce() as well, provided changevalue is NA in every row that should keep its original value:

test_data %>%
    mutate(change_variable = coalesce(changevalue, changevariable))

Or use between():

test_data %>%
   mutate(change_variable = ifelse(between(row_number(), 1, 500),
        changevalue, changevariable))
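
For a version that runs end to end, here is a minimal sketch on toy data. The six-row test_data and the contents of vector_of_values are invented for illustration, and unlike the snippets above it overwrites changevariable in place rather than creating a separate change_variable column.

```r
library(dplyr)

# Hypothetical toy data: 6 rows, of which the first 3 get new values
test_data <- data.frame(
  n = as.character(1:6),
  changevariable = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)
vector_of_values <- c("A", "B", "C")

# Start with an all-NA helper column, then fill only the rows to change
test_data$changevalue <- NA_character_
test_data$changevalue[1:3] <- vector_of_values

result <- test_data %>%
  mutate(changevariable = coalesce(changevalue, changevariable)) %>%
  select(-changevalue)   # drop the helper column

result$changevariable
# "A" "B" "C" "d" "e" "f"
```

This relies on the rows to change being identified by position, so the data must be in the expected order before the helper column is filled.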

Upvotes: 0
