namore
namore

Reputation: 83

Changing values in an R dataframe if the value for that key in another smaller dataframe meets certain condition

I have two different sized data frames in R.

One, fullRoutes, has play_keys and x and y values that represent movement over time for a play. The other, endOfRoutes, has play_keys and the final x and y values for that play (this one is much smaller because it has one row for each play, whereas the other has rows for every second of the play). For example:

Input

endOfRoutes
key     x
  1    -1
  2    0.5
  3    -2

fullRoutes
key    time     x
  1     0.1   0.2
  1     0.2   0.7
  1     0.3   0.9
  1     0.4   0.5
  1     0.5   -0.2
  1     0.6   -1
  2     0.1   0.2
  2     0.2   0.4
  2     0.3   0.5
  3     0.1   0
  3     0.2   -0.5
  3     0.3   -2

Output

fullRoutes
key    time      x
  1     0.1   -0.2
  1     0.2   -0.7
  1     0.3   -0.9
  1     0.4   -0.5
  1     0.5   0.2
  1     0.6   1
  2     0.1   0.2
  2     0.2   0.4
  2     0.3   0.5
  3     0.1   0
  3     0.2   0.5
  3     0.3   2

Notice, all values where the play ended in negative, would be multiplied by -1.

If the x value in endOfRoutes is negative for the key in fullRoutes, I want to change all x values in fullRoutes with that key to -x. My code to do this is:

cleanOne = mutate(fullRoutes, x = ifelse(endOfRoutes[endOfRoutes$play_key == play_key]$x < 0, -x, x))

However, I am getting the error

longer object length is not a multiple of shorter object lengthlonger object length is not a multiple of shorter object lengthError in mutate_impl(.data, dots) : Column x must be length 55 (the number of rows) or one, not 0.

--General Question--

Are you able to conditionally change the values in a larger data frame based on the key values in a smaller data frame? I had seen other posts where this works for dataframes of the same size, but not ones where multiple rows in one dataframe are accessing the same row in another.

Upvotes: 0

Views: 165

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389325

An idea similar to @camille in base R would be to do a left join on fullRoutes with endOfRoutes and then change the value of x based on sign.

transform(merge(fullRoutes, endOfRoutes, all.x = TRUE, by = 'key'), 
                 x = x.x * sign(x.y))[names(fullRoutes)]

#   key time    x
#1    1  0.1 -0.2
#2    1  0.2 -0.7
#3    1  0.3 -0.9
#4    1  0.4 -0.5
#5    1  0.5  0.2
#6    1  0.6  1.0
#7    2  0.1  0.2
#8    2  0.2  0.4
#9    2  0.3  0.5
#10   3  0.1  0.0
#11   3  0.2  0.5
#12   3  0.3  2.0

Upvotes: 1

akrun
akrun

Reputation: 887961

We could use a join

library(data.table) # version 1.12.8
setDT(fullRoutes)[endOfRoutes, x := x * c(1, -1)[(i.x < 0) + 1], on = .(key)]
fullRoutes
#    key time    x
# 1:   1  0.1 -0.2
# 2:   1  0.2 -0.7
# 3:   1  0.3 -0.9
# 4:   1  0.4 -0.5
# 5:   1  0.5  0.2
# 6:   1  0.6  1.0
# 7:   2  0.1  0.2
# 8:   2  0.2  0.4
# 9:   2  0.3  0.5
#10:   3  0.1  0.0
#11:   3  0.2  0.5
#12:   3  0.3  2.0

Or make use of sign

setDT(fullRoutes)[endOfRoutes, x := x * sign(i.x), on = .(key)]

Or another option is fcase from the devel version of data.table (1.12.9)

setDT(fullRoutes)[endOfRoutes, on = .(key), x := fcase(i.x <= 0, -x, i.x > 0, x)]

Or using base R

fullRoutes$x <- with(fullRoutes, x * sign(endOfRoutes$x[key]))

data

fullRoutes <- structure(list(key = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), time = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.1, 0.2, 0.3, 
0.1, 0.2, 0.3), x = c(0.2, 0.7, 0.9, 0.5, -0.2, -1, 0.2, 0.4, 
0.5, 0, -0.5, -2)), class = "data.frame", row.names = c(NA, -12L
))

endOfRoutes <-structure(list(key = 1:3, x = c(-1, 0.5, -2)), 
class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

camille
camille

Reputation: 16881

This is a good use-case for joining, because the number of rows won't matter. Join by key, using a left-join to make sure you aren't losing observations in fullRoutes. I'm renaming the endOfRoute's x column to be distinct from the other data frame's x. Convert the multiplier to -1 or 1, multiply, and drop the multiplier column.

library(dplyr)

fullRoutes %>%
  left_join(endOfRoutes %>% rename(multiplier = x), 
            by = "key") %>%
  mutate(multiplier = sign(multiplier),
         x = x * multiplier) %>%
  select(-multiplier)
#> # A tibble: 12 x 3
#>      key  time     x
#>    <dbl> <dbl> <dbl>
#>  1     1   0.1  -0.2
#>  2     1   0.2  -0.7
#>  3     1   0.3  -0.9
#>  4     1   0.4  -0.5
#>  5     1   0.5   0.2
#>  6     1   0.6   1  
#>  7     2   0.1   0.2
#>  8     2   0.2   0.4
#>  9     2   0.3   0.5
#> 10     3   0.1   0  
#> 11     3   0.2   0.5
#> 12     3   0.3   2

Upvotes: 2

Related Questions