EGM8686
EGM8686

Reputation: 1572

Substract constant value based on position

I have a DataFrame that has information stored like this:

id  store   v1  v2  v3  v4  v5  pos
1   A       5   5   7   7   7   3
2   B       1   1   1   4   5   4

I would like to substract values based on the variable position, for example for id =1 the position is 3 so I would like to substract from v3, v4 and v5 a constant equal as the difference between V3 and v2 which is 2 (7-5). So the resulting dataframe should look like this:

id  store   v1  v2  v3  v4  v5  pos
1   A       5   5   5   5   5   3
2   B       1   1   1   1   2   4

For the second row the position was 4 so (V4-V3 = 3) so we're substracting 3 from the variable located at position 4 and 5.

Thx!

Upvotes: 2

Views: 77

Answers (4)

IceCreamToucan
IceCreamToucan

Reputation: 28695

#select v columns (v1, v2, ..., v5)
vs <- df[grep('^v', names(df))]
# compute differences (in this case, the vector c(2, 3))
diffs <- vs[cbind(1:nrow(df), df$pos)] - vs[cbind(1:nrow(df), df$pos - 1)]
# subtract diffs from vs if the column is >= pos
df[grep('^v', names(df))] <- vs - diffs*(col(vs) >= df$pos)


df
#   id store v1 v2 v3 v4 v5 pos
# 1  1     A  5  5  5  5  5   3
# 2  2     B  1  1  1  1  2   4

Data used:

df <- read.table(text = '
id  store   v1  v2  v3  v4  v5  pos
1   A       5   5   7   7   7   3
2   B       1   1   1   4   5   4
', header = T)

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 39858

A different tidyverse possibility could be:

df %>%
 gather(var, val, -c(id, pos, store)) %>%
 arrange(id, var) %>%
 group_by(id) %>%
 mutate(temp = cumsum(ifelse(parse_number(var) == pos, 1, 0) == 1),
        val = ifelse(temp == 1, 
                       val - (val[min(which(temp == 1))] - val[max(which(temp == 0))]), val)) %>%
 select(-temp) %>%
 spread(var, val)

     id store   pos    v1    v2    v3    v4    v5
  <int> <chr> <int> <int> <int> <int> <int> <int>
1     1 A         3     5     5     5     5     5
2     2 B         4     1     1     1     1     2

It, first, transforms the data from wide to long format, excluding the variables "id", "pos" and "store". Second, it arranges the data according "id" and "var" (which is the key) and groups by "id". Third, it checks (using variable "temp") whether the number in key (i.e. the number in variable names "v1" to "v5") is equal to the number in "pos". If so, it assigns 1 and then performs a cumulative sum around 1, thus assigning also all subsequent rows with value 1. Forth, if the value in "temp" is 1, it subtracts the value in the last row with 0 from the value in the first row with 1, and then subtract this from all rows with 1. Finally, it returns the data to its original shape.

Upvotes: 0

bouncyball
bouncyball

Reputation: 10761

Takes a few lines, but this is possible using functions from the tidyverse. The biggest key is to take the wide data and make it long (this is gather's job). Then we find where pos == var_pos, calculate the appropriate difference, and subtract this from the appropriate values. The separate and unite functions let us convert v1 -> v and 1 and then back again.

library(tidyverse)
dat %>%
    gather(variable, value, starts_with('v')) %>% # wide to long
    separate(variable, c('variable', 'var_pos'), sep = 1) %>% # v1 -> v, 1
    group_by(id) %>%
    mutate(var_diff = value[var_pos == pos] - value[var_pos == (pos - 1)]) %>%
    mutate(value = ifelse(var_pos >= pos, value - var_diff, value)) %>% # subtract
    unite(variable, variable, var_pos, sep = '') %>% # v,1 -> v1
    select(-var_diff) %>%
    spread(variable, value) # long to wide

    id store   pos    v1    v2    v3    v4    v5
  <int> <chr> <int> <int> <int> <int> <int> <int>
1     1 A         3     5     5     5     5     5
2     2 B         4     1     1     1     1     2

Upvotes: 2

Anders Ellern Bilgrau
Anders Ellern Bilgrau

Reputation: 10223

If you data frame is not too big, a simple for-loop can also do the trick in base R easily:

# Load your data frame
df <- read.table(header = TRUE, text = "
id  store   v1  v2  v3  v4  v5  pos
1   A       5   5   7   7   7   3
2   B       1   1   1   4   5   4")

# Run through all rows
for (i in seq_len(nrow(df))) {
  p <- df$pos[i]  # Get position
  dif <- df[i, paste0("v", p)]  - df[i, paste0("v", p - 1)]  # Compute difference
  cols <- paste0("v", seq(p, 5))  # Construct colnames to subtract dif from
  df[i, cols] <- df[i, cols] - dif  # Do the subtraction
}
print(df)
#  id store v1 v2 v3 v4 v5 pos
#1  1     A  5  5  5  5  5   3
#2  2     B  1  1  1  1  2   4

Of course this code relies on some assumptions that your actual real life data looks quite similar to what you have here. And if not, I guess the code is easily adopted.

If you hate for-loops in R you can wrap it into a function and hide it with apply.

Upvotes: 1

Related Questions