Chandan Kumar
Chandan Kumar

Reputation: 47

How to apply a function row-wise having if-else condition using data.table?

I need to apply a function on a data frame below. 'day' is sales values. I need to shift the sales to right based on offset value. For example in row 1 offset is 1, I need to shift sales by 1 day, if 0 then no shift and henceforth.

id <- c('a', 'b', 'c', 'd', 'e', 'f')  
offset <- c(1,2,3,0,0,2)
day1 <-   c(1,2,3,4,5, 0)
day2 <-   c(1,2,3,4,5, 2)
day3 <-   c(1,2,3,4,5, 6)
day4 <-   c(1,2,3,4,5, 6)
day5 <-   c(1,2,0,4,5, 0)
day6 <-   c(1,0,0,0,5, 0)
day7 <-   c(0,0,0,0,0, 0)
df <- data.frame(id, offset, day1, day2, day3, day4, day5, day6, day7) 

> df
id offset day1 day2 day3 day4 day5 day6 day7
a      1    1    1    1    1    1    1    0
b      2    2    2    2    2    2    0    0
c      3    3    3    3    3    0    0    0
d      0    4    4    4    4    4    0    0
e      0    5    5    5    5    5    5    0
f      2    0    2    6    6    0    0    0

The result should be like below:

> result
id offset day1 day2 day3 day4 day5 day6 day7
a      1    0    1    1    1    1    1    1
b      2    0    0    2    2    2    2    2
c      3    0    0    0    3    3    3    3
d      0    4    4    4    4    4    0    0
e      0    5    5    5    5    5    5    0
f      2    0    0    0    2    6    6    0

I intended to use following pseudo function row-wise in data.table:

shiftSales = function(df){
 if (start > 0) 
 { 
  then_no_shift 
 }
 else
 { 
  offset_by_offset_value
 }
   return(shift_df)  
}
result <- df(,shiftSales(df), by = "id")

Note: if it is possible without data.table, I am ok. But my data is large so I thought data.table approach would be faster.

Upvotes: 2

Views: 1076

Answers (3)

Uwe
Uwe

Reputation: 42544

The OP has requested to shift data row-wise across columns by an offset which is specific for each row. (Unfortunately, the title mentioning if-else is somewhat misleading).

All solutions posted so far, are using the t() function (matrix transpose) which indicates that the way the data are stored is not particularly well suited for this type of operation.

The solution below uses melt() to reshape the data from wide to long form before applying a shift() operation on columns:

library(data.table)
# reshape from wide to long
melt(setDT(df), measure.vars = patterns("^day"))[
  # shift values for each id by its individual offset
  , value := shift(value, offset, fill = 0), by = id][
    # reshape to wide format agian for comparison
    , dcast(.SD, id + offset ~ variable)]
   id offset day1 day2 day3 day4 day5 day6 day7
1:  a      1    0    1    1    1    1    1    1
2:  b      2    0    0    2    2    2    2    2
3:  c      3    0    0    0    3    3    3    3
4:  d      0    4    4    4    4    4    0    0
5:  e      0    5    5    5    5    5    5    0
6:  f      2    0    0    0    2    6    6    0

Caveat: It is assumed that id is unique. Otherwise, an additional row number needs to be introduced.


As already mentioned, I suggest to reconsider the way the data is stored.

Currently, longitudinal data, i.e., day1, day2, day3, ..., are stored in a data.frame in wide format, i.e., in separate columns. This is not ideal as it requires row-wise operations to be performed across columns.

Instead, the data could be stored as a matrix (with id as row names and offset stored in a separate vector). Or, if there are other, undisclosed columns, in a data.frame in long format.

Upvotes: 2

Joshua Daly
Joshua Daly

Reputation: 626

Here is a solution using data.table:

# read df
df = read.table(
  header = TRUE,
  text = 
    "
    id offset day1 day2 day3 day4 day5 day6 day7
    a      1    1    1    1    1    1    1    0
    b      2    2    2    2    2    2    0    0
    c      3    3    3    3    3    0    0    0
    d      0    4    4    4    4    4    0    0
    e      0    5    5    5    5    5    5    0
    f      2    0    2    6    6    0    0    0
    "
);

# load binhf (for shift function)
library(binhf);

# convert to data table
dt = setDT(df)[
  ,
  # establish the new columns using mapply
  c("day_1", "day_2", "day_3", "day_4", "day_5", "day_6", "day_7") :=
    data.table(
      t(
        x = mapply(
          FUN = function(offset, day_1, day_2, day_3, day_4, day_5, day_6, day_7){
            # make a vector of day_1, day_2, day_3,...,day_7
            vec = c(day_1, day_2, day_3, day_4, day_5, day_6, day_7);

            # shift to the right
            vec_s = shift(v = vec, places = offset, dir = "right");

            # return vec_s
            vec_s;
          },
          # parse vectors
          day_1 = day_1,
          day_2 = day_2,
          day_3 = day_3,
          day_4 = day_4,
          day_5 = day_5,
          day_6 = day_6,
          day_7 = day_7,
          offset = offset
        )
      )
    )
  ][
  # remove unwanted (unshifted) columns
    ,
    -(2:9),
    with = FALSE
  ];

# print dt
dt;

   id day_1 day_2 day_3 day_4 day_5 day_6 day_7
1:  a     0     1     1     1     1     1     1
2:  b     0     0     2     2     2     2     2
3:  c     0     0     0     3     3     3     3
4:  d     4     4     4     4     4     0     0
5:  e     5     5     5     5     5     5     0
6:  f     0     0     0     2     6     6     0

I hope this helps!

Upvotes: 0

Rui Barradas
Rui Barradas

Reputation: 76402

There's no need for complicated if/else. Try the following.

df <- data.frame(id, offset, day1, day2, day3, day4, day5, day6, day7)

df[-(1:2)] <- t(apply(df[-1], 1, function(x) c(rep(0, x[1]), x[2:8])[1:7]))
df

Upvotes: 1

Related Questions