user8810618
user8810618

Reputation: 115

How create binary variable by comparison between row columns

I have this column of months in my data table that show the consecutive intervals of months.

> data[,"PromoInterval"]
          PromoInterval
     1: Jan,Apr,Jul,Oct
     2: Jan,Apr,Jul,Oct
     3: Jan,Apr,Jul,Oct
     4: Jan,Apr,Jul,Oct
     5: Jan,Apr,Jul,Oct
    ---                
324322: Jan,Apr,Jul,Oct
324323: Jan,Apr,Jul,Oct
324324: Jan,Apr,Jul,Oct
324325: Jan,Apr,Jul,Oct
324326: Jan,Apr,Jul,Oct

Then I want to check if the date given below is in the corresponding interval assuming that each row of all the base was given in the same year.

The date column is:

> data[,"Date"]
              Date
     1: 2015-07-31
     2: 2015-07-30
     3: 2015-07-29
     4: 2015-07-28
     5: 2015-07-27
    ---           
324322: 2013-01-05
324323: 2013-01-04
324324: 2013-01-03
324325: 2013-01-02
324326: 2013-01-01

For example I need to know if the date 2015-07-31 of the first row is in the interval Jan,Apr,Jul,Oct given by the first row of the PromoInterval variable.

So I create a new variable result that convert the month given by the Date variable to his real noun:

data[,resultat:=as.character(month(ymd(010101) + months((data[,DateMonth])-1),label=TRUE,abbr=TRUE))]

> data[,"resultat"]
        resultat
     1:  juil\\.
     2:  juil\\.
     3:  juil\\.
     4:  juil\\.
     5:  juil\\.
    ---         
324322:  janv\\.
324323:  janv\\.
324324:  janv\\.
324325:  janv\\.
324326:  janv\\.

But I don't know why the format was like this above ?

Then I create a list that make PromoInterval as list type for each row:

data[,list:=strsplit((data[,PromoInterval]),split=',',fixed=TRUE)]

Then I compare if the month's name given by the resultat column does exist in the variable list. For example if the juil of the first row of resultat is in Jan,Apr,Jul,Oct given by the first row of PromoInterval.

So I create this binairy variable :

data[,Promoinsales:=if(resultat %in% list) {1} else {0}]

But the result was all null and not correct because the first row should be 1 not 0!(7->Jul exist in the list Jan,Apr,Jul,Oct)

> data[,"Promoinsales"]
        Promoinsales
     1:            0
     2:            0
     3:            0
     4:            0
     5:            0
    ---             
324322:            0
324323:            0
324324:            0
324325:            0
324326:            0

How do I resolve this please? thank you in advance!

Upvotes: 0

Views: 55

Answers (2)

Onyambu
Onyambu

Reputation: 79208

 dat[,promoinSales:=as.numeric(grepl(month.abb[month(Date)],PromoInterval)),by=1:nrow(dat)][] 
   nrow       Date   PromoInterval promoinSales
1:    1 2015-06-27 Jan,Apr,Jul,Oct            0
2:    2 2015-05-27 Jan,Apr,Jul,Oct            0
3:    3 2015-04-27 Jan,Apr,Jul,Oct            1
4:    4 2015-01-27 Jan,Apr,Jul,Oct            1
5:    5 2015-10-27 Jan,Apr,Jul,Oct            1
6:    6 2015-12-27 Jan,Apr,Jul,Oct            0

dat[,promoinSales:=as.numeric(grepl(format(as.Date(Date),"%b"),PromoInterval)),by=1:nrow(dat)][] 

Data:

dat=fread("     Date PromoInterval
      2015-06-27 Jan,Apr,Jul,Oct
      2015-05-27 Jan,Apr,Jul,Oct
      2015-04-27 Jan,Apr,Jul,Oct
      2015-01-27 Jan,Apr,Jul,Oct
      2015-10-27 Jan,Apr,Jul,Oct
     2015-12-27 Jan,Apr,Jul,Oct
      ",sep=" ")

Upvotes: 1

YOLO
YOLO

Reputation: 21709

This should give you some headstart to solve this.

## this function checks if month name is present in PI
check_values <- function(x,y)
{
    y_val <- unlist(strsplit(y, split = ','))
    if(x %in% y_val) return(1)
    else return (0)
}

## add column in df2 since both df have same rows
df2[, PI := df1$PromoInterval]

## extract month from Date column
df2[, month_name := months(as.Date(Date), abbreviate = T)]

## get result
df2[, result := mapply(check_values, month_name, PI)]

## first few rows of output

         Date month_name              PI result
1: 2015-07-31        Jul Jan,Apr,Jul,Oct      1
2: 2015-07-30        Jul Jan,Apr,Jul,Oct      1
3: 2015-07-29        Jul Jan,Apr,Jul,Oct      1
4: 2015-07-28        Jul Jan,Apr,Jul,Oct      1
5: 2015-07-27        Jul Jan,Apr,Jul,Oct      1

Upvotes: 0

Related Questions