Reputation: 115
I have this column of months in my data table that show the consecutive intervals of months.
> data[,"PromoInterval"]
PromoInterval
1: Jan,Apr,Jul,Oct
2: Jan,Apr,Jul,Oct
3: Jan,Apr,Jul,Oct
4: Jan,Apr,Jul,Oct
5: Jan,Apr,Jul,Oct
---
324322: Jan,Apr,Jul,Oct
324323: Jan,Apr,Jul,Oct
324324: Jan,Apr,Jul,Oct
324325: Jan,Apr,Jul,Oct
324326: Jan,Apr,Jul,Oct
Then I want to check if the date given below is in the corresponding interval assuming that each row of all the base was given in the same year.
The date column is:
> data[,"Date"]
Date
1: 2015-07-31
2: 2015-07-30
3: 2015-07-29
4: 2015-07-28
5: 2015-07-27
---
324322: 2013-01-05
324323: 2013-01-04
324324: 2013-01-03
324325: 2013-01-02
324326: 2013-01-01
For example I need to know if the date 2015-07-31 of the first row is in the interval Jan,Apr,Jul,Oct given by the first row of the PromoInterval variable.
So I create a new variable result that convert the month given by the Date variable to his real noun:
data[,resultat:=as.character(month(ymd(010101) + months((data[,DateMonth])-1),label=TRUE,abbr=TRUE))]
> data[,"resultat"]
resultat
1: juil\\.
2: juil\\.
3: juil\\.
4: juil\\.
5: juil\\.
---
324322: janv\\.
324323: janv\\.
324324: janv\\.
324325: janv\\.
324326: janv\\.
But I don't know why the format was like this above ?
Then I create a list that make PromoInterval as list type for each row:
data[,list:=strsplit((data[,PromoInterval]),split=',',fixed=TRUE)]
Then I compare if the month's name given by the resultat column does exist in the variable list. For example if the juil
of the first row of resultat
is in Jan,Apr,Jul,Oct
given by the first row of PromoInterval
.
So I create this binairy variable :
data[,Promoinsales:=if(resultat %in% list) {1} else {0}]
But the result was all null and not correct because the first row should be 1 not 0!(7->Jul
exist in the list Jan,Apr,Jul,Oct
)
> data[,"Promoinsales"]
Promoinsales
1: 0
2: 0
3: 0
4: 0
5: 0
---
324322: 0
324323: 0
324324: 0
324325: 0
324326: 0
How do I resolve this please? thank you in advance!
Upvotes: 0
Views: 55
Reputation: 79208
dat[,promoinSales:=as.numeric(grepl(month.abb[month(Date)],PromoInterval)),by=1:nrow(dat)][]
nrow Date PromoInterval promoinSales
1: 1 2015-06-27 Jan,Apr,Jul,Oct 0
2: 2 2015-05-27 Jan,Apr,Jul,Oct 0
3: 3 2015-04-27 Jan,Apr,Jul,Oct 1
4: 4 2015-01-27 Jan,Apr,Jul,Oct 1
5: 5 2015-10-27 Jan,Apr,Jul,Oct 1
6: 6 2015-12-27 Jan,Apr,Jul,Oct 0
dat[,promoinSales:=as.numeric(grepl(format(as.Date(Date),"%b"),PromoInterval)),by=1:nrow(dat)][]
Data:
dat=fread(" Date PromoInterval
2015-06-27 Jan,Apr,Jul,Oct
2015-05-27 Jan,Apr,Jul,Oct
2015-04-27 Jan,Apr,Jul,Oct
2015-01-27 Jan,Apr,Jul,Oct
2015-10-27 Jan,Apr,Jul,Oct
2015-12-27 Jan,Apr,Jul,Oct
",sep=" ")
Upvotes: 1
Reputation: 21709
This should give you some headstart to solve this.
## this function checks if month name is present in PI
check_values <- function(x,y)
{
y_val <- unlist(strsplit(y, split = ','))
if(x %in% y_val) return(1)
else return (0)
}
## add column in df2 since both df have same rows
df2[, PI := df1$PromoInterval]
## extract month from Date column
df2[, month_name := months(as.Date(Date), abbreviate = T)]
## get result
df2[, result := mapply(check_values, month_name, PI)]
## first few rows of output
Date month_name PI result
1: 2015-07-31 Jul Jan,Apr,Jul,Oct 1
2: 2015-07-30 Jul Jan,Apr,Jul,Oct 1
3: 2015-07-29 Jul Jan,Apr,Jul,Oct 1
4: 2015-07-28 Jul Jan,Apr,Jul,Oct 1
5: 2015-07-27 Jul Jan,Apr,Jul,Oct 1
Upvotes: 0