Neicooo
Neicooo

Reputation: 197

Time lag variable

unfortunately, I need to completely re-edit the post as my initially proposed data snippet was not entirely reflecting my data set. Essentially my data set looks like below:

date       id     person   member_since   consistent
May 2015   2732   3041     2008-09-17     Yes
Jul 2015   2732   3041     2008-09-17     Yes
Sep 2015   2732   3041     2008-09-17     Yes
Nov 2015   2732   3041     2008-09-17     Yes
Jan 2016   2732   3041     2008-09-17     Yes
May 2016   2732   3041     2008-09-17     Yes
Jul 2016   2732   3041     2008-09-17     Yes
Aug 2016   2732   3041     2008-09-17     Yes
Mar 2017   2732   3041     2008-09-17     Yes
Apr 2017   2732   3041     2008-09-17     Yes
May 2017   2732   3041     2008-09-17     Yes

I would now like to create entries on a monthly basis until the month of "member_since", keeping everything constant (see below). Additionally, there will be some variables marked as "consistent"= No (not shown here), which should remain as is.

date       id     person   member_since   consistent
Sep 2008   2732   3041     2008-09-17     Yes
Oct 2008   2732   3041     2008-09-17     Yes
Nov 2008   2732   3041     2008-09-17     Yes
.
.
.
May 2015   2732   3041     2008-09-17     Yes
Jul 2015   2732   3041     2008-09-17     Yes
Sep 2015   2732   3041     2008-09-17     Yes
Nov 2015   2732   3041     2008-09-17     Yes
Jan 2016   2732   3041     2008-09-17     Yes
May 2016   2732   3041     2008-09-17     Yes
Jul 2016   2732   3041     2008-09-17     Yes
Aug 2016   2732   3041     2008-09-17     Yes
Mar 2017   2732   3041     2008-09-17     Yes
Apr 2017   2732   3041     2008-09-17     Yes
May 2017   2732   3041     2008-09-17     Yes

Can you please help me with the code?

The proposed code below I think is on the right track but doesn't yield yet to the desired outcome.

library(dplyr)
library(zoo)

monthly_series <- (seq.Date(min(as.Date(df$member_since)),
                            as.Date("2017-05-01"),by="months"))


df2 <-
  df%>%
  group_by(Person)%>%
  merge(monthly_series)%>%
  dplyr::filter((consistent=="Yes"&as.yearmon(as.Date(member_since))<=as.yearmon(y))|
                  (consistent=="No"& as.yearmon(as.Date(member_since))==as.yearmon(y))) %>%
  mutate(Month=as.yearmon(y)) %>%
  select(-y) %>%
  arrange(Person)

Thanks a lot in advance!

Upvotes: 1

Views: 572

Answers (4)

Luke C
Luke C

Reputation: 10301

Here is a plyr alternative to the many options you have. Your example data:

library(plyr)

df <- read.table( text = "Person Member_since  Applicable
A      2013-08-04    Yes
B      2011-12-23    No
C      2014-09-03    Yes", header = TRUE)

Make sure Member_since is in date format:

df$Member_since <- as.Date(df$Member_since)

Use adply to iterate over the rows and build new rows with sequential months if Applicable is "Yes", and just the original row if Applicable is "No".

df2<-adply(df, 1, function(x) {
  if(x$Applicable == "Yes"){
    data.frame(x, month = format(
      seq.Date(as.Date(x$Member_since), as.Date("2017-06-01"), by = "month"),  
      "%b %Y"))
  }
  else{
    data.frame(x, month = format(x$Member_since, "%b %Y"))
  }
}
)

> head(df2);tail(df2)
  Person Member_since Applicable    month
1      A   2013-08-04        Yes Aug 2013
2      A   2013-08-04        Yes Sep 2013
3      A   2013-08-04        Yes Oct 2013
4      A   2013-08-04        Yes Nov 2013
5      A   2013-08-04        Yes Dec 2013
6      A   2013-08-04        Yes Jan 2014
   Person Member_since Applicable    month
75      C   2014-09-03        Yes Dec 2016
76      C   2014-09-03        Yes Jan 2017
77      C   2014-09-03        Yes Feb 2017
78      C   2014-09-03        Yes Mar 2017
79      C   2014-09-03        Yes Apr 2017
80      C   2014-09-03        Yes May 2017

Upvotes: 0

Sotos
Sotos

Reputation: 51592

Here is an idea via Base R,

#convert variables to dates and characters,
df$Member_since <- as.Date(df$Member_since, format = '%Y-%m-%d')
df[c(1, 3)] <- lapply(df[c(1,3)], as.character)
str(df)
'data.frame': 3 obs. of  3 variables:
$ Person      : chr  "A" "B" "C"
$ Member_since: Date, format: "2013-08-04" "2011-12-23" "2014-09-03"
$ Applicable  : chr  "Yes" "No" "Yes"

#create a list with monthly sequence of dates
l1 <- sapply(df$Member_since[df$Applicable == 'Yes'], function(i) 
                                        seq.Date(i, as.Date('2017-05-01'), by = 'month'))
#Create an extra variable based on the lengths of l1 to act as counter for expanding the data frame
df$cnt[df$Applicable == 'Yes'] <- lengths(l1)
#Change NA to 1 so it does not expand
df[is.na(df)] <- 1
#expand data frame
df1 <- df[rep(rownames(df), df$cnt),]
#create your Month variable by substituting the dates from l1
df1$Month[df1$Applicable == 'Yes'] <- format(do.call(c, l1), format = '%b %Y')
#take care of the NAs emerged from applicable being NO
df1$Month <- mapply(function(x, y) 
                replace(x, is.na(x), format(y, '%b %Y')), df1$Month, df1$Member_since)
#drop unwanted columns
df1$cnt <- NULL

head(df1)
#    Person Member_since Applicable    Month
#1        A   2013-08-04        Yes Aug 2013
#1.1      A   2013-08-04        Yes Sep 2013
#1.2      A   2013-08-04        Yes Oct 2013
#1.3      A   2013-08-04        Yes Nov 2013
#1.4      A   2013-08-04        Yes Dec 2013
#1.5      A   2013-08-04        Yes Jan 2014

tail(df1)
#     Person Member_since Applicable    Month
#3.26      C   2014-09-03        Yes Nov 2016
#3.27      C   2014-09-03        Yes Dec 2016
#3.28      C   2014-09-03        Yes Jan 2017
#3.29      C   2014-09-03        Yes Feb 2017
#3.30      C   2014-09-03        Yes Mar 2017
#3.31      C   2014-09-03        Yes Apr 2017

DATA

dput(df)
structure(list(Person = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Member_since = structure(c(2L, 1L, 3L), .Label = c("2011-12-23", 
"2013-08-04", "2014-09-03"), class = "factor"), Applicable = structure(c(2L, 
1L, 2L), .Label = c("No", "Yes"), class = "factor")), .Names = c("Person", 
"Member_since", "Applicable"), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

roarkz
roarkz

Reputation: 831

You should check this package out: https://cran.r-project.org/web/packages/padr/vignettes/padr.html

Your answer will look something like this: df2 <- df %>% filter(Applicable == "Yes") %>% pad(group = "Person", start_val = Member_since, end_val = "2017-05-04", interval = "month"). I would then do append_df <- df %>% filter(Applicable == "No") so you have all the "No" answers in a separated dataset. Finally, you can add back in with final_df <- bind_rows(df2, append_df. This should get you on the right path.

Upvotes: 1

Pierre Lapointe
Pierre Lapointe

Reputation: 16277

Here's how to do that with dplyr with the help of yearmon from package zoo.

Basically, the idea is to create a monthly date vector with seq.Date and merge it with your original data.frame. Then, you can filter out the rows that are not wanted. I'm using dplyr::filter because filter is masked by another package on my machine. On yours, using filter by itself might be sufficient.

library(dplyr);library(zoo)
df <- read.table(text="Person Member_since  Applicable
A      2013-08-04    Yes
B      2011-12-23    No
C      2014-09-03    Yes",header=TRUE,stringsAsFactors=FALSE)

monthly_series <- (seq.Date(min(as.Date(df$Member_since)),
                     as.Date("2017-05-01"),by="months"))

df%>%
  group_by(Person)%>%
  merge(monthly_series)%>%
  dplyr::filter((Applicable=="Yes"&as.yearmon(as.Date(Member_since))<=as.yearmon(y))|
                (Applicable=="No"& as.yearmon(as.Date(Member_since))==as.yearmon(y))) %>%
  mutate(Month=as.yearmon(y)) %>%
  select(-y) %>%
  arrange(Person)



   Person Member_since Applicable    Month
1       A   2013-08-04        Yes Aug 2013
2       A   2013-08-04        Yes Sep 2013
3       A   2013-08-04        Yes Oct 2013
4       A   2013-08-04        Yes Nov 2013
5       A   2013-08-04        Yes Dec 2013
6       A   2013-08-04        Yes Jan 2014
7       A   2013-08-04        Yes Feb 2014
8       A   2013-08-04        Yes Mar 2014
9       A   2013-08-04        Yes Apr 2014
10      A   2013-08-04        Yes May 2014
11      A   2013-08-04        Yes Jun 2014
12      A   2013-08-04        Yes Jul 2014
13      A   2013-08-04        Yes Aug 2014
14      A   2013-08-04        Yes Sep 2014
15      A   2013-08-04        Yes Oct 2014
16      A   2013-08-04        Yes Nov 2014
17      A   2013-08-04        Yes Dec 2014
18      A   2013-08-04        Yes Jan 2015
19      A   2013-08-04        Yes Feb 2015
20      A   2013-08-04        Yes Mar 2015
21      A   2013-08-04        Yes Apr 2015
22      A   2013-08-04        Yes May 2015
23      A   2013-08-04        Yes Jun 2015
24      A   2013-08-04        Yes Jul 2015
25      A   2013-08-04        Yes Aug 2015
26      A   2013-08-04        Yes Sep 2015
27      A   2013-08-04        Yes Oct 2015
28      A   2013-08-04        Yes Nov 2015
29      A   2013-08-04        Yes Dec 2015
30      A   2013-08-04        Yes Jan 2016
31      A   2013-08-04        Yes Feb 2016
32      A   2013-08-04        Yes Mar 2016
33      A   2013-08-04        Yes Apr 2016
34      A   2013-08-04        Yes May 2016
35      A   2013-08-04        Yes Jun 2016
36      A   2013-08-04        Yes Jul 2016
37      A   2013-08-04        Yes Aug 2016
38      A   2013-08-04        Yes Sep 2016
39      A   2013-08-04        Yes Oct 2016
40      A   2013-08-04        Yes Nov 2016
41      A   2013-08-04        Yes Dec 2016
42      A   2013-08-04        Yes Jan 2017
43      A   2013-08-04        Yes Feb 2017
44      A   2013-08-04        Yes Mar 2017
45      A   2013-08-04        Yes Apr 2017
46      B   2011-12-23         No Dec 2011
47      C   2014-09-03        Yes Sep 2014
48      C   2014-09-03        Yes Oct 2014
49      C   2014-09-03        Yes Nov 2014
50      C   2014-09-03        Yes Dec 2014
51      C   2014-09-03        Yes Jan 2015
52      C   2014-09-03        Yes Feb 2015
53      C   2014-09-03        Yes Mar 2015
54      C   2014-09-03        Yes Apr 2015
55      C   2014-09-03        Yes May 2015
56      C   2014-09-03        Yes Jun 2015
57      C   2014-09-03        Yes Jul 2015
58      C   2014-09-03        Yes Aug 2015
59      C   2014-09-03        Yes Sep 2015
60      C   2014-09-03        Yes Oct 2015
61      C   2014-09-03        Yes Nov 2015
62      C   2014-09-03        Yes Dec 2015
63      C   2014-09-03        Yes Jan 2016
64      C   2014-09-03        Yes Feb 2016
65      C   2014-09-03        Yes Mar 2016
66      C   2014-09-03        Yes Apr 2016
67      C   2014-09-03        Yes May 2016
68      C   2014-09-03        Yes Jun 2016
69      C   2014-09-03        Yes Jul 2016
70      C   2014-09-03        Yes Aug 2016
71      C   2014-09-03        Yes Sep 2016
72      C   2014-09-03        Yes Oct 2016
73      C   2014-09-03        Yes Nov 2016
74      C   2014-09-03        Yes Dec 2016
75      C   2014-09-03        Yes Jan 2017
76      C   2014-09-03        Yes Feb 2017
77      C   2014-09-03        Yes Mar 2017
78      C   2014-09-03        Yes Apr 2017

Upvotes: 1

Related Questions