Reputation: 197
unfortunately, I need to completely re-edit the post as my initially proposed data snippet was not entirely reflecting my data set. Essentially my data set looks like below:
date id person member_since consistent
May 2015 2732 3041 2008-09-17 Yes
Jul 2015 2732 3041 2008-09-17 Yes
Sep 2015 2732 3041 2008-09-17 Yes
Nov 2015 2732 3041 2008-09-17 Yes
Jan 2016 2732 3041 2008-09-17 Yes
May 2016 2732 3041 2008-09-17 Yes
Jul 2016 2732 3041 2008-09-17 Yes
Aug 2016 2732 3041 2008-09-17 Yes
Mar 2017 2732 3041 2008-09-17 Yes
Apr 2017 2732 3041 2008-09-17 Yes
May 2017 2732 3041 2008-09-17 Yes
I would now like to create entries on a monthly basis until the month of "member_since", keeping everything constant (see below). Additionally, there will be some variables marked as "consistent"= No (not shown here), which should remain as is.
date id person member_since consistent
Sep 2008 2732 3041 2008-09-17 Yes
Oct 2008 2732 3041 2008-09-17 Yes
Nov 2008 2732 3041 2008-09-17 Yes
.
.
.
May 2015 2732 3041 2008-09-17 Yes
Jul 2015 2732 3041 2008-09-17 Yes
Sep 2015 2732 3041 2008-09-17 Yes
Nov 2015 2732 3041 2008-09-17 Yes
Jan 2016 2732 3041 2008-09-17 Yes
May 2016 2732 3041 2008-09-17 Yes
Jul 2016 2732 3041 2008-09-17 Yes
Aug 2016 2732 3041 2008-09-17 Yes
Mar 2017 2732 3041 2008-09-17 Yes
Apr 2017 2732 3041 2008-09-17 Yes
May 2017 2732 3041 2008-09-17 Yes
Can you please help me with the code?
The proposed code below I think is on the right track but doesn't yield yet to the desired outcome.
library(dplyr)
library(zoo)
monthly_series <- (seq.Date(min(as.Date(df$member_since)),
as.Date("2017-05-01"),by="months"))
df2 <-
df%>%
group_by(Person)%>%
merge(monthly_series)%>%
dplyr::filter((consistent=="Yes"&as.yearmon(as.Date(member_since))<=as.yearmon(y))|
(consistent=="No"& as.yearmon(as.Date(member_since))==as.yearmon(y))) %>%
mutate(Month=as.yearmon(y)) %>%
select(-y) %>%
arrange(Person)
Thanks a lot in advance!
Upvotes: 1
Views: 572
Reputation: 10301
Here is a plyr
alternative to the many options you have. Your example data:
library(plyr)
df <- read.table( text = "Person Member_since Applicable
A 2013-08-04 Yes
B 2011-12-23 No
C 2014-09-03 Yes", header = TRUE)
Make sure Member_since
is in date format:
df$Member_since <- as.Date(df$Member_since)
Use adply
to iterate over the rows and build new rows with sequential months if Applicable
is "Yes", and just the original row if Applicable
is "No".
df2<-adply(df, 1, function(x) {
if(x$Applicable == "Yes"){
data.frame(x, month = format(
seq.Date(as.Date(x$Member_since), as.Date("2017-06-01"), by = "month"),
"%b %Y"))
}
else{
data.frame(x, month = format(x$Member_since, "%b %Y"))
}
}
)
> head(df2);tail(df2)
Person Member_since Applicable month
1 A 2013-08-04 Yes Aug 2013
2 A 2013-08-04 Yes Sep 2013
3 A 2013-08-04 Yes Oct 2013
4 A 2013-08-04 Yes Nov 2013
5 A 2013-08-04 Yes Dec 2013
6 A 2013-08-04 Yes Jan 2014
Person Member_since Applicable month
75 C 2014-09-03 Yes Dec 2016
76 C 2014-09-03 Yes Jan 2017
77 C 2014-09-03 Yes Feb 2017
78 C 2014-09-03 Yes Mar 2017
79 C 2014-09-03 Yes Apr 2017
80 C 2014-09-03 Yes May 2017
Upvotes: 0
Reputation: 51592
Here is an idea via Base R,
#convert variables to dates and characters,
df$Member_since <- as.Date(df$Member_since, format = '%Y-%m-%d')
df[c(1, 3)] <- lapply(df[c(1,3)], as.character)
str(df) 'data.frame': 3 obs. of 3 variables: $ Person : chr "A" "B" "C" $ Member_since: Date, format: "2013-08-04" "2011-12-23" "2014-09-03" $ Applicable : chr "Yes" "No" "Yes"
#create a list with monthly sequence of dates
l1 <- sapply(df$Member_since[df$Applicable == 'Yes'], function(i)
seq.Date(i, as.Date('2017-05-01'), by = 'month'))
#Create an extra variable based on the lengths of l1 to act as counter for expanding the data frame
df$cnt[df$Applicable == 'Yes'] <- lengths(l1)
#Change NA to 1 so it does not expand
df[is.na(df)] <- 1
#expand data frame
df1 <- df[rep(rownames(df), df$cnt),]
#create your Month variable by substituting the dates from l1
df1$Month[df1$Applicable == 'Yes'] <- format(do.call(c, l1), format = '%b %Y')
#take care of the NAs emerged from applicable being NO
df1$Month <- mapply(function(x, y)
replace(x, is.na(x), format(y, '%b %Y')), df1$Month, df1$Member_since)
#drop unwanted columns
df1$cnt <- NULL
head(df1)
# Person Member_since Applicable Month
#1 A 2013-08-04 Yes Aug 2013
#1.1 A 2013-08-04 Yes Sep 2013
#1.2 A 2013-08-04 Yes Oct 2013
#1.3 A 2013-08-04 Yes Nov 2013
#1.4 A 2013-08-04 Yes Dec 2013
#1.5 A 2013-08-04 Yes Jan 2014
tail(df1)
# Person Member_since Applicable Month
#3.26 C 2014-09-03 Yes Nov 2016
#3.27 C 2014-09-03 Yes Dec 2016
#3.28 C 2014-09-03 Yes Jan 2017
#3.29 C 2014-09-03 Yes Feb 2017
#3.30 C 2014-09-03 Yes Mar 2017
#3.31 C 2014-09-03 Yes Apr 2017
DATA
dput(df)
structure(list(Person = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Member_since = structure(c(2L, 1L, 3L), .Label = c("2011-12-23",
"2013-08-04", "2014-09-03"), class = "factor"), Applicable = structure(c(2L,
1L, 2L), .Label = c("No", "Yes"), class = "factor")), .Names = c("Person",
"Member_since", "Applicable"), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1
Reputation: 831
You should check this package out: https://cran.r-project.org/web/packages/padr/vignettes/padr.html
Your answer will look something like this: df2 <- df %>% filter(Applicable == "Yes") %>% pad(group = "Person", start_val = Member_since, end_val = "2017-05-04", interval = "month")
.
I would then do append_df <- df %>% filter(Applicable == "No")
so you have all the "No" answers in a separated dataset.
Finally, you can add back in with final_df <- bind_rows(df2, append_df
. This should get you on the right path.
Upvotes: 1
Reputation: 16277
Here's how to do that with dplyr
with the help of yearmon
from package zoo
.
Basically, the idea is to create a monthly date vector with seq.Date
and merge
it with your original data.frame. Then, you can filter
out the rows that are not wanted. I'm using dplyr::filter
because filter
is masked by another package on my machine. On yours, using filter
by itself might be sufficient.
library(dplyr);library(zoo)
df <- read.table(text="Person Member_since Applicable
A 2013-08-04 Yes
B 2011-12-23 No
C 2014-09-03 Yes",header=TRUE,stringsAsFactors=FALSE)
monthly_series <- (seq.Date(min(as.Date(df$Member_since)),
as.Date("2017-05-01"),by="months"))
df%>%
group_by(Person)%>%
merge(monthly_series)%>%
dplyr::filter((Applicable=="Yes"&as.yearmon(as.Date(Member_since))<=as.yearmon(y))|
(Applicable=="No"& as.yearmon(as.Date(Member_since))==as.yearmon(y))) %>%
mutate(Month=as.yearmon(y)) %>%
select(-y) %>%
arrange(Person)
Person Member_since Applicable Month
1 A 2013-08-04 Yes Aug 2013
2 A 2013-08-04 Yes Sep 2013
3 A 2013-08-04 Yes Oct 2013
4 A 2013-08-04 Yes Nov 2013
5 A 2013-08-04 Yes Dec 2013
6 A 2013-08-04 Yes Jan 2014
7 A 2013-08-04 Yes Feb 2014
8 A 2013-08-04 Yes Mar 2014
9 A 2013-08-04 Yes Apr 2014
10 A 2013-08-04 Yes May 2014
11 A 2013-08-04 Yes Jun 2014
12 A 2013-08-04 Yes Jul 2014
13 A 2013-08-04 Yes Aug 2014
14 A 2013-08-04 Yes Sep 2014
15 A 2013-08-04 Yes Oct 2014
16 A 2013-08-04 Yes Nov 2014
17 A 2013-08-04 Yes Dec 2014
18 A 2013-08-04 Yes Jan 2015
19 A 2013-08-04 Yes Feb 2015
20 A 2013-08-04 Yes Mar 2015
21 A 2013-08-04 Yes Apr 2015
22 A 2013-08-04 Yes May 2015
23 A 2013-08-04 Yes Jun 2015
24 A 2013-08-04 Yes Jul 2015
25 A 2013-08-04 Yes Aug 2015
26 A 2013-08-04 Yes Sep 2015
27 A 2013-08-04 Yes Oct 2015
28 A 2013-08-04 Yes Nov 2015
29 A 2013-08-04 Yes Dec 2015
30 A 2013-08-04 Yes Jan 2016
31 A 2013-08-04 Yes Feb 2016
32 A 2013-08-04 Yes Mar 2016
33 A 2013-08-04 Yes Apr 2016
34 A 2013-08-04 Yes May 2016
35 A 2013-08-04 Yes Jun 2016
36 A 2013-08-04 Yes Jul 2016
37 A 2013-08-04 Yes Aug 2016
38 A 2013-08-04 Yes Sep 2016
39 A 2013-08-04 Yes Oct 2016
40 A 2013-08-04 Yes Nov 2016
41 A 2013-08-04 Yes Dec 2016
42 A 2013-08-04 Yes Jan 2017
43 A 2013-08-04 Yes Feb 2017
44 A 2013-08-04 Yes Mar 2017
45 A 2013-08-04 Yes Apr 2017
46 B 2011-12-23 No Dec 2011
47 C 2014-09-03 Yes Sep 2014
48 C 2014-09-03 Yes Oct 2014
49 C 2014-09-03 Yes Nov 2014
50 C 2014-09-03 Yes Dec 2014
51 C 2014-09-03 Yes Jan 2015
52 C 2014-09-03 Yes Feb 2015
53 C 2014-09-03 Yes Mar 2015
54 C 2014-09-03 Yes Apr 2015
55 C 2014-09-03 Yes May 2015
56 C 2014-09-03 Yes Jun 2015
57 C 2014-09-03 Yes Jul 2015
58 C 2014-09-03 Yes Aug 2015
59 C 2014-09-03 Yes Sep 2015
60 C 2014-09-03 Yes Oct 2015
61 C 2014-09-03 Yes Nov 2015
62 C 2014-09-03 Yes Dec 2015
63 C 2014-09-03 Yes Jan 2016
64 C 2014-09-03 Yes Feb 2016
65 C 2014-09-03 Yes Mar 2016
66 C 2014-09-03 Yes Apr 2016
67 C 2014-09-03 Yes May 2016
68 C 2014-09-03 Yes Jun 2016
69 C 2014-09-03 Yes Jul 2016
70 C 2014-09-03 Yes Aug 2016
71 C 2014-09-03 Yes Sep 2016
72 C 2014-09-03 Yes Oct 2016
73 C 2014-09-03 Yes Nov 2016
74 C 2014-09-03 Yes Dec 2016
75 C 2014-09-03 Yes Jan 2017
76 C 2014-09-03 Yes Feb 2017
77 C 2014-09-03 Yes Mar 2017
78 C 2014-09-03 Yes Apr 2017
Upvotes: 1