Reputation: 16478
Consider a data frame of the form
idnum start end var1 var2 var3 var4
1993.1 17 1993-01-01 1993-12-31 foo bar 1 2
with start
and end
being of type Date
$ idnum : int 17 17 17 17 27 27
$ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
$ end : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...
I would like to create a new dataframe, that has instead monthly observations for every row, for every month in between start
and end
(including the boundaries):
Desired Output
idnum month var1 var2 var3 var4
17 1993-01-01 foo bar 1 2
17 1993-02-01 foo bar 1 2
...
17 1993-12-01 foo bar 1 2
I was suggested
require(data.table) ## 1.9.2+
setDT(df)[, list(idnum=idnum, month=seq(start,end,by="month")), by=1:nrow(df)]
However, I have a long list of additional columns that I also want to move with me (basically all the columns inside df
besides start
, end
. Is there an elegant way of providing these additional columns? My naive approach was to replace idnum=idnum with colnames(df), which did not work.
Update
I tried, as suggested (since I wanted the code to be robust to changes of order in columns, I adjusted it slightly)
columnNames = colnames(df)[colnames(df) != 'start' & colnames(df) != 'end']
require(data.table)
test <- data.frame(df)
setDT(test)
result <- test[, list( month=seq(start,end,by="month")), by=eval(columnNames) ]
but I got an
Error in seq.Date(start, end, by = "month") : 'from' must be of length 1
Upvotes: 1
Views: 176
Reputation: 59335
Put the columns in by=list(...)
library(data.table)
df <- data.frame(idnum=17,
start=as.Date("1993-01-01"), end=as.Date("1993-12-31"),
var1="foo",var2="bar",var3=1,var4=2)
setDT(df)
result <-df[,list(month=seq(start,end,by="month")), by=list(idnum,var1,var2,var3,var4)]
head(result)
# idnum var1 var2 var3 var4 month
# 1: 17 foo bar 1 2 1993-01-01
# 2: 17 foo bar 1 2 1993-02-01
# 3: 17 foo bar 1 2 1993-03-01
# 4: 17 foo bar 1 2 1993-04-01
# 5: 17 foo bar 1 2 1993-05-01
# 6: 17 foo bar 1 2 1993-06-01
If you have a great many columns and don't want to enumerate them, this will include all columns except 2:3: (produces the same result above with this dataset).
result <-df[,list(month=seq(start,end,by="month")), by=eval(names(df)[-(2:3)])]
Upvotes: 1