FooBar
FooBar

Reputation: 16478

Expanding a dataframe using data.table

Consider a data frame of the form

       idnum      start        end var1 var2 var3 var4 
1993.1    17 1993-01-01 1993-12-31  foo  bar    1    2 

with start and end being of type Date

 $ idnum : int  17 17 17 17 27 27
 $ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
 $ end   : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...

I would like to create a new dataframe, that has instead monthly observations for every row, for every month in between start and end (including the boundaries):

Desired Output

idnum       month var1 var2 var3 var4 
   17  1993-01-01  foo  bar    1    2
   17  1993-02-01  foo  bar    1    2
...
   17  1993-12-01  foo  bar    1    2

I was suggested

require(data.table) ## 1.9.2+
setDT(df)[, list(idnum=idnum, month=seq(start,end,by="month")), by=1:nrow(df)]

However, I have a long list of additional columns that I also want to move with me (basically all the columns inside df besides start, end. Is there an elegant way of providing these additional columns? My naive approach was to replace idnum=idnum with colnames(df), which did not work.

Update

I tried, as suggested (since I wanted the code to be robust to changes of order in columns, I adjusted it slightly)

columnNames = colnames(df)[colnames(df) != 'start' & colnames(df) != 'end']
require(data.table) 
test <- data.frame(df)
setDT(test)
result <- test[, list( month=seq(start,end,by="month")), by=eval(columnNames) ]

but I got an

Error in seq.Date(start, end, by = "month") : 'from' must be of length 1

Upvotes: 1

Views: 176

Answers (1)

jlhoward
jlhoward

Reputation: 59335

Put the columns in by=list(...)

library(data.table)
df <- data.frame(idnum=17, 
                 start=as.Date("1993-01-01"), end=as.Date("1993-12-31"),
                 var1="foo",var2="bar",var3=1,var4=2)
setDT(df)
result <-df[,list(month=seq(start,end,by="month")), by=list(idnum,var1,var2,var3,var4)]
head(result)
#    idnum var1 var2 var3 var4      month
# 1:    17  foo  bar    1    2 1993-01-01
# 2:    17  foo  bar    1    2 1993-02-01
# 3:    17  foo  bar    1    2 1993-03-01
# 4:    17  foo  bar    1    2 1993-04-01
# 5:    17  foo  bar    1    2 1993-05-01
# 6:    17  foo  bar    1    2 1993-06-01

If you have a great many columns and don't want to enumerate them, this will include all columns except 2:3: (produces the same result above with this dataset).

result <-df[,list(month=seq(start,end,by="month")), by=eval(names(df)[-(2:3)])]

Upvotes: 1

Related Questions