Inserting rows into a dataframe based on a vector that contains dates

Question

This is what my dataframe looks like:

df <- read.table(text='

    Name      ActivityType     ActivityDate              
     John       Email            2014-01-01                              
     John       Webinar          2014-01-05                            
     John       Webinar          2014-01-20                                                       
     John       Email            2014-04-20                            
     Tom        Email            2014-01-01                              
     Tom       Webinar           2014-01-05                           
     Tom       Webinar           2014-01-20                                                        
     Tom       Email             2014-04-20                              

    ', header=T, row.names = NULL)

I have this vector x which contains different dates x<- c("2014-01-03","2014-01-25","2015-05-27"). I want to insert rows in my original dataframe in a way that incorporates these dates in the x vector.This is what the output should look like:

    Name      ActivityType     ActivityDate              
     John       Email            2014-01-01
     John        NA              2014-01-03        
     John       Webinar          2014-01-05                            
     John       Webinar          2014-01-20
     John       NA               2014-01-25                                                       
     John       Email            2014-04-20
     John       NA               2015-05-27                            
     Tom        Email            2014-01-01
     Tom        NA               2014-01-03                              
     Tom       Webinar           2014-01-05                           
     Tom       Webinar           2014-01-20
     Tom       NA                2014-01-25                                                        
     Tom       Email             2014-04-20
     Tom       NA                2015-05-27

Sincerely appreciate your help!

G. Grothendieck · Accepted Answer

1) expand.grid Using expand.grid create a data frame adds with the rows to be added and then use rbind to combine df and adds converting the ActivityDate column to "Date" class. Then sort. No packages are used.

adds <- expand.grid(Name = levels(df$Name), ActivityType = NA, ActivityDate = x)
both <- transform(rbind(df, adds), ActivityDate = as.Date(ActivityDate))

o <- with(both, order(Name, ActivityDate))
both[o, ]

giving:

   Name ActivityType ActivityDate
1  John        Email   2014-01-01
9  John            2014-01-03
2  John      Webinar   2014-01-05
3  John      Webinar   2014-01-20
11 John            2014-01-25
4  John        Email   2014-04-20
13 John            2015-05-27
5   Tom        Email   2014-01-01
10  Tom            2014-01-03
6   Tom      Webinar   2014-01-05
7   Tom      Webinar   2014-01-20
12  Tom            2014-01-25
8   Tom        Email   2014-04-20
14  Tom            2015-05-27

2) sqldf This uploads adds and df to an sqlite data base which it creates on the fly, then it performs the sql query and downloads the result. The computation occurs outside of R so it might work with your large data.

adds <- data.frame(Name = NA, ActivityDate = x)

library(sqldf)

sqldf("select * 
       from (select * 
             from df 
             union 
             select a.Name, NULL ActivityType, ActivityDate 
             from (select distinct Name from df) a 
             cross join adds b
            ) order by 1, 3"
      )

giving:

   Name ActivityType ActivityDate
1  John        Email   2014-01-01
2  John            2014-01-03
3  John      Webinar   2014-01-05
4  John      Webinar   2014-01-20
5  John            2014-01-25
6  John        Email   2014-04-20
7  John            2015-05-27
8   Tom        Email   2014-01-01
9   Tom            2014-01-03
10  Tom      Webinar   2014-01-05
11  Tom      Webinar   2014-01-20
12  Tom            2014-01-25
13  Tom        Email   2014-04-20
14  Tom            2015-05-27

Inserting rows into a dataframe based on a vector that contains dates

Answers (2)

Related Questions