phaser
phaser

Reputation: 625

R- Create a dataframe from lists of data that are not all the same length

I have a list of dataframes that I would like to turn into a single dataframe. Here is a subset of my data:

    date <- c("4/10/2016","4/11/2016","4/12/2016")
    time <- c("1:00:00 AM","1:15:00 AM", "1:30:00 AM")
    temp <- c(86.1,85.3,85.7)
    humidity <- c(39.9,39.5,39.1)
    object <- data.frame(date, time, temp, humidity)

    date2 <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
    time2 <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
    temp2 <- c(86.1,85.3,85.7,86.2)
    humidity2 <- c(39.9,39.5,39.1,40.0)
    object2 <- data.frame(date2, time2, temp2, humidity2)

    data1 <- list(object, object2)

The result I am looking for is:

   id     date        time temp humidity temp2 humidity2
    1 4/10/2016 1:00:00 AM 86.1     39.9  86.1      39.9
    2 4/11/2016 1:15:00 AM 85.3     39.5  85.3      39.5
    3 4/12/2016 1:30:00 AM 85.7     39.1  85.7      39.1
    4 4/13/2016 1:45:00 AM   NA       NA  86.2      40.0

Because the dataframes are different lengths, I have not been successful at putting them together in a single dataframe. I tried padding the dataframes with NAs but was unsuccessful. I ended up adding new columns with NAs. I am new to coding so any explanations would be helpful. Please let me know if there is anything I can do to improve my question.

Update: Here is my data with one sensor starting after the others started.

    date <- c("4/10/2016","4/11/2016","4/12/2016")
    time <- c("1:00:00 AM","1:15:00 AM", "1:30:00 AM")
    temp <- c(86.1,85.3,85.7)
    humidity <- c(39.9,39.5,39.1)
    object <- data.frame(date, time, temp, humidity)

    date <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
    time <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
    temp2 <- c(86.1,85.3,85.7,86.2)
    humidity2 <- c(39.9,39.5,39.1,40.0)
    object2 <- data.frame(date, time, temp2, humidity2)

    date <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
    time <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
    temp3 <- c(NA,84.3,84.7,85.2)
    humidity3 <- c(NA,40.1,39.7,40.5)
    object3 <- data.frame(date, time, temp3, humidity3)

    data1 <- list(object, object2, object3)

And here's what I want it to look like:

   id     date        time temp humidity temp2 humidity2 temp3 humidity3
    1 4/10/2016 1:00:00 AM 86.1     39.9  86.1      39.9    NA        NA
    2 4/11/2016 1:15:00 AM 85.3     39.5  85.3      39.5  84.3      40.1
    3 4/12/2016 1:30:00 AM 85.7     39.1  85.7      39.1  84.7      39.7
    4 4/13/2016 1:45:00 AM   NA       NA  86.2      40.0  85.2      40.5

Upvotes: 2

Views: 113

Answers (1)

SymbolixAU
SymbolixAU

Reputation: 26258

To rbind list elements together, you first need to have the names of the columns of each list elements to be the same.

names(data1[[2]]) <- names(data1[[1]])

(However, be careful here that the names of the columns are in the correct order.)

Then use do.call(rbind, data1)

do.call(rbind, data1)
#       date       time temp humidity
# 1 4/10/2016 1:00:00 AM 86.1     39.9
# 2 4/11/2016 1:15:00 AM 85.3     39.5
# 3 4/12/2016 1:30:00 AM 85.7     39.1
# 4 4/10/2016 1:00:00 AM 86.1     39.9
# 5 4/11/2016 1:15:00 AM 85.3     39.5
# 6 4/12/2016 1:30:00 AM 85.7     39.1
# 7 4/13/2016 1:45:00 AM 86.2     40.0

The pacakge data.table has an rbindlist function, that rbinds list elements. As you have different column names, you can either specify use.names=T & fill=T, so that it will pad with NAs where column names don't match. Or, leave use.names = F, in which icase it will use the order of the columns and assume they are the same

library(data.table)

rbindlist(data1)
#         date       time temp humidity
# 1: 4/10/2016 1:00:00 AM 86.1     39.9
# 2: 4/11/2016 1:15:00 AM 85.3     39.5
# 3: 4/12/2016 1:30:00 AM 85.7     39.1
# 4: 4/10/2016 1:00:00 AM 86.1     39.9
# 5: 4/11/2016 1:15:00 AM 85.3     39.5
# 6: 4/12/2016 1:30:00 AM 85.7     39.1
# 7: 4/13/2016 1:45:00 AM 86.2     40.0

rbindlist(data1, use.names = T, fill = T)
#               date       time temp humidity     date2      time2 temp2 humidity2
# 1: 4/10/2016 1:00:00 AM 86.1     39.9        NA         NA    NA        NA
# 2: 4/11/2016 1:15:00 AM 85.3     39.5        NA         NA    NA        NA
# 3: 4/12/2016 1:30:00 AM 85.7     39.1        NA         NA    NA        NA
# 4:        NA         NA   NA       NA 4/10/2016 1:00:00 AM  86.1      39.9
# 5:        NA         NA   NA       NA 4/11/2016 1:15:00 AM  85.3      39.5
# 6:        NA         NA   NA       NA 4/12/2016 1:30:00 AM  85.7      39.1
# 7:        NA         NA   NA       NA 4/13/2016 1:45:00 AM  86.2      40.0

If you're looking to do a SQL-type join, you use merge (in base R)

merge(data1[[1]], data1[[2]], by.x = c("date","time","temp","humidity"), 
                              by.y = c("date2","time2","temp2","humidity2"), all = T)

#        date       time temp humidity
# 1 4/10/2016 1:00:00 AM 86.1     39.9
# 2 4/11/2016 1:15:00 AM 85.3     39.5
# 3 4/12/2016 1:30:00 AM 85.7     39.1
# 4 4/13/2016 1:45:00 AM 86.2     40.0

Update

Based on your edit, you're after a right-join, so use merge with all.y = T

    merge(data1[[1]], data1[[2]], by.x = c("date"), by.y = c("date2"), all.y = T)
#         date       time temp humidity      time2 temp2 humidity2
# 1 4/10/2016 1:00:00 AM 86.1     39.9 1:00:00 AM  86.1      39.9
# 2 4/11/2016 1:15:00 AM 85.3     39.5 1:15:00 AM  85.3      39.5
# 3 4/12/2016 1:30:00 AM 85.7     39.1 1:30:00 AM  85.7      39.1
# 4 4/13/2016       <NA>   NA       NA 1:45:00 AM  86.2      40.0   

Upvotes: 1

Related Questions