Reputation: 625
I have a list of dataframes that I would like to turn into a single dataframe. Here is a subset of my data:
date <- c("4/10/2016","4/11/2016","4/12/2016")
time <- c("1:00:00 AM","1:15:00 AM", "1:30:00 AM")
temp <- c(86.1,85.3,85.7)
humidity <- c(39.9,39.5,39.1)
object <- data.frame(date, time, temp, humidity)
date2 <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
time2 <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
temp2 <- c(86.1,85.3,85.7,86.2)
humidity2 <- c(39.9,39.5,39.1,40.0)
object2 <- data.frame(date2, time2, temp2, humidity2)
data1 <- list(object, object2)
The result I am looking for is:
id date time temp humidity temp2 humidity2
1 4/10/2016 1:00:00 AM 86.1 39.9 86.1 39.9
2 4/11/2016 1:15:00 AM 85.3 39.5 85.3 39.5
3 4/12/2016 1:30:00 AM 85.7 39.1 85.7 39.1
4 4/13/2016 1:45:00 AM NA NA 86.2 40.0
Because the dataframes are different lengths, I have not been successful at putting them together in a single dataframe. I tried padding the dataframes with NAs but was unsuccessful. I ended up adding new columns with NAs. I am new to coding so any explanations would be helpful. Please let me know if there is anything I can do to improve my question.
Update: Here is my data with one sensor starting after the others started.
date <- c("4/10/2016","4/11/2016","4/12/2016")
time <- c("1:00:00 AM","1:15:00 AM", "1:30:00 AM")
temp <- c(86.1,85.3,85.7)
humidity <- c(39.9,39.5,39.1)
object <- data.frame(date, time, temp, humidity)
date <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
time <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
temp2 <- c(86.1,85.3,85.7,86.2)
humidity2 <- c(39.9,39.5,39.1,40.0)
object2 <- data.frame(date, time, temp2, humidity2)
date <- c("4/10/2016","4/11/2016","4/12/2016","4/13/2016")
time <- c("1:00:00 AM","1:15:00 AM","1:30:00 AM","1:45:00 AM")
temp3 <- c(NA,84.3,84.7,85.2)
humidity3 <- c(NA,40.1,39.7,40.5)
object3 <- data.frame(date, time, temp3, humidity3)
data1 <- list(object, object2, object3)
And here's what I want it to look like:
id date time temp humidity temp2 humidity2 temp3 humidity3
1 4/10/2016 1:00:00 AM 86.1 39.9 86.1 39.9 NA NA
2 4/11/2016 1:15:00 AM 85.3 39.5 85.3 39.5 84.3 40.1
3 4/12/2016 1:30:00 AM 85.7 39.1 85.7 39.1 84.7 39.7
4 4/13/2016 1:45:00 AM NA NA 86.2 40.0 85.2 40.5
Upvotes: 2
Views: 113
Reputation: 26258
To rbind
list elements together, you first need to have the names of the columns of each list elements to be the same.
names(data1[[2]]) <- names(data1[[1]])
(However, be careful here that the names of the columns are in the correct order.)
Then use do.call(rbind, data1)
do.call(rbind, data1)
# date time temp humidity
# 1 4/10/2016 1:00:00 AM 86.1 39.9
# 2 4/11/2016 1:15:00 AM 85.3 39.5
# 3 4/12/2016 1:30:00 AM 85.7 39.1
# 4 4/10/2016 1:00:00 AM 86.1 39.9
# 5 4/11/2016 1:15:00 AM 85.3 39.5
# 6 4/12/2016 1:30:00 AM 85.7 39.1
# 7 4/13/2016 1:45:00 AM 86.2 40.0
The pacakge data.table
has an rbindlist
function, that rbinds list elements. As you have different column names, you can either specify use.names=T
& fill=T
, so that it will pad with NAs where column names don't match. Or, leave use.names = F
, in which icase it will use the order of the columns and assume they are the same
library(data.table)
rbindlist(data1)
# date time temp humidity
# 1: 4/10/2016 1:00:00 AM 86.1 39.9
# 2: 4/11/2016 1:15:00 AM 85.3 39.5
# 3: 4/12/2016 1:30:00 AM 85.7 39.1
# 4: 4/10/2016 1:00:00 AM 86.1 39.9
# 5: 4/11/2016 1:15:00 AM 85.3 39.5
# 6: 4/12/2016 1:30:00 AM 85.7 39.1
# 7: 4/13/2016 1:45:00 AM 86.2 40.0
rbindlist(data1, use.names = T, fill = T)
# date time temp humidity date2 time2 temp2 humidity2
# 1: 4/10/2016 1:00:00 AM 86.1 39.9 NA NA NA NA
# 2: 4/11/2016 1:15:00 AM 85.3 39.5 NA NA NA NA
# 3: 4/12/2016 1:30:00 AM 85.7 39.1 NA NA NA NA
# 4: NA NA NA NA 4/10/2016 1:00:00 AM 86.1 39.9
# 5: NA NA NA NA 4/11/2016 1:15:00 AM 85.3 39.5
# 6: NA NA NA NA 4/12/2016 1:30:00 AM 85.7 39.1
# 7: NA NA NA NA 4/13/2016 1:45:00 AM 86.2 40.0
If you're looking to do a SQL
-type join, you use merge
(in base R)
merge(data1[[1]], data1[[2]], by.x = c("date","time","temp","humidity"),
by.y = c("date2","time2","temp2","humidity2"), all = T)
# date time temp humidity
# 1 4/10/2016 1:00:00 AM 86.1 39.9
# 2 4/11/2016 1:15:00 AM 85.3 39.5
# 3 4/12/2016 1:30:00 AM 85.7 39.1
# 4 4/13/2016 1:45:00 AM 86.2 40.0
Update
Based on your edit, you're after a right-join, so use merge
with all.y = T
merge(data1[[1]], data1[[2]], by.x = c("date"), by.y = c("date2"), all.y = T)
# date time temp humidity time2 temp2 humidity2
# 1 4/10/2016 1:00:00 AM 86.1 39.9 1:00:00 AM 86.1 39.9
# 2 4/11/2016 1:15:00 AM 85.3 39.5 1:15:00 AM 85.3 39.5
# 3 4/12/2016 1:30:00 AM 85.7 39.1 1:30:00 AM 85.7 39.1
# 4 4/13/2016 <NA> NA NA 1:45:00 AM 86.2 40.0
Upvotes: 1