Reputation: 1746
I have 2 datasets:
Data1:
Var1 Var2 Var3 Var4
10 10 2 3
9 2 8 3
6 4 4 8
7 3 10 8
Data2:
Var1 Var5 Var3 Var6
3 6 6 4
1 2 5 1
9 2 2 9
2 6 3 2
Now I want to append these 2 datasets:
Final Data:
Var1 Var2 Var3 Var4 Var5 Var6
10 10 2 3
9 2 8 3
6 4 4 8
7 3 10 8
3 4 6 6
1 1 2 5
9 9 2 2
2 2 6 3
I can't use rbind to create this dataset. Can anybody please tell me the method to create this dataset? Also, suppose I want to append multiple (more than 2) datasets. What's the procedure?
Upvotes: 1
Views: 5424
Reputation: 12723
# Run this from a directory that contains only the data files to be combined
combinedfiles <- function() {
  # readTab: read one space-separated file that has a header row
  readTab <- function(y) {
    read.table(y, header = TRUE, sep = " ")
  }
  files <- list.files(getwd())
  # Start from NULL; rbind(NULL, df) simply returns df
  objectcontent <- NULL
  # Note: rbind() requires every file to have the same column names
  for (f in files) {
    objectcontent <- rbind(objectcontent, readTab(f))
  }
  return(objectcontent)
}
# Then type the following in the console
combinedfiles()
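A version using lapply, which avoids the slowdown of growing a data frame with repeated rbind calls:
combined_files = function(file_path, extension = "csv") {
  library(plyr)
  # Match the extension at the end of the file name, and return full paths
  # so the files can be read even when file_path is not the working directory
  file_list = list.files(file_path, pattern = paste0("\\.", extension, "$"), full.names = TRUE)
  data_list = lapply(file_list, read.table, header = TRUE, sep = " ")
  # rbind.fill fills columns that are missing from a file with NA
  combined_data = do.call("rbind.fill", data_list)
  return(combined_data)
}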
Upvotes: 1
Reputation: 81713
I recommend the function rbind.fill of the plyr package:
library(plyr)
rbind.fill(Data1, Data2)
# Var1 Var2 Var3 Var4 Var5 Var6
#1 10 10 2 3 NA NA
#2 9 2 8 3 NA NA
#3 6 4 4 8 NA NA
#4 7 3 10 8 NA NA
#5 3 NA 6 NA 6 4
#6 1 NA 5 NA 2 1
#7 9 NA 2 NA 2 9
#8 2 NA 3 NA 6 2
The major advantage of this technique is that it's not limited to two data frames, but allows combining any number of data frames.
If the data still needs to be read from disk, you can do something like:
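file_list = list.files()
# header = TRUE assumes each file starts with a row of column names, as in the question
data_list = lapply(file_list, read.table, header = TRUE)
data_combined = do.call("rbind.fill", data_list)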
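If installing plyr is not an option, the same idea can be sketched in base R. This is not plyr's implementation, only a minimal illustration: the helper name rbind_fill_base is hypothetical, and Data3 uses the values posted in another answer to this question. Each data frame is padded with NA for the columns it lacks, and the padded frames are then row-bound.

```r
# Hypothetical helper (not part of plyr): row-bind data frames whose
# columns differ, filling the columns a frame lacks with NA.
rbind_fill_base <- function(...) {
  dfs <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add each missing column as NA, then put columns in a common order
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]
  })
  do.call(rbind, padded)
}

# The data from the question
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))
# A third frame, to show that any number of frames can be passed
Data3 <- data.frame(Var1 = c(2, 10, 1, 9), Var3 = c(6, 1, 6, 5),
                    Var5 = c(4, 6, 3, 5), Var6 = c(1, 1, 1, 7))

combined <- rbind_fill_base(Data1, Data2, Data3)
```

With a list of data frames (for example one produced by lapply), do.call(rbind_fill_base, data_list) gives the same result.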
Upvotes: 7
Reputation: 15415
merge(Data1, Data2, all=TRUE, sort=FALSE)
Var1 Var3 Var2 Var4 Var5 Var6
1 10 2 10 3 NA NA
2 9 8 2 3 NA NA
3 6 4 4 8 NA NA
4 7 10 3 8 NA NA
5 3 6 NA NA 6 4
6 1 5 NA NA 2 1
7 9 2 NA NA 2 9
8 2 3 NA NA 6 2
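One caveat worth checking before relying on merge to append: merge joins on the columns the two data frames have in common (here Var1 and Var3). It behaves like an append above only because no row of Data1 has the same Var1 and Var3 values as a row of Data2. The sketch below uses made-up toy data (a and b are hypothetical, not from the question) to show what happens when such a match exists.

```r
# Toy data (hypothetical): the first rows of a and b share the same
# values in the common columns Var1 and Var3.
a <- data.frame(Var1 = c(1, 2), Var2 = c(10, 20), Var3 = c(5, 6))
b <- data.frame(Var1 = c(1, 7), Var3 = c(5, 8), Var5 = c(100, 200))

merged <- merge(a, b, all = TRUE, sort = FALSE)
nrow(merged)  # 3, not 4: the matching rows were joined into one

# A quick check before using merge() to append: count the key
# combinations that occur in both frames.
common <- intersect(names(a), names(b))
overlap <- nrow(merge(unique(a[common]), unique(b[common])))
```

If overlap is 0, merge with all=TRUE keeps every row of both frames; otherwise a row-binding approach is the safer choice for appending.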
EDIT: A way to combine multiple data frames, as detailed here.
Data3
Var1 Var3 Var5 Var6
1 2 6 4 1
2 10 1 6 1
3 1 6 3 1
4 9 5 5 7
We'll need to put your data into a list and use a nice package called reshape.
datalist <- list(Data1, Data2, Data3)
library(reshape)
merge_recurse(datalist)
Var1 Var3 Var2 Var4 Var5 Var6
1 10 2 10 3 NA NA
2 9 8 2 3 NA NA
3 6 4 4 8 NA NA
4 7 10 3 8 NA NA
5 3 6 NA NA 6 4
6 1 5 NA NA 2 1
7 9 2 NA NA 2 9
8 2 3 NA NA 6 2
9 2 6 NA NA 4 1
10 10 1 NA NA 6 1
11 1 6 NA NA 3 1
12 9 5 NA NA 5 7
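For readers who prefer not to load reshape, a base R sketch along the same lines is to fold merge over the list with Reduce. The helper name outer_merge is hypothetical, and the column order of the result may differ from the output above, since each merge puts its key columns first. The same caveat as for a single merge applies: rows are only appended as long as no rows match on the shared columns.

```r
# The three data frames from this thread
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))
Data3 <- data.frame(Var1 = c(2, 10, 1, 9), Var3 = c(6, 1, 6, 5),
                    Var5 = c(4, 6, 3, 5), Var6 = c(1, 1, 1, 7))
datalist <- list(Data1, Data2, Data3)

# Hypothetical helper: a full outer merge of two data frames
outer_merge <- function(x, y) merge(x, y, all = TRUE, sort = FALSE)

# Reduce applies outer_merge pairwise: merge(merge(Data1, Data2), Data3)
combined <- Reduce(outer_merge, datalist)
```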
Upvotes: 5
Reputation: 12723
Try this:
# read.table already returns a data frame, so no as.data.frame() is needed
data1 <- read.table("data1", header = TRUE, sep = " ")
data2 <- read.table("data2", header = TRUE, sep = " ")
# all = TRUE already implies all.x = TRUE and all.y = TRUE
merge(data1, data2, all = TRUE)
Upvotes: 0