Beta

Reputation: 1746

Appending Dataset in R

I have 2 datasets:

Data1:

Var1 Var2   Var3    Var4
10    10      2   3
9      2      8   3
6      4      4   8
7      3     10   8

Data2:

Var1 Var5   Var3    Var6
  3    6      6   4
  1    2      5   1
  9    2      2   9
  2    6      3   2

Now I want to append these 2 datasets:

Final Data:

Var1  Var2  Var3  Var4  Var5  Var6
  10    10     2     3
   9     2     8     3
   6     4     4     8
   7     3    10     8
   3           6           6     4
   1           5           2     1
   9           2           2     9
   2           3           6     2

I can't use rbind to create this dataset. Can anybody please tell me the method to create this dataset? Also, suppose I want to append multiple (more than 2) datasets. What's the procedure?

Upvotes: 1

Views: 5424

Answers (4)

Sathish

Reputation: 12723

# Run this from a directory that contains only the data files to be combined
combinedfiles <- function() {
  # readTab: read one space-separated file that has a header row
  readTab <- function(y) {
    read.table(y, header = TRUE, sep = " ")
  }

  files <- list.files(getwd())
  # Start from NULL: rbind(NULL, df) simply returns df
  objectcontent <- NULL

  for (f in files) {
    # rbind needs identical column names in every file;
    # swap in plyr::rbind.fill if the columns differ
    objectcontent <- rbind(objectcontent, readTab(f))
  }
  objectcontent
}

# Then type the following in the console
combinedfiles()

A version using lapply, which avoids the slowdown of growing a data frame with rbind inside a loop:

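combined_files = function(file_path, extension = "csv") {
  library(plyr)
  # Match the extension at the end of the name only, and return full paths
  # so the files can be read when file_path is not the working directory
  file_list = list.files(file_path,
                         pattern = paste0("\\.", extension, "$"),
                         full.names = TRUE)
  data_list = lapply(file_list, read.table, header = TRUE, sep = " ")
  # rbind.fill pads columns missing from a file with NA
  combined_data = do.call("rbind.fill", data_list)
  return(combined_data)
}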

Upvotes: 1

Sven Hohenstein

Reputation: 81713

I recommend the function rbind.fill of the plyr package:

library(plyr)
rbind.fill(Data1, Data2)

#  Var1 Var2 Var3 Var4 Var5 Var6
#1   10   10    2    3   NA   NA
#2    9    2    8    3   NA   NA
#3    6    4    4    8   NA   NA
#4    7    3   10    8   NA   NA
#5    3   NA    6   NA    6    4
#6    1   NA    5   NA    2    1
#7    9   NA    2   NA    2    9
#8    2   NA    3   NA    6    2

The major advantage of this technique is that it's not limited to two data frames, but allows combining any number of data frames.
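
If plyr is not available, roughly the same result can be had in base R by padding each data frame with the columns it lacks and then calling rbind. This is only a sketch: append_fill is a hypothetical helper name, not a function from any package, and the data frames are retyped from the question.

```r
# The two example data frames from the question
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))

# append_fill: hypothetical helper, accepts any number of data frames
append_fill <- function(...) {
  frames <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    # Add every column this frame lacks, filled with NA
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    # Put the columns in a common order so rbind can stack them
    df[all_cols]
  })
  do.call(rbind, padded)
}

combined <- append_fill(Data1, Data2)
```

Because the helper takes its inputs through ..., a third or fourth data frame can be passed in the same call.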

If the data still needs to be read from disk, you can do something like:

file_list = list.files()
data_list = lapply(file_list, read.table, header = TRUE)
data_combined = do.call("rbind.fill", data_list)
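
To try the reading step without touching real data, here is a small sketch that writes two example files to a temporary directory first. The directory and file names (append_demo, data1.txt, data2.txt) are made up for illustration, and it assumes whitespace-separated files with a header row.

```r
# Write two small example files into a temporary directory
demo_dir <- file.path(tempdir(), "append_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(Var1 = c(10, 9), Var2 = c(10, 2)),
            file.path(demo_dir, "data1.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(Var1 = c(3, 1), Var5 = c(6, 2)),
            file.path(demo_dir, "data2.txt"), row.names = FALSE, quote = FALSE)

# full.names = TRUE returns paths that work from any working directory
file_list <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
# header = TRUE keeps Var1, Var2, ... as column names instead of V1, V2, ...
data_list <- lapply(file_list, read.table, header = TRUE)
```

data_list is then an ordinary list of data frames and can be handed to do.call("rbind.fill", data_list) as above.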

Upvotes: 7

sebastian-c

Reputation: 15415

merge(Data1, Data2, all=TRUE, sort=FALSE)

  Var1 Var3 Var2 Var4 Var5 Var6
1   10    2   10    3   NA   NA
2    9    8    2    3   NA   NA
3    6    4    4    8   NA   NA
4    7   10    3    8   NA   NA
5    3    6   NA   NA    6    4
6    1    5   NA   NA    2    1
7    9    2   NA   NA    2    9
8    2    3   NA   NA    6    2
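
One thing to watch for: merge is a join on the shared columns (Var1 and Var3 here), so it only behaves like an append when no row of one data frame has the same key values as a row of the other. A small sketch with two made-up data frames, A and B (not from the question), shows what happens when a key does coincide:

```r
# A and B share the key columns Var1 and Var3; the row (1, 1) occurs in both
A <- data.frame(Var1 = c(1, 2), Var3 = c(1, 2), Var2 = c(5, 6))
B <- data.frame(Var1 = c(1, 3), Var3 = c(1, 3), Var5 = c(7, 8))

joined <- merge(A, B, all = TRUE, sort = FALSE)
# The matching rows are combined into one, so there are 3 rows, not 4
nrow(joined)
```

If the rows must always be stacked regardless of matching values, rbind.fill from plyr is the safer choice.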

EDIT: A way to combine multiple frames, as detailed here.

Combining more than 2 frames

Data3

  Var1 Var3 Var5 Var6
1    2    6    4    1
2   10    1    6    1
3    1    6    3    1
4    9    5    5    7

We'll need to put your data into a list and use a nice package called reshape.

datalist <- list(Data1, Data2, Data3)
library(reshape)

merge_recurse(datalist)
   Var1 Var3 Var2 Var4 Var5 Var6
1    10    2   10    3   NA   NA
2     9    8    2    3   NA   NA
3     6    4    4    8   NA   NA
4     7   10    3    8   NA   NA
5     3    6   NA   NA    6    4
6     1    5   NA   NA    2    1
7     9    2   NA   NA    2    9
8     2    3   NA   NA    6    2
9     2    6   NA   NA    4    1
10   10    1   NA   NA    6    1
11    1    6   NA   NA    3    1
12    9    5   NA   NA    5    7
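
If you would rather not depend on reshape, a similar result can probably be had in base R by folding merge over the list with Reduce. This is a sketch under the assumption that the three data frames look like the ones above (retyped here so it runs on its own). The row and column order may differ from the merge_recurse output.

```r
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))
Data3 <- data.frame(Var1 = c(2, 10, 1, 9), Var3 = c(6, 1, 6, 5),
                    Var5 = c(4, 6, 3, 5), Var6 = c(1, 1, 1, 7))

datalist <- list(Data1, Data2, Data3)

# Merge the first two frames, then merge that result with the third, and so on;
# all = TRUE keeps rows that have no match on the shared columns
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), datalist)
```

As with a single merge call, rows are only appended when their values in the shared columns do not match an existing row.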

Upvotes: 5

Sathish

Reputation: 12723

Try this:

# read.table already returns a data frame, so as.data.frame is not needed
data1 <- read.table("data1", header = TRUE, sep = " ")
data2 <- read.table("data2", header = TRUE, sep = " ")
# all = TRUE is shorthand for all.x = TRUE and all.y = TRUE
merge(data1, data2, all = TRUE)

Upvotes: 0
