PaulaF

Reputation: 393

Merge several files of different length in R based on one column

I have 100 files and each of them looks like this:

ID     BYr       Milk         REL
183601 2010 -0.635262171151035 50
183603 2010 -1.15906865500681 50
183611 2010 -0.39135273818727 50
183616 2010 0.832853286113099 50
183619 2010 1.15141619232805 50

Column 1 (ID) is the animal ID, and every file has this column. The third column is the trait of interest: milk production in this case, lactation length in another file, and so on. I want to merge all files on ID, drop columns 2 and 4, and generate a single file with all IDs and one column per trait. Something like this:

ID       Milk             LactLength    OP
183601 -0.635262171151035   350          2
183603 -1.15906865500681    250          4
183611 -0.39135273818727    450          5
183616 0.832853286113099    180          6
183619 1.15141619232805     250          7
183623 2.23473028006734     245          8

I have tried this from someone's answer:

myfiles = list.files(pattern = "\\.txt$")
datlist <- lapply(myfiles,read.table, header = TRUE, stringsAsFactors = FALSE, colClasses=c("character", "NULL"))
rowseq <- seq_len( max(vapply(datlist,nrow, integer(1))) )
keylist <- lapply(datlist,function(x) { x[[3]][rowseq] })
names(keylist) <- myfiles
df = do.call(data.frame,keylist)

But I don't see how to merge the files based on ID. Any help please? Thanks.

Upvotes: 2

Views: 296

Answers (1)

Ansjovis86

Reputation: 1545

You can use the merge function in base R or the join function in the 'plyr' package. I prefer the 'dplyr' package, however, which offers several ways to join data frames, such as left_join and inner_join. In your case you could do a full_join, which keeps every ID that appears in any file, and drop the columns you don't want to use, like this:

library(dplyr)
first <- TRUE

for (file in list.files(pattern = "\\.txt$")) {           # loop over all txt files
    dat <- read.table(file, header = TRUE)[, c(-2, -4)]   # drop columns 2 (BYr) and 4 (REL)
    if (first) {                                          # no join needed for the first file
        df <- dat
        first <- FALSE
    } else {
        df <- full_join(df, dat, by = "ID")               # assign the result, otherwise the join is lost
    }
}

This only works properly when the column you join on (ID here) has the same name and data type in every file.
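
If you would rather avoid the loop and the extra package, the same full join can probably be done with base R's merge and Reduce. This is only a minimal sketch: it assumes every file has the columns ID, BYr, trait, REL in that order and that the trait column has a different name in each file. The helper names read_trait and merge_traits, the demo file names, and the REL and BYr values in the second demo file are made up for illustration.

```r
# Minimal sketch: merge trait files on ID with base R only.
# Assumption: every file has the columns ID, BYr, <trait>, REL in that order,
# and the trait column has a different name in each file.
read_trait <- function(path) {
  # keep column 1 (ID) and column 3 (the trait); drop BYr and REL
  read.table(path, header = TRUE, stringsAsFactors = FALSE)[, c(1, 3)]
}

merge_traits <- function(paths) {
  # all = TRUE gives a full outer join: IDs missing from a file get NA
  Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE),
         lapply(paths, read_trait))
}

# Small demo in a temporary directory (hypothetical file names)
demo_dir <- file.path(tempdir(), "trait_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("ID BYr Milk REL",
             "183601 2010 -0.635262171151035 50",
             "183603 2010 -1.15906865500681 50"),
           file.path(demo_dir, "milk.txt"))
writeLines(c("ID BYr LactLength REL",
             "183603 2010 250 50",
             "183611 2010 450 50"),
           file.path(demo_dir, "lactlength.txt"))

merged <- merge_traits(list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE))
print(merged)
```

For your 100 files you would pass list.files(pattern = "\\.txt$") to merge_traits and save the result with write.table. If you write the output into the same folder, give it a name that the pattern does not match, or it will be read as an input file on the next run.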

Upvotes: 1
