I have 100 files and each of them looks like this:
```
ID BYr Milk REL
183601 2010 -0.635262171151035 50
183603 2010 -1.15906865500681 50
183611 2010 -0.39135273818727 50
183616 2010 0.832853286113099 50
183619 2010 1.15141619232805 50
```
Column 1 (ID) is the animal ID, and every file has this column. Column 3 is the trait of interest (milk production, lactation length, etc.), which differs from file to file. I want to merge all files by ID, drop columns 2 and 4, and produce one file with all IDs and one column per trait, something like this:
```
ID Milk LactLength OP
183601 -0.635262171151035 350 2
183603 -1.15906865500681 250 4
183611 -0.39135273818727 450 5
183616 0.832853286113099 180 6
183619 1.15141619232805 250 7
183623 2.23473028006734 245 8
```
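As a starting point, reading one file in the layout above and keeping only ID and the trait can be done directly in `read.table()`: a `"NULL"` entry in `colClasses` skips that column. This is a minimal sketch that assumes every file has exactly four columns with the trait in column 3; the sample text is the first three rows from above, passed via `text =` so the snippet runs on its own.

```r
# Sample content in the layout shown above (first three rows only)
txt <- "ID BYr Milk REL
183601 2010 -0.635262171151035 50
183603 2010 -1.15906865500681 50
183611 2010 -0.39135273818727 50"

# "NULL" in colClasses skips a column entirely, so only ID and Milk are kept.
# ID is read as character so it matches exactly when files are merged later.
milk <- read.table(text = txt, header = TRUE,
                   colClasses = c("character", "NULL", "numeric", "NULL"))
str(milk)
```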
I tried this code from another answer:
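```r
myfiles <- list.files(pattern = "\\.txt$")
# Keep column 1 (ID) and column 3 (trait); "NULL" skips columns 2 and 4
datlist <- lapply(myfiles, read.table, header = TRUE, stringsAsFactors = FALSE,
                  colClasses = c("character", "NULL", "numeric", "NULL"))
rowseq <- seq_len(max(vapply(datlist, nrow, integer(1))))
# Only two columns remain after reading, so the trait is column 2
keylist <- lapply(datlist, function(x) x[[2]][rowseq])
names(keylist) <- myfiles
# Values are aligned by row position here, not matched by ID
df <- do.call(data.frame, keylist)
```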
But I can't figure out how to merge the files by ID. Any help? Thanks.
You can use the join or merge functions from the 'plyr' package or base R (`plyr::join`, `merge`). I prefer the 'dplyr' package, which provides several join types such as `left_join`, `inner_join` and `full_join`. In your case a `full_join` on ID should work; just drop the columns you don't need (2 and 4) from each file before joining, like this:
```r
library(dplyr)
first <- TRUE
for (file in list.files(pattern = "\\.txt$")) {             # loop over all .txt files
  current <- read.table(file, header = TRUE)[, c(-2, -4)]   # drop columns 2 and 4
  if (first) { df <- current; first <- FALSE }              # no join needed for the first file
  else df <- full_join(df, current, by = "ID")              # join accumulated and current data
}
```
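The same accumulate-and-join idea can also be written without an explicit loop, using base R's `Reduce()` with `merge(..., all = TRUE)`, which keeps every ID from either side like a full join. This is a sketch assuming every file has four columns with the trait in column 3; it writes three small made-up files to a temporary directory so it runs on its own, and in practice you would point `list.files()` at your own folder instead.

```r
# Write three small example files to a temporary directory (made-up data)
dir <- file.path(tempdir(), "traits_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(ID = c(1, 2, 3), BYr = 2010, Milk = c(0.1, 0.2, 0.3), REL = 50),
            file.path(dir, "milk.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(ID = c(1, 3, 4), BYr = 2010, LactLength = c(350, 250, 180), REL = 50),
            file.path(dir, "lact.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(ID = c(2, 3, 4), BYr = 2010, OP = c(2, 4, 5), REL = 50),
            file.path(dir, "op.txt"), row.names = FALSE, quote = FALSE)

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file, keeping only column 1 (ID) and column 3 (the trait)
datlist <- lapply(files, function(f) {
  read.table(f, header = TRUE,
             colClasses = c("character", "NULL", "numeric", "NULL"))
})

# Successively merge the tables on ID; all = TRUE keeps IDs missing from some files (NA)
merged <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), datlist)
print(merged)
```

The trait columns appear in the order `list.files()` returns the files (alphabetical), and the result can be saved with `write.table(merged, "all_traits.txt", row.names = FALSE, quote = FALSE)`, where the output file name is only an example.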
This only works properly when the column you join on (ID) has the same name and data type in every file.
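If some files name the key differently (say `id`) or store it with a different type, you can normalize it before joining. dplyr's joins generally refuse to match an integer key against a character key, and base `merge(by = "ID")` fails if one table has no `ID` column. The helper below, `normalize_key`, is hypothetical and assumes the key is always the first column of each table; it is a sketch, not part of any package.

```r
# Hypothetical helper: force the first column to be named "ID" and stored as
# character, so every table shares an identically named, identically typed key.
normalize_key <- function(df) {
  names(df)[1] <- "ID"
  df[[1]] <- as.character(df[[1]])
  df
}

# Made-up tables whose key columns differ in name and type
a <- data.frame(ID = c(183601L, 183603L), Milk = c(-0.64, -1.16))
b <- data.frame(id = c("183601", "183619"), OP = c(2, 7),
                stringsAsFactors = FALSE)

# Full merge on the normalized key; unmatched traits become NA
merged <- merge(normalize_key(a), normalize_key(b), by = "ID", all = TRUE)
print(merged)
```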