Rechlay

Reputation: 1517

Collecting data in one row from different csv files by the name

It's hard to explain what exactly I want to achieve with my script but let me try.

I have 20 different csv files, so I loaded them into R:

tbl = list.files(pattern="*.csv")
list_of_data = lapply(tbl, read.csv)

Then, with your help, I combined them into one table and removed all of the duplicates:

data_rd <- subset(transform(all_data, X = sub("\\..*", "", X)), 
       !duplicated(X))

I now have one master table that includes all of the names (Accession):

Accession

AT1G19570
AT5G38480
AT1G07370
AT4G23670
AT5G10450
AT4G09000
AT1G22300
AT1G16080
AT1G78300
AT2G29570

Now I would like to find each accession in the other csv files and put all of the data for that accession in the same row. There are about 20 csv files with about 20 columns each, so in some cases this might give me around 400 columns. It doesn't matter how long it takes; it has to be done. Is it even possible to do this with R?

Example:

                 First csv                  Second csv           Third csv

Accession    Size Length Weight         Size Length Weight    Size Length Weight 



AT1G19570     12   23     43             22     77   666      656     565   33
AT5G38480
AT1G07370     33   22     33             34     22
AT4G23670
AT5G10450
AT4G09000     12   45     32
AT1G22300
AT1G16080
AT1G78300                                 44    22  222
AT2G29570

It looks like a hard task. It probably has to be done with a loop. Any ideas?

Upvotes: 1

Views: 86

Answers (1)

Stephen Henderson

Reputation: 6522

This calls for a merge loop. Here is some rough R code; it is inefficient because the table grows with each merge. Begin as before:

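# list.files() takes a regular expression, so match the ".csv" extension explicitly
tbls = list.files(pattern="\\.csv$")
list_of_data = lapply(tbls, read.csv)
tbl = list_of_data[[1]]

for(i in 2:length(list_of_data))
{
   # outer join on Accession: keeps every key seen in either table
   tbl = merge(tbl, list_of_data[[i]], by="Accession", all=TRUE)
}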

Matching column names that are not used as the key are renamed with a suffix (.x and .y by default). The all=TRUE argument ensures that whenever a new Accession key is merged, a new row is created and the missing cells are filled with NA.
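Because merge() only has the .x and .y suffixes by default, names can start to collide once more than a few files share the same columns. One possible workaround, sketched below, is to prefix each non-key column with a label for its source file before merging. The three small data frames, the labels first/second/third, and the helper tag_columns are hypothetical stand-ins for illustration, not part of the original data.

```r
# Hypothetical in-memory stand-ins for three of the csv files.
list_of_data <- list(
  first  = data.frame(Accession = c("AT1G19570", "AT1G07370"),
                      Size = c(12, 33), Length = c(23, 22)),
  second = data.frame(Accession = c("AT1G19570", "AT1G78300"),
                      Size = c(22, 44), Length = c(77, 22)),
  third  = data.frame(Accession = "AT1G19570", Size = 656, Length = 565)
)

# Hypothetical helper: prefix every non-key column with a source label.
tag_columns <- function(df, tag) {
  not_key <- names(df) != "Accession"
  names(df)[not_key] <- paste(tag, names(df)[not_key], sep = "_")
  df
}
tagged <- Map(tag_columns, list_of_data, names(list_of_data))

# Fold the list into one wide table with successive outer joins on Accession.
wide <- Reduce(function(a, b) merge(a, b, by = "Accession", all = TRUE), tagged)
print(wide)
```

With real files, the labels could be taken from the file names, for example names(list_of_data) <- sub("\\.csv$", "", tbls), assuming tbls holds the result of list.files().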

Upvotes: 2
