Combine some csv files into one - different number of columns

Question

I already loaded 20 csv files with function:

tbl = list.files(pattern="*.csv")
for (i in 1:length(tbl)) assign(tbl[i], read.csv(tbl[i]))

or

list_of_data = lapply(tbl, read.csv)

That how it looks like:

> head(tbl)
[1] "F1.csv"          "F10_noS3.csv"    "F11.csv"         "F12.csv"         "F12_noS7_S8.csv"
[6] "F13.csv"

I have to combine all of those files into one. Let's call it a master file but let's try with making a one table with all of the names. In all of those csv files is a column called "Accession". I would like to make a table of all "names" from all of those csv files. Of course many of the accessions can be repeated in different csv files. I would like to keep all of the data corresponding to the accession.

Some problems:

Some of those "names" are the same and I don't want to duplicate them
Some of those "names" are ALMOST the same. The difference is that there is name and after become the dot and the numer.
The number of columns can be different is those csv files.

That's the screenshot showing how those data looks like: http://imageshack.com/a/img811/7103/29hg.jpg

Let me show you how it looks:

AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--

<-- = Same sample, different names. Should be treated as one. So just ignore dot and a number after.

Is it possible to do ?

I couldn't do a dput(head) because it's even too big data set.

I tried to use such code:

all_data = do.call(rbind, list_of_data)
Error in rbind(deparse.level, ...) : 
The number of columns is not correct.


all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))

I tried to do it for almost 2 weeks and I am not able to. So please help me.

nassimhddd · Accepted Answer

Your questions seems to contain multiple subquestions. I encourage you to separate them.

The first thing you apparently need is to combine data frames with different columns. You can use rbind.fill from the plyr package:

library(plyr)
all_data = do.call(rbind.fill, list_of_data)

Combine some csv files into one - different number of columns

Answers (2)

Related Questions