Reputation: 1517
I have already loaded 20 CSV files with this code:
tbl = list.files(pattern="*.csv")
for (i in 1:length(tbl)) assign(tbl[i], read.csv(tbl[i]))
This is how it looks:
> head(tbl)
[1] "F1.csv" "F10_noS3.csv" "F11.csv" "F12.csv" "F12_noS7_S8.csv"
[6] "F13.csv"
Each of those CSV files has a column called "Accession". I would like to make one big list of all the names in that column, across every file.
Two problems. Let me show you how the data looks:
AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--
= Same sample, different names; these should be treated as one. So just ignore the dot and the number after it.
Is it possible to do this?
I couldn't include a dput(head(...)) because the data set is too big.
Upvotes: 1
Views: 93
Reputation: 121077
The first trick: you can read all the tables into a list of data frames using lapply. This is easier to work with than 20 individual data frames.
tbl = list.files(pattern="*.csv")
list_of_data = lapply(tbl, read.csv)
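To make the idea concrete, here is a minimal self-contained sketch of that step. It writes two tiny CSV files (with a made-up `Accession` column) into a fresh temporary directory, then reads them back into a named list; the file names and values are invented for illustration only.

```r
# Sketch: build a named list of data frames from a directory of CSVs.
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Accession = c("AT3G26450.1", "AT5G44520.2")),
          file.path(dir, "F1.csv"), row.names = FALSE)
write.csv(data.frame(Accession = c("AT3G26450.2", "AT4G24770.1")),
          file.path(dir, "F2.csv"), row.names = FALSE)

tbl <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
list_of_data <- lapply(tbl, read.csv)
names(list_of_data) <- basename(tbl)  # remember which file each frame came from
```

Naming the list elements after the files is optional, but it keeps the file-of-origin information that the original `assign` loop was providing.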
The second trick: you can recombine this list into a single data frame using do.call in conjunction with rbind.
all_data = do.call(rbind, list_of_data)
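A quick sketch of what that combination does, using two small hand-made data frames (the values are illustrative). Note that rbind requires every frame in the list to have the same column names.

```r
# Sketch: do.call(rbind, ...) stacks a list of data frames row-wise.
list_of_data <- list(
  data.frame(Accession = c("AT3G26450.1", "AT5G44520.2")),
  data.frame(Accession = c("AT3G26450.2", "AT4G24770.1"))
)
all_data <- do.call(rbind, list_of_data)
nrow(all_data)  # 4 rows: two from each frame
```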
You can select the contents of the Accession field before the dot using regular expressions. The stringr package is useful here. In the pattern below, ^ represents the start of the string, [[:alnum:]] represents a letter or number (an alphanumeric character), and + means one or more.
library(stringr)
all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
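If you prefer to avoid the stringr dependency, a base-R alternative is to strip the suffix instead of extracting the prefix. This sketch assumes the version suffix is always a dot followed by digits at the end of the string:

```r
# Base-R alternative: remove a trailing ".<digits>" version suffix.
accessions <- c("AT3G26450.1", "AT3G26450.2", "AT5G44520.2")
cleaned <- sub("\\.\\d+$", "", accessions)
cleaned  # "AT3G26450" "AT3G26450" "AT5G44520"
```

Both approaches give the same result on identifiers shaped like the ones in the question.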
Finally, you can remove duplicates by subsetting on non-duplicated values.
all_data = subset(all_data, !duplicated(CleanedAccession))
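Since the question only asks for "one big list" of names, you can also skip the subsetting and take the unique cleaned values directly. A small sketch with made-up data:

```r
# Sketch: unique() on the cleaned column gives the final list of names.
all_data <- data.frame(
  CleanedAccession = c("AT3G26450", "AT5G44520", "AT3G26450")
)
unique_names <- unique(all_data$CleanedAccession)
unique_names  # "AT3G26450" "AT5G44520"
```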
Upvotes: 3
Reputation: 59365
If you just need the list of names, and if they're all formatted as in your example, then using @Richie's all_data:
names <- unique(substr(all_data$Accession, 1, 9))
does it without regular expressions.
Upvotes: 0