combining text files in R with different separators

Question

I am trying to read in and combine multiple text files into R. The issue with this is that I have been given some data where field separators between files are different (e.g. tab for one and commas for another). How could I combine these efficiently? An example of layout:

Data1 (tab):

v1  v2  v3  v4  v5
 1   2   3   4   urban
 4   5   3   2   city

Data2 (comma):

v1,v2,v3,v4,v5
5,6,7,8,rural
6,4,3,1,city

This example is obviously not real, the real code has nearly half a million points! And so cannot reshape the original files. The code I have used so far has been:

filelist <- list.files(path = "~/Documents/", pattern='.dat', full.names=T)
data1 <- ldply(filelist, function(x) read.csv(x, sep="	"))
data2 <- ldply(filelist, function(x) read.csv(x, sep=","))

This gives me the the data both ways, which I then need to manually clean and then combine. Is there a way of using sep in a way that can remove this? Column names are the same among files. I know that stringr or other concatenating functions may be useful, but I also need to load the data in at the same time, and am unsure how to set this up within the read commands.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

I would suggest using fread from the "data.table" package. It's fast, and does a pretty good job of automatically detecting a delimiter in a file.

Here's an example:

## Create some example files
cat('v1	v2	v3	v4	v5
1	2	3	4	urban
4	5	3	2	city
', file = "file1.dat")
cat('v1,v2,v3,v4,v5
5,6,7,8,rural
6,4,3,1,city
', file = "file2.dat")

## Get a character vector of the file names
files <- list.files(pattern = "*.dat") ## Use what you're already doing

library(data.table)
lapply(files, fread)
# [[1]]
#    v1 v2 v3 v4    v5
# 1:  1  2  3  4 urban
# 2:  4  5  3  2  city
# 
# [[2]]
#    v1 v2 v3 v4    v5
# 1:  5  6  7  8 rural
# 2:  6  4  3  1  city

## Fancy work: Bind it all to one data.table...
##   with a column indicating where the file came from....
rbindlist(setNames(lapply(files, fread), files), idcol = TRUE)
#          .id v1 v2 v3 v4    v5
# 1: file1.dat  1  2  3  4 urban
# 2: file1.dat  4  5  3  2  city
# 3: file2.dat  5  6  7  8 rural
# 4: file2.dat  6  4  3  1  city

combining text files in R with different separators

Answers (2)

Related Questions