G_1991

Reputation: 149

How to read a semi-colon separated file with double quotes in the columns?

I have a semicolon-separated file that I want to read; the data is shown below. The 4th row has double quotes and an extra semicolon inside the Comment column, but I still want it to be read as only 4 columns.

So far I have not managed to do that in R.

ID;Comment;Date;Amt
1;Hello;5-06-2003;85.13
2;World;5-06-2013;127.39
3;Airlines;5-06-1999;148.34
4;"Air"l;ine"s";5-09-2013;87.94

data <- read.table(fileName, header = TRUE, sep = ";", quote = "\"",
                   na.strings = c("", ".", "-", "NA"))

The code above does not work. Can anyone help?

Upvotes: 3

Views: 599

Answers (2)

plastikdusche

Reputation: 235

fread from the data.table package, which can handle such "exceptions" quite nicely, would be one way to solve this.

data.table::fread("file.txt")
   ID     Comment      Date    Amt
1:  1       Hello 5-06-2003  85.13
2:  2       World 5-06-2013 127.39
3:  3    Airlines 5-06-1999 148.34
4:  4 Air"l;ine"s 5-09-2013  87.94

Upvotes: 6

rawr

Reputation: 20811

Another way is to use some delicious regex:

path <- tempfile()
writeLines('ID;Comment;Date;Amt
1;Hello;5-06-2003;85.13
2;World;5-06-2013;127.39
3;Airlines;5-06-1999;148.34
4;"Air"l;ine"s";5-09-2013;87.94', path)


(rl <- scan(path, what = ''))

read.table(text = gsub('^(\\w+);(.*?);(Date|[-0-9]+);(Amt|[0-9.]+)$',
                       '\\1 \\2 \\3 \\4', rl),
           quote = '', header = TRUE, stringsAsFactors = FALSE)

#   ID       Comment      Date    Amt
# 1  1         Hello 5-06-2003  85.13
# 2  2         World 5-06-2013 127.39
# 3  3      Airlines 5-06-1999 148.34
# 4  4 "Air"l;ine"s" 5-09-2013  87.94

And a simplified version gives the same thing:

read.table(text = gsub('^(.*?);(.*);(.*?);(.*?)$',
                       '\\1 \\2 \\3 \\4', rl),
           quote = '', header = TRUE, stringsAsFactors = FALSE)

Upvotes: 3
