Reputation: 149
I have a semicolon-separated file that I want to read into R. The data in the file is shown below. In the 4th data row the Comment field itself contains a semicolon and quotes, but I still want that row to be read as only 4 columns.
I'm failing to do that in R.
ID;Comment;Date;Amt
1;Hello;5-06-2003;85.13
2;World;5-06-2013;127.39
3;Airlines;5-06-1999;148.34
4;"Air"l;ine"s";5-09-2013;87.94
data <- read.table(fileName, header = TRUE, sep = ";", quote = "\"", na.strings = c("", ".", "-", "NA"))
The code above does not read the 4th row correctly. Can anyone help?
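One way to confirm where the problem lies is base R's count.fields, which reports how many fields each line splits into. This is a minimal sketch, assuming the sample data above is written to a temporary file (the tempfile path stands in for the real fileName). With quoting turned off, every semicolon counts as a separator, so the 4th data row comes out with 5 fields instead of 4:

```r
# Write the sample data to a temporary file (stand-in for the real fileName)
path <- tempfile(fileext = ".txt")
writeLines(c('ID;Comment;Date;Amt',
             '1;Hello;5-06-2003;85.13',
             '2;World;5-06-2013;127.39',
             '3;Airlines;5-06-1999;148.34',
             '4;"Air"l;ine"s";5-09-2013;87.94'), path)

# quote = "" disables quote handling, so every semicolon separates fields
n <- count.fields(path, sep = ";", quote = "")
n

# Line numbers (line 1 is the header) that do not have the expected 4 fields
which(n != 4)
```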
Upvotes: 3
Reputation: 235
fread from the data.table package, which can handle such "exceptions" quite nicely, would be one way to solve this:
data.table::fread("file.txt")
ID Comment Date Amt
1: 1 Hello 5-06-2003 85.13
2: 2 World 5-06-2013 127.39
3: 3 Airlines 5-06-1999 148.34
4: 4 Air"l;ine"s 5-09-2013 87.94
Upvotes: 6
Reputation: 20811
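Another way is to use some delicious regex: capture the ID, the Date and the Amt explicitly, let everything in between become the Comment, and replace the three delimiting semicolons with spaces before calling read.table: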
path <- tempfile()
writeLines('ID;Comment;Date;Amt
1;Hello;5-06-2003;85.13
2;World;5-06-2013;127.39
3;Airlines;5-06-1999;148.34
4;"Air"l;ine"s";5-09-2013;87.94', path)
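# read the file back as one string per line (no line contains whitespace)
(rl <- scan(path, what = ''))
# the Date/Amt alternatives let the header line match as well;
# quote = '' makes read.table treat the embedded quotes literally
read.table(text = gsub('^(\\w+);(.*?);(Date|[-0-9]+);(Amt|[0-9.]+)$',
'\\1 \\2 \\3 \\4', rl),
quote = '', header = TRUE, stringsAsFactors = FALSE)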
# ID Comment Date Amt
# 1 1 Hello 5-06-2003 85.13
# 2 2 World 5-06-2013 127.39
# 3 3 Airlines 5-06-1999 148.34
# 4 4 "Air"l;ine"s" 5-09-2013 87.94
A simplified pattern (first field, greedy middle, last two fields) gives the same result:
read.table(text = gsub('^(.*?);(.*);(.*?);(.*?)$',
'\\1 \\2 \\3 \\4', rl),
quote = '', header = TRUE, stringsAsFactors = FALSE)
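The same idea (ID is the first field, Date and Amt are the last two, everything in between is the Comment) can also be sketched without a regex: split on every semicolon, then glue the middle pieces back together. This is a minimal sketch, assuming that only the Comment field can contain stray semicolons and that every row has at least 4 pieces with a non-empty Amt; the sample lines are inlined here, whereas with a real file they would come from readLines(fileName).

```r
# Sample data inlined for a self-contained example;
# with a real file use: lines <- readLines(fileName)
lines <- c('ID;Comment;Date;Amt',
           '1;Hello;5-06-2003;85.13',
           '2;World;5-06-2013;127.39',
           '3;Airlines;5-06-1999;148.34',
           '4;"Air"l;ine"s";5-09-2013;87.94')

# Split every data row (header dropped) on each semicolon, ignoring quotes
fields <- strsplit(lines[-1], ";", fixed = TRUE)

# Assumption: each row has at least 4 pieces; the first is the ID, the last
# two are Date and Amt, and any extra pieces in between belong to the Comment
df <- data.frame(
  ID      = as.integer(vapply(fields, function(f) f[1], character(1))),
  Comment = vapply(fields,
                   function(f) paste(f[2:(length(f) - 2)], collapse = ";"),
                   character(1)),
  Date    = vapply(fields, function(f) f[length(f) - 1], character(1)),
  Amt     = as.numeric(vapply(fields, function(f) f[length(f)], character(1))),
  stringsAsFactors = FALSE
)
df
```

Because nothing is parsed as a quote, the 4th Comment keeps its quotes and inner semicolon verbatim; whether that is the desired value, or whether the quotes should be stripped afterwards, depends on how the file was produced.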
Upvotes: 3