Reputation: 1360
I am encountering an issue while loading a CSV data set in R. The data set can be downloaded from
https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53
I imported the data using read.csv as shown below, and the dataset was imported correctly.
EmpSal <- read.csv('E:/Data/EmpSalaries.csv')
I then tried reading the data using read.table, and the resulting dataset had a lot of anomalies.
EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)
The above code started reading the data from the 7th row. The dataset actually contains ~14K rows, but only 5K rows were imported. When I looked at the dataset, in a few cases 15-20 rows had been combined into a single row, and the data for that entire row appeared in a single column.
I can work on the dataset using read.csv, but I am curious to know why it didn't work with read.table.
Upvotes: 4
Views: 43880
Reputation: 113
As you mentioned, your data is imported successfully by the read.csv() command without specifying the quote argument.
The default value of the quote argument is "\"" for read.csv and "\"'" for read.table.
Compare the two usage signatures:
read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
row.names, col.names, as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
dec = ".", fill = TRUE, comment.char = "", ...)
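There are many single quotation marks (apostrophes) in your data. Because read.table treats ' as a quote character by default, each apostrophe opens a quoted string that runs until the next apostrophe, which may be several lines later. That is why rows were merged and why read.table isn't working for you.
Try the following code, which restricts quoting to double quotes, and it will work for you.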
r<-read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv',sep=",",quote="\"",header=T,fill=T)
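To see the effect without downloading the file, here is a minimal sketch on a made-up three-row CSV. The names and values in csv_text are hypothetical and not taken from the Baltimore data set; the only point is that two fields contain an apostrophe.

```r
# Hypothetical CSV text: two of the three data rows contain an apostrophe.
csv_text <- "Name,Dept\nO'Brien,Police\nSmith,Fire\nD'Angelo,Health\n"

# Default quote = "\"'": the first apostrophe opens a quoted string that
# only ends at the next apostrophe, so several lines are read as one field.
bad <- read.table(text = csv_text, sep = ",", header = TRUE, fill = TRUE)

# Same call, but only the double quote is treated as a quote character,
# which is what read.csv does by default.
good <- read.table(text = csv_text, sep = ",", header = TRUE, fill = TRUE,
                   quote = "\"")

nrow(bad)   # fewer rows than the file really has
nrow(good)  # 3
```

Passing quote = "" would disable quoting altogether, but that would break fields that legitimately contain commas inside double quotes, so quote = "\"" is usually the safer choice for CSV files.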
Upvotes: 2
Reputation: 6449
read.csv is defined as:
function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
You need to add quote="\"" (by default read.table treats both single and double quotes as quote characters, whereas read.csv treats only double quotes as quote characters)
EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE
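If you want to locate the lines that trigger the problem, count.fields can help, since it parses quotes the same way as scan and read.table. The following is only a sketch on made-up data: csv_text and the helper fields_per_line are hypothetical names introduced for illustration.

```r
# Hypothetical CSV text with apostrophes in two of the data rows.
csv_text <- "Name,Dept\nO'Brien,Police\nSmith,Fire\nD'Angelo,Health\n"

# Helper (illustrative only): count the fields on each line for a given
# set of quote characters, reading from an in-memory text connection.
fields_per_line <- function(txt, quote) {
  con <- textConnection(txt)
  on.exit(close(con))
  count.fields(con, sep = ",", quote = quote)
}

# read.table's default quote set: a line that starts a multi-line quoted
# string is reported as NA.
default_counts <- fields_per_line(csv_text, quote = "\"'")

# read.csv's default quote set: every line has the expected two fields.
csv_counts <- fields_per_line(csv_text, quote = "\"")

which(is.na(default_counts))  # line numbers where a quoted string spills over
```

On the real file you would pass the file path to count.fields directly instead of a text connection.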
Upvotes: 3