mockash

Reputation: 1360

Error reading a CSV file with read.table()

I am having trouble loading a CSV dataset in R. The dataset can be downloaded from

https://data.baltimorecity.gov/City-Government/Baltimore-City-Employee-Salaries-FY2015/nsfe-bg53

I imported the data using read.csv as below, and the dataset was imported correctly.

EmpSal <- read.csv('E:/Data/EmpSalaries.csv')

When I tried reading the same file with read.table, the resulting dataset had many anomalies.

EmpSal1 <- read.table('E:/Data/EmpSalaries.csv',sep=',',header = T,fill = T)

The code above started reading the data from the 7th row, and although the dataset contains ~14K rows, only ~5K were imported. Looking at the result, in some cases 15-20 rows had been combined into a single row, with that entire row's data appearing in a single column.

I can work with the dataset using read.csv, but I am curious why read.table didn't work.

Upvotes: 4

Views: 43880

Answers (2)

itsyub

Reputation: 113

As you mentioned, your data is imported correctly by read.csv() without specifying the quote argument. The default quote for read.csv is "\"" (double quote only), whereas for read.table it is "\"'" (both double and single quotes). Compare the two signatures:

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
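
The differing defaults can also be checked directly in R with base R's formals(), which returns a function's default argument values; this avoids depending on a particular version of the help page. A minimal sketch:

```r
# Default quote characters: read.table treats both " and ' as quotes,
# read.csv treats only " as a quote.
print(formals(read.table)$quote)
print(formals(read.csv)$quote)

# The comment character differs too: read.table uses "#",
# read.csv disables comments with "".
print(formals(read.table)$comment.char)
print(formals(read.csv)$comment.char)
```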

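Your data contains many single quotes (apostrophes). Because read.table treats ' as a quote character by default, an apostrophe opens a quoted field that runs, across commas and line breaks, until the next apostrophe. That is why several rows end up merged into a single field and the total row count drops, and this is why read.table isn't working for you.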
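
The effect can be reproduced on a tiny inline CSV. The data below is hypothetical (not the Baltimore file): two job titles contain an apostrophe, and the Name field is double-quoted because it contains a comma. The exact number of rows read.table returns with its default quote depends on where the apostrophes fall, so this sketch only compares row counts:

```r
# Hypothetical inline CSV (not the Baltimore data).
csv_lines <- c(
  "Name,JobTitle,Salary",
  "\"Smith,John\",Driver's Aide,30000",
  "\"Doe,Jane\",Clerk,35000",
  "\"Lee,Ann\",Driver's Aide,40000"
)

# read.csv only treats " as a quote character, so all three rows are read.
good <- read.csv(text = csv_lines)

# read.table's default quote = "\"'" also treats ' as a quote character, so
# text between apostrophes (commas and line breaks included) is swallowed
# into one field and rows are merged or lost.
bad <- read.table(text = csv_lines, sep = ",", header = TRUE, fill = TRUE)

# Restricting quote to double quotes restores the expected rows.
fixed <- read.table(text = csv_lines, sep = ",", header = TRUE, fill = TRUE,
                    quote = "\"")

print(c(read.csv = nrow(good), default = nrow(bad), fixed = nrow(fixed)))
```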

Setting quote = "\"", so that only double quotes are treated as quote characters, fixes this:

r <- read.table('/home/workspace/Downloads/Baltimore_City_Employee_Salaries_FY2015.csv', sep = ",", quote = "\"", header = TRUE, fill = TRUE)
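
Before settling on a quote setting, it can help to list which lines of a file actually contain apostrophes. A sketch using base R's readLines() and grep(); the temporary file and its contents are a hypothetical stand-in for the downloaded CSV:

```r
# Hypothetical stand-in for the downloaded CSV, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,JobTitle,Salary",
  "\"Smith,John\",Driver's Aide,30000",
  "\"Doe,Jane\",Clerk,35000"
), path)

# Line numbers (1 = header) containing a single quote; with read.table's
# default quote = "\"'", each of these can open a quoted field.
apostrophe_lines <- grep("'", readLines(path), fixed = TRUE)
print(apostrophe_lines)

# Reading with only double quotes as quote characters keeps every row.
emp <- read.table(path, sep = ",", quote = "\"", header = TRUE, fill = TRUE)
print(nrow(emp))
```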

Upvotes: 2

lebatsnok

Reputation: 6449

read.csv is a thin wrapper around read.table, defined as:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
    fill = TRUE, comment.char = "", ...) 
read.table(file = file, header = header, sep = sep, quote = quote, 
    dec = dec, fill = fill, comment.char = comment.char, ...)
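
Because read.csv only forwards its arguments to read.table, calling read.table with all of those arguments should reproduce read.csv. A small check on hypothetical inline data, which also shows that comment.char can matter when a value contains "#" (read.table's default comment character):

```r
# Hypothetical inline CSV; one job title contains a '#'.
csv_lines <- c(
  "Name,JobTitle,Salary",
  "\"Smith,John\",Clerk,30000",
  "\"Lee,Ann\",Unit #5 Aide,40000"
)

via_csv <- read.csv(text = csv_lines)

# read.table with every argument read.csv sets, including comment.char = "".
via_table <- read.table(text = csv_lines, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

# Same call but leaving comment.char at read.table's default "#":
# the rest of the last line after '#' is treated as a comment.
via_table_hash <- read.table(text = csv_lines, header = TRUE, sep = ",",
                             quote = "\"", fill = TRUE)

print(identical(via_csv, via_table))
print(identical(via_csv, via_table_hash))
```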

You need to add quote="\"": by default read.table treats both double and single quotes as quote characters (quote = "\"'"), whereas read.csv only treats double quotes as quote characters (quote = "\"").

EmpSal <- read.csv('Baltimore_City_Employee_Salaries_FY2015.csv')
EmpSal1 <- read.table('Baltimore_City_Employee_Salaries_FY2015.csv', sep=',', header = TRUE, fill = TRUE, quote="\"")
identical(EmpSal, EmpSal1)
# TRUE

Upvotes: 3
