Reputation: 2001

Ambiguity while using readLines() in R

The first line of my dataset contains the column names. It looks like this:

#"State Code","County Code","Site Num","Parameter Code","POC","Latitude","Longitude","Datum","Parameter Name","Sample Duration","Pollutant Standard","Metric Used","Method Name","Year","Units of Measure","Event Type","Observation Count","Observation Percent","Completeness Indicator","Valid Day Count","Required Day Count","Exceptional Data Count","Null Data Count","Primary Exceedance Count","Secondary Exceedance Count","Certification Indicator","Num Obs Below MDL","Arithmetic Mean","Arithmetic Standard Dev","1st Max Value","1st Max DateTime","2nd Max Value","2nd Max DateTime","3rd Max Value","3rd Max DateTime","4th Max Value","4th Max DateTime","1st Max Non Overlapping Value","1st NO Max DateTime","2nd Max Non Overlapping Value","2nd NO Max DateTime","99th Percentile","98th Percentile","95th Percentile","90th Percentile","75th Percentile","50th Percentile","10th Percentile","Local Site Name","Address","State Name","County Name","City Name","CBSA Name","Date of Last Change"

It is a CSV file. Since I am on Windows, I wrote

pm0 <- read.csv("C:/Users/Ad/Desktop/EDA/2010.csv",
                comment.char="#", header=FALSE, sep=",", na.strings="")

to read everything in the CSV file except the first line. Now I want to read the first line so that I can use it to set the column names of the resulting data frame. For this I wrote:

cnames <- readLines("C:/Users/Ad/Desktop/EDA/2010.csv", 1)

But when I print cnames I get this:

[1] "\"State Code\",\"County Code\",\"Site Num\",\"Parameter Code\",\"POC\",\"Latitude\",\"Longitude\",\"Datum\",\"Parameter Name\",\"Sample Duration\",\"Pollutant Standard\",\"Metric Used\",\"Method Name\",\"Year\",\"Units of Measure\",\"Event Type\",\"Observation Count\",\"Observation Percent\".

I don't understand why a \ appears at the start and end of every element of cnames.

Can someone help me remove it?

Upvotes: 0

Answers (2)

asad_hussain

Reputation: 2001

What I did is this:

pm0 <- read.csv("C:/Users/Ad/Desktop/EDA/2010.csv", comment.char="#", header=TRUE, sep=",", na.strings="")

Now the object pm0 uses the first row of the CSV file as its column names.
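
One thing worth knowing with header=TRUE: by default read.csv() passes the header through make.names(), so a name such as "State Code" ends up as State.Code. Below is a minimal sketch using a tiny made-up CSV supplied through the text argument (csv_text, d1 and d2 are hypothetical names, not the real 2010.csv) that compares the default with check.names=FALSE.

```r
# Tiny in-memory CSV standing in for the real file (made-up values).
csv_text <- '"State Code","County Code"\n"01","003"\n"01","027"'

# Default behaviour: header names are made syntactically valid.
d1 <- read.csv(text = csv_text, header = TRUE)

# check.names = FALSE keeps the header names exactly as written.
d2 <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)

print(names(d1))
print(names(d2))
```

With check.names=FALSE the columns keep their spaces, so they have to be accessed with backticks or [["State Code"]].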

Upvotes: 0

Sun Bee

Reputation: 1820

This is from the Exploratory Data Analysis (EDA) assignment on Coursera, right? I trust you are compliant with the honor code.

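What you have in 'cnames' is ONE string enclosed in double quotes. The backslashes are not stored in the string; they are only how print() displays the quotation marks embedded in it, so that they are not mistaken for the end of the string.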
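
A quick way to convince yourself of this, sketched with a shortened stand-in for the header line (x is a hypothetical variable, not your actual cnames): print() shows the escapes, while cat() writes the raw characters, and nchar() counts only the real characters.

```r
# Shortened stand-in for the string returned by readLines().
x <- "\"State Code\",\"County Code\""

print(x)      # displayed with \" escapes
cat(x, "\n")  # raw characters, no backslashes

# Each quote counts as one character; the backslashes are not in the string.
n <- nchar(x)
has_backslash <- grepl("\\", x, fixed = TRUE)
print(n)
print(has_backslash)
```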

To get around this, try:

cnames1 <- strsplit(cnames, ",")
gsub("[\"]", "", cnames1[[1]], perl=TRUE)

This gives a character vector of names:

 [1] "State Code"          "County Code"         "Site Num"           
 [4] "Parameter Code"      "POC"                 "Latitude"           
 [7] "Longitude"           "Datum"               "Parameter Name"     
[10] "Sample Duration"     "Pollutant Standard"  "Metric Used"        
[13] "Method Name"         "Year"                "Units of Measure"   
[16] "Event Type"          "Observation Count"   "Observation Percent"
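
As an alternative sketch, scan() can split the line on commas and drop the surrounding quotes in one step, and make.names() then turns the result into valid column names. The cnames string and the small pm0 data frame below are shortened, made-up stand-ins so the example runs on its own; hdr is a hypothetical name.

```r
# Shortened stand-in for the first line returned by readLines().
cnames <- "\"State Code\",\"County Code\",\"Site Num\""

# scan() treats the double quotes as field quoting and removes them.
hdr <- scan(text = cnames, what = character(), sep = ",", quiet = TRUE)

# Hypothetical small data frame standing in for the one read with header=FALSE.
pm0 <- data.frame(V1 = c(1, 1), V2 = c(3, 27), V3 = c(10, 1))

# make.names() replaces the spaces with dots so the names are easy to use.
names(pm0) <- make.names(hdr)
print(names(pm0))
```

Unlike splitting on "," with strsplit(), this should also cope with a quoted field that itself contains a comma.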

Upvotes: 1
