user9092800

Remove double quotes when reading a CSV in R

My data contain double quotes. Is there an effective way to remove them?

Given the original German credit .csv data set, I read it with:

GermanCredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = F, header = T, sep = "," , quote = "")

This results in the following:

[Image: read.csv with quote argument]

Without specifying the quote argument:

germancredit <- read.csv("D:/R Statistics/GermanCredit/germancredit.csv", stringsAsFactors = F, header = T, sep = ",")

This results in the following:

[Image: read.csv without quote argument]

I also tried read.table:

German_Credit <- read.table("D:/R Statistics/GermanCredit/germancredit.csv", quote = NULL, header = T, sep = ",")

I also tried the fread function (which belongs to data.table, not readr). The first 10 lines of the file:

dput(readLines("D:/R Statistics/GermanCredit/germancredit.csv", n = 10))

c("\"\"\"status\"\",\"\"duration\"\",\"\"credit_history\"\",\"\"purpose\"\",\"\"amount\"\",\"\"savings\"\",\"\"employment_duration\"\",\"\"installment_rate\"\",\"\"personal_status_sex\"\",\"\"other_debtors\"\",\"\"present_residence\"\",\"\"property\"\",\"\"age\"\",\"\"other_installment_plans\"\",\"\"housing\"\",\"\"number_credits\"\",\"\"job\"\",\"\"people_liable\"\",\"\"telephone\"\",\"\"foreign_worker\"\",\"\"credit_risk\"\"\"",
"\"\"\"... < 100 DM\"\",6,\"\"critical account/other credits existing\"\",\"\"domestic appliances\"\",1169,\"\"unknown/no savings account\"\",\"\"... >= 7 years\"\",4,\"\"male : single\"\",\"\"none\"\",4,\"\"real estate\"\",67,\"\"none\"\",\"\"own\"\",2,\"\"skilled employee/official\"\",1,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"0 <= ... < 200 DM\"\",48,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",5951,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"female : divorced/separated/married\"\",\"\"none\"\",2,\"\"real estate\"\",22,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",0\"",
"\"\"\"no checking account\"\",12,\"\"critical account/other credits existing\"\",\"\"retraining\"\",2096,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"none\"\",3,\"\"real estate\"\",49,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",2,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"... < 100 DM\"\",42,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",7882,\"\"... < 100 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : single\"\",\"\"guarantor\"\",4,\"\"building society savings agreement/life insurance\"\",45,\"\"none\"\",\"\"for free\"\",1,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"... < 100 DM\"\",24,\"\"delay in paying off in the past\"\",\"\"car (new)\"\",4870,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",53,\"\"none\"\",\"\"for free\"\",2,\"\"skilled employee/official\"\",2,\"\"no\"\",\"\"yes\"\",0\"",
"\"\"\"no checking account\"\",36,\"\"existing credits paid back duly till now\"\",\"\"retraining\"\",9055,\"\"unknown/no savings account\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",4,\"\"unknown/no property\"\",35,\"\"none\"\",\"\"for free\"\",1,\"\"unskilled - resident\"\",2,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"no checking account\"\",24,\"\"existing credits paid back duly till now\"\",\"\"radio/television\"\",2835,\"\"500 <= ... < 1000 DM\"\",\"\"... >= 7 years\"\",3,\"\"male : single\"\",\"\"none\"\",4,\"\"building society savings agreement/life insurance\"\",53,\"\"none\"\",\"\"own\"\",1,\"\"skilled employee/official\"\",1,\"\"no\"\",\"\"yes\"\",1\"",
"\"\"\"0 <= ... < 200 DM\"\",36,\"\"existing credits paid back duly till now\"\",\"\"car (used)\"\",6948,\"\"... < 100 DM\"\",\"\"1 <= ... < 4 years\"\",2,\"\"male : single\"\",\"\"none\"\",2,\"\"car or other\"\",35,\"\"none\"\",\"\"rent\"\",1,\"\"management/self-employed/highly qualified employee/officer\"\",1,\"\"yes\"\",\"\"yes\"\",1\"",
"\"\"\"no checking account\"\",12,\"\"existing credits paid back duly till now\"\",\"\"domestic appliances\"\",3059,\"\"... >= 1000 DM\"\",\"\"4 <= ... < 7 years\"\",2,\"\"male : divorced/separated\"\",\"\"none\"\",4,\"\"real estate\"\",61,\"\"none\"\",\"\"own\"\",1,\"\"unskilled - resident\"\",1,\"\"no\"\",\"\"yes\"\",1\"")

Upvotes: 1

Views: 4482

Answers (1)

Jan van der Laan

Reputation: 8105

There are two strange things in your file:

  • The file uses doubled double quotes ("") around fields.
  • Each line in your file is also wrapped in quotes.

For example:

"""a"",1"
"""b"",2"

This could be because your file was originally a CSV file that was read incorrectly (e.g. with the wrong separator, such as ';') and then written back out as a CSV file.
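As a rough illustration of how such a file could come about, the sketch below reads a small ordinary CSV with the wrong separator and then writes it back out. The sample contents, the temporary files, and the choice of read.table with quote = "" are assumptions made for the example; nothing here is known about how the original file was actually produced.

```r
# Hypothetical original file: an ordinary comma-separated CSV
original <- tempfile(fileext = ".csv")
writeLines(c('"x","y"', '"a",1', '"b",2'), original)

# Read it with the wrong separator and with quoting disabled (assumption).
# No ';' occurs, so each line becomes a single string that keeps its quotes.
wrong <- read.table(original, sep = ";", quote = "", header = FALSE,
                    stringsAsFactors = FALSE)

# Write it back out as CSV: every string is wrapped in quotes and the
# quotes it already contained are doubled (qmethod = "double").
mangled <- tempfile(fileext = ".csv")
write.table(wrong, mangled, sep = ",", quote = TRUE, qmethod = "double",
            row.names = FALSE, col.names = FALSE)

result <- readLines(mangled)
print(result)
```

The lines in result show the same two symptoms: each line is quoted as a whole, and the inner quotes are doubled.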

First removing the outer quotes and then using double double quotes as quotes (as suggested by @ytu) seems to work:

lines <- readLines("<yourfile>")
lines <- gsub('(^"|"$)', "", lines)
read.csv(textConnection(lines), quote = '""')
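A possible variant, in case you prefer not to rely on the quote argument: after stripping the outer quotes, collapse every doubled quote into a single one so the text becomes an ordinary CSV. This is only a sketch on made-up sample lines (the vector below stands in for readLines("<yourfile>")), and it assumes that no field is empty and that no field contains a literal quote character, since those cases would be altered by the second gsub.

```r
# Hypothetical sample standing in for readLines("<yourfile>")
lines <- c('"""name"",""n"""', '"""a"",1"', '"""b"",2"')

# Remove the quote that wraps each whole line
lines <- gsub('(^"|"$)', "", lines)

# Collapse each doubled quote into a single ordinary quote
lines <- gsub('""', '"', lines, fixed = TRUE)

# The text is now a regular CSV and can be read with the default quoting
df <- read.csv(text = lines, stringsAsFactors = FALSE)
str(df)
```

On your own file you would replace the sample vector with the readLines call and check a few rows of the result against the raw file.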

Upvotes: 1
