Sara

Reputation: 245

R read csv with comma in column

Update 2020-5-14

Working with a different but similar dataset from here, I found read_csv seems to work fine. I haven't tried it with the original data yet though.

The replies didn't solve the problem, because my question itself was not correct. Shan's reply fits the original question best, so I accepted his answer.

Update 2020-5-12

I think my original question is not correct. As mentioned in the comments, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, that doesn't mean it's "right". Maybe some line breaks are being interpreted incorrectly, due to an inappropriate encoding or something similar, causing some of the columns to be displaced. If I open the data with Notepad++, the record at row 11583 in Excel is at row 11596.
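To illustrate why quoting matters here, a minimal sketch with made-up inline data (the rows and values below are hypothetical, not taken from the real listings.csv): when a field is wrapped in double quotes, read.csv treats a comma inside it as part of the value rather than as a separator.

```r
# Hypothetical sample data; the second field of the first row is quoted
# and contains a comma.
csv_text <- 'id,name,host_id
2015,"Ole,Ole apartment",2217
2695,Bright room,2986'

# text= lets read.csv parse a string instead of a file on disk.
listings <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# id is parsed as a number, and the quoted comma stays inside name.
str(listings)
```

If the id column still comes out as a factor or character on the real file, that would suggest the problem lies elsewhere, for example in the encoding or in line breaks, rather than in the commas themselves.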


Original question

I am trying to read the listings.csv from this dataset on Kaggle into R. I downloaded the file and wrote the code read.csv('listing.csv'). The first column, id, is supposed to be numeric. However, it shows:

listing$id[1:10]
 [1] 2015  2695  3176  3309  7071  9991  14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...

I think it is because there are values with commas in the second column. For example, opening the file with Microsoft Excel, I can see that one of the values in the second column is Ole,Ole...

How can I read a csv file into R correctly when some values contain commas?

Upvotes: 0

Views: 2809

Answers (3)

Edward

Reputation: 19514

If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:

library(data.table)
listings <- fread("listings.csv", drop=2)
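For the base R route, a minimal sketch of skipping a column with colClasses, again on hypothetical inline data. It assumes the unwanted column is called name; setting its class to "NULL" tells read.csv to skip it, and with a named vector the unnamed columns are left to be guessed as usual.

```r
# Hypothetical sample data standing in for listings.csv.
csv_text <- 'id,name,host_id
2015,"Ole,Ole apartment",2217
2695,Bright room,2986'

# A named colClasses vector: "NULL" drops the name column,
# all other columns keep the default type detection.
kept <- read.csv(text = csv_text, colClasses = c(name = "NULL"))

names(kept)
```

For a file on disk, the same argument would be passed alongside the file name instead of text=.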

If you do need the information in that column, then other methods are needed (see other solutions).

Upvotes: 0

OldRider

Reputation: 15

You could try this?

listings <- read.csv("listings.csv", stringsAsFactors = FALSE)

# Remove the commas from the name column
listings$name <- gsub(",", "", listings$name)

Upvotes: 0

Shan R

Reputation: 541

Since you have access to the data in Excel, you can 'Save As' in Excel with a separator other than a comma (,). First go to Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common one other than the comma is the pipe symbol (|). In R, specify the separator as '|' when you read the file, for example with the sep argument of read.csv or the delim argument of readr's read_delim (read_csv itself always splits on commas).
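A minimal sketch of the reading side, assuming the file has already been re-saved with | as the separator. The inline data below is hypothetical and only stands in for that re-saved file.

```r
# Hypothetical pipe-separated sample; the comma in the second field
# is now ordinary text because | is the separator.
psv_text <- 'id|name|host_id
2015|Ole,Ole apartment|2217
2695|Bright room|2986'

# sep = "|" overrides read.csv's default comma separator.
listings <- read.csv(text = psv_text, sep = "|", stringsAsFactors = FALSE)

str(listings)
```

With a file on disk the call would take the file name in place of text=. This only helps if no value in the data contains a | character itself.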

Upvotes: 2
