Reputation: 3
I have this problem but in r:
How can I read a double-semicolon-separated .csv with quoted values using pandas?
The solution there is to drop the additional columns generated. I'd like to know if there's a way to read the file separated by ;; without generating those addiotional columns.
Thanks!
Upvotes: 0
Views: 185
Reputation: 582
It is usually recommended to properly clean your data before attempting to parse it, instead of cleaning it WHILE parsing, or worse, AFTER. Either use Notepad++ to Replace all ;;
occurences or R itself, but do not delete the original files (also a rule of thumb - never delete sources of data).
my.text <- readLines('d:/tmp/readdelim-r.csv')
cleaned <- gsub(';;', ';', my.text)
writeLines(cleaned, 'd:/tmp/cleaned.csv')
my.cleaned <- read.delim('d:/tmp/cleaned.csv', header=FALSE, sep=';')
Upvotes: 0
Reputation: 160677
Read it in normally using read.csv2
(or whichever variant you prefer, including read.table
, read.delim
, readr::read_csv2
, data.table::fread
, etc), and then remove the even-numbered columns.
dat <- read.csv2(text = "a;;b;;c;;d\n1;;2;;3;;4")
dat
# a X b X.1 c X.2 d
# 1 1 NA 2 NA 3 NA 4
dat[,-seq(2, ncol(dat), by = 2)]
# a b c d
# 1 1 2 3 4
Upvotes: 1