How can I read a double-semicolon-separated .txt file in R?

I have the same problem as the question below, but in R:

How can I read a double-semicolon-separated .csv with quoted values using pandas?

The solution there is to drop the additional empty columns that get generated. I'd like to know if there's a way to read a file separated by ;; without generating those additional columns in the first place.

Thanks!

Upvotes: 0

Views: 185

Answers (2)

ivan866

Reputation: 582

It is usually recommended to clean your data properly before attempting to parse it, instead of cleaning it WHILE parsing or, worse, AFTER. Either use Notepad++ to replace all ;; occurrences, or do it in R itself, but do not delete the original files (another rule of thumb: never delete sources of data).

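# read the raw lines; the original file is left untouched
my.text <- readLines('d:/tmp/readdelim-r.csv')
# replace every ';;' with a single ';' (fixed = TRUE: literal match, not a regex)
cleaned <- gsub(';;', ';', my.text, fixed = TRUE)
writeLines(cleaned, 'd:/tmp/cleaned.csv')  # cleaned copy goes to a new file
my.cleaned <- read.delim('d:/tmp/cleaned.csv', header=FALSE, sep=';')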
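If you would rather not write an intermediate file, the same cleaning can be done in memory by passing the cleaned lines to read.table through its text argument. This is a minimal sketch that assumes the file has a header row and that no field value contains a single ; of its own (that would be split into an extra column). The sample file and the name path are hypothetical; point path at your own data.

```r
# Hypothetical sample file; replace `path` with your own file.
path <- tempfile(fileext = ".txt")
writeLines(c("a;;b;;c;;d", "1;;2;;3;;4", "5;;6;;7;;8"), path)

raw_lines <- readLines(path)                         # original file is not modified
cleaned <- gsub(";;", ";", raw_lines, fixed = TRUE)  # literal ";;" -> ";"
dat <- read.table(text = cleaned, sep = ";", header = TRUE)
dat
#   a b c d
# 1 1 2 3 4
# 2 5 6 7 8
```

The original file stays on disk unchanged, which keeps to the "never delete sources of data" rule without leaving a cleaned copy lying around.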

Upvotes: 0

r2evans

Reputation: 160677

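Read it in normally using read.csv2 (or whichever variant you prefer, including read.table, read.delim, readr::read_csv2, data.table::fread, etc.). Each ;; is parsed as two ; separators with an empty field between them, so every even-numbered column comes back empty; then remove those even-numbered columns.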

dat <- read.csv2(text = "a;;b;;c;;d\n1;;2;;3;;4")
dat
#   a  X b X.1 c X.2 d
# 1 1 NA 2  NA 3  NA 4

dat[,-seq(2, ncol(dat), by = 2)]
#   a b c d
# 1 1 2 3 4
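Another option that never creates the empty columns at all is to split each line on the literal two-character delimiter with strsplit and build the data frame yourself. This is a base-R sketch, not a drop-in replacement for read.csv2: it assumes the first line is a header, that there are no quoted fields containing ;;, and that no line ends with an empty field (strsplit drops a trailing empty string). The helper name read_double_semicolon is hypothetical.

```r
# Hypothetical helper: split on the literal ";;" so no placeholder columns appear.
read_double_semicolon <- function(lines) {
  fields <- strsplit(lines, ";;", fixed = TRUE)  # list of character vectors
  header <- fields[[1]]                          # first line holds the column names
  rows <- do.call(rbind, fields[-1])             # character matrix, one row per record
  colnames(rows) <- header
  df <- as.data.frame(rows, stringsAsFactors = FALSE)
  # convert each column to integer/numeric/logical where possible, like read.table does
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

dat <- read_double_semicolon(c("a;;b;;c;;d", "1;;2;;3;;4", "5;;6;;7;;8"))
dat
#   a b c d
# 1 1 2 3 4
# 2 5 6 7 8
```

For a file on disk, pass readLines() of that file, e.g. read_double_semicolon(readLines("your-file.txt")).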

Upvotes: 1
