josetribiani

Reputation: 71

How to read a CSV file into R which uses two types of separators in the file?

I am trying to read a CSV file into R that uses two different separators: "," and ";". Below is a short example of the CSV format:

"car_brand; car_model","total"
"Toyota; 9289","29781"
"Seat; 20981","1610"
"Volkswagen; 11140","904"
"Suzuki; 11640","658"
"Renault; 13075","647"
"Ford; 15855","553"

The CSV file should contain 3 columns, car_brand, car_model, and total. However, car_brand and car_model are separated by a ";" rather than a ",". Any guidance on how to import such a file would be really appreciated.

Upvotes: 0

Views: 1075

Answers (4)

user438383

Reputation: 6206

One option would be to use a combination of fread and gsub:

library(data.table)
fread(gsub(";", "", '"car_brand; car_model","total"
"Toyota; 9289","29781"
"Seat; 20981","1610"
"Volkswagen; 11140","904"
"Suzuki; 11640","658"
"Renault; 13075","647"
"Ford; 15855","553"
'))
   car_brand car_model total
1:         Toyota 9289 29781
2:          Seat 20981  1610
3:    Volkswagen 11140   904
4:        Suzuki 11640   658
5:       Renault 13075   647
6:          Ford 15855   553
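Note that removing the ";" leaves brand and model together in a single column named "car_brand car_model", so the result above has two columns rather than three. A minimal sketch of a variant that yields three columns, assuming no field contains a literal comma, semicolon, or double quote (the variable names `txt`, `cleaned`, and `dt` are only illustrative):

```r
library(data.table)

# Shortened sample of the data from the question
txt <- '"car_brand; car_model","total"
"Toyota; 9289","29781"
"Seat; 20981","1610"
'

# Drop the double quotes, then turn each ";" (plus any following
# whitespace) into a comma, so every line has three comma-separated fields.
cleaned <- gsub(";\\s*", ",", gsub('"', "", txt, fixed = TRUE))

dt <- fread(text = cleaned)
```

For a file on disk, the same cleaning could be applied to the result of `readLines()` before passing it to `fread(text = ...)`.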

Upvotes: 1

Samet Sökel

Reputation: 2670

A tidyverse solution:

library(tidyverse)

read.csv('file.csv', header = TRUE) %>%
  separate(col = 'car_brand..car_model', into = c('car_brand', 'car_model'), sep = ';') %>%
  mutate(car_model = as.numeric(car_model))

Output:

car_brand  car_model total
  <chr>          <dbl> <int>
1 Toyota          9289 29781
2 Seat           20981  1610
3 Volkswagen     11140   904
4 Suzuki         11640   658
5 Renault        13075   647
6 Ford           15855   553

Upvotes: 1

r2evans

Reputation: 160417

A double-tap:

x1 <- read.csv("quux.csv", check.names = FALSE)
x2 <- read.csv2(text = x1[[1]], header = FALSE)
names(x2) <- unlist(read.csv2(text = names(x1)[1], header = FALSE))
cbind(x2, x1[,-1,drop=FALSE])
#    car_brand  car_model total
# 1     Toyota       9289 29781
# 2       Seat      20981  1610
# 3 Volkswagen      11140   904
# 4     Suzuki      11640   658
# 5    Renault      13075   647
# 6       Ford      15855   553

The use of check.names=FALSE is required because otherwise names(x1)[1] looks like "car_brand..car_model". While it can be parsed like this, I thought it better to parse the original text.

Upvotes: 3

Vladimir Antonyan

Reputation: 1

If you write the CSV importer yourself, you simply have to change the separator dynamically (depending on the field index) inside the loop.
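A rough sketch of that idea in base R, assuming every line has exactly three fields, with ";" between the first two and "," between the last two, and no separator characters inside a field. The helper `split_line` and its `seps` argument are hypothetical names introduced for illustration:

```r
# Hypothetical helper: split one line using a different separator at each
# field boundary (";" after field 1, "," after field 2).
split_line <- function(line, seps = c(";", ",")) {
  line <- gsub('"', "", line, fixed = TRUE)   # drop the double quotes
  fields <- character(0)
  for (sep in seps) {
    pos <- regexpr(sep, line, fixed = TRUE)   # first occurrence of this separator
    if (pos < 0) stop("separator not found: ", sep)
    fields <- c(fields, trimws(substr(line, 1, pos - 1)))
    line <- substr(line, pos + 1, nchar(line))
  }
  c(fields, trimws(line))                     # whatever remains is the last field
}

# Shortened sample of the data from the question
lines <- c('"car_brand; car_model","total"',
           '"Toyota; 9289","29781"',
           '"Seat; 20981","1610"')

rows <- lapply(lines[-1], split_line)
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(df) <- split_line(lines[1])

# Convert the numeric-looking columns from character to integer
df[c("car_model", "total")] <- lapply(df[c("car_model", "total")], as.integer)
```

For a real file, `lines` would come from `readLines()`; the existing parsers used in the other answers handle quoting and edge cases more robustly than this loop does.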

Upvotes: 0
