Danny

Can I import a CSV file with a non-comma separator?

Using the import() function from the rio package, I am trying to import a CSV file whose separator is not a comma, but I can't get it to import correctly.

Example:

df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = "tabbed_file.csv", sep = "\t", row.names = FALSE)
rio::import("tabbed_file.csv")

But this imports as a data frame with a single column.

  X1"\t"X2"\t"X3
1  1"\t1\t101\t"A
2  2"\t2\t102\t"B
3  3"\t3\t103\t"C
4  4"\t4\t104\t"D
5  5"\t5\t105\t"E

I also tried:

rio::import("tabbed_file.csv", sep = "\t")

but that gives an error:

Error in import_delim(file = formal argument "sep" matched by multiple actual arguments

I think I understand why from the code for the import method on GitHub: the sep argument is hard-coded in the method:

.import.rio_csv <- function(file, which = 1, ...){
    import_delim(file = file, sep = ",", ...)
}
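To see why R rejects the second call, here is a minimal reproduction of the same error. The two functions, import_delim_like() and csv_method(), are hypothetical stand-ins that only mimic the shape of the method above (they are not rio's real code): the outer function hard-codes sep and also forwards `...`, so a user-supplied sep reaches the inner function twice.

```r
# Hypothetical stand-in for import_delim(): accepts a `sep` argument.
import_delim_like <- function(file, sep = ",", ...) {
  paste("reading", file, "with sep", encodeString(sep))
}

# Hypothetical stand-in for .import.rio_csv(): hard-codes sep = "," and
# also forwards whatever the caller passed in `...`.
csv_method <- function(file, ...) {
  import_delim_like(file = file, sep = ",", ...)
}

# Works: only one `sep` reaches import_delim_like().
ok <- csv_method("tabbed_file.csv")
print(ok)

# Fails: `sep` is supplied once by csv_method() and once via `...`,
# so R cannot match the formal argument unambiguously.
msg <- tryCatch(csv_method("tabbed_file.csv", sep = "\t"),
                error = function(e) conditionMessage(e))
print(msg)
```

The point is that any extra sep passed through `...` collides with the hard-coded one, so the only way around it from the caller's side is to avoid that method (for example by changing the format rio selects).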

The README file for rio states that:

rio uses data.table::fread() for text-delimited files to automatically determine the file format regardless of the extension. So, a CSV that is actually tab-separated will still be correctly imported.
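The auto-detection the README describes can be checked directly with data.table::fread(), whose default sep = "auto" infers the delimiter from the file's contents rather than its extension. This sketch assumes data.table is installed and writes the question's file to a temporary directory:

```r
library(data.table)

# Recreate the question's file: tab-separated, but with a .csv extension.
path <- file.path(tempdir(), "tabbed_file.csv")
df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = path, sep = "\t", row.names = FALSE)

# fread() sniffs the delimiter from the contents (sep = "auto" by default),
# so the misleading .csv extension does not matter here.
dt <- fread(path)
print(dt)
```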

So I must be missing something, but I can't see what. I know I could just use fread() directly from the data.table package, but I'd like to understand why I'm having this problem, since it should be possible.


Answers (2)

Danny

This was a bug in the rio package, and the developers have since fixed it. The issue I opened on their GitHub repository links to the code changes that fix it.

As of rio version 0.4.7, the example in the question imports the data correctly.

Konstantinos

First, library(data.table); df <- fread('tabbed_file.csv') works just fine.

Second, you "should" either save the data frame with the tab-separated extension .tsv or specify its format yourself with the format argument of rio::import(). The documentation (?rio::import) covers both options.

df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = "tabbed_file.tsv", sep = "\t", row.names = FALSE)
rio::import("tabbed_file.tsv")
#   X1  X2 X3
# 1  1 101  A
# 2  2 102  B
# 3  3 103  C
# 4  4 104  D
# 5  5 105  E
rio::import("tabbed_file.csv", format = "tsv")
#   X1  X2 X3
# 1  1 101  A
# 2  2 102  B
# 3  3 103  C
# 4  4 104  D
# 5  5 105  E
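If neither renaming the file nor passing format is convenient, a base-R fallback in the same spirit as fread's detection is to guess the delimiter from the header line. read_sniffed() below is a hypothetical helper, not part of rio or base R; it only considers a fixed list of candidate separators and assumes the file has a header row:

```r
# Hypothetical helper: guess the delimiter from the first line, then read
# the file with base R's read.table().
read_sniffed <- function(file, candidates = c(",", "\t", ";", "|")) {
  header <- readLines(file, n = 1)
  # Count occurrences of each candidate in the header line
  # (fixed = TRUE so "|" is treated literally, not as a regex).
  counts <- vapply(candidates, function(s) {
    lengths(regmatches(header, gregexpr(s, header, fixed = TRUE)))
  }, integer(1))
  # Most frequent candidate wins; ties (including all zero) go to the first.
  sep <- candidates[which.max(counts)]
  read.table(file, sep = sep, header = TRUE, stringsAsFactors = FALSE)
}

# Usage with the question's tab-separated ".csv" file.
path <- file.path(tempdir(), "tabbed_file.csv")
df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = path, sep = "\t", row.names = FALSE)
res <- read_sniffed(path)
print(res)
```

Counting only the header line is a deliberate simplification; fread inspects many more lines, so it is the more robust choice when it is available.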
