Reputation: 488
Using the import function from the rio package, I am trying to import a csv file with a separator that is not a comma, but I can't get it to import correctly.
Example:
df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = "tabbed_file.csv", sep = "\t", row.names = FALSE)
rio::import("tabbed_file.csv")
But this imports as a data frame with a single column.
X1"\t"X2"\t"X3
1 1"\t1\t101\t"A
2 2"\t2\t102\t"B
3 3"\t3\t103\t"C
4 4"\t4\t104\t"D
5 5"\t5\t105\t"E
I also tried:
rio::import("tabbed_file.csv", sep = "\t")
but that gives an error:
Error in import_delim(file = formal argument "sep" matched by multiple actual arguments
I think I understand this error from looking at the code for the import method on GitHub: the sep parameter is hard-coded in the method, so passing it again supplies it twice:
.import.rio_csv <- function(file, which = 1, ...){
import_delim(file = file, sep = ",", ...)
}
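The same error can be reproduced in base R without rio. The sketch below uses two hypothetical stand-in functions (inner_reader and csv_wrapper are illustrative names, not part of rio) to show what happens when a wrapper fixes sep and also forwards a user-supplied sep through `...`:

```r
# Hypothetical stand-ins for rio's internals; the names are illustrative only.
inner_reader <- function(file, sep) {
  paste("reading", file, "with a separator of", nchar(sep), "character(s)")
}

# The wrapper fixes sep = "," and forwards everything else through `...`.
csv_wrapper <- function(file, ...) {
  inner_reader(file = file, sep = ",", ...)
}

# Works: only one sep reaches inner_reader().
ok <- csv_wrapper("tabbed_file.csv")

# Fails: inner_reader() now receives sep twice, once hard-coded
# and once forwarded from the caller.
msg <- tryCatch(
  csv_wrapper("tabbed_file.csv", sep = "\t"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the error comes from R's argument matching rather than from the file itself: a caller cannot override an argument that the wrapper already passes by name.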
The README file for rio states that:
rio uses data.table::fread() for text-delimited files to automatically determine the file format regardless of the extension. So, a CSV that is actually tab-separated will still be correctly imported.
So I must be missing something, but I don't understand what. I know I could just use fread directly from the data.table package, but I'd like to understand why I'm having this problem, because I know it should be possible.
Upvotes: 1
Views: 800
Reputation: 488
This was a bug in the rio package and has now been addressed by the developers. The issue I opened on their GitHub account links to the code changes that fix it.
The example in the question will import the data correctly as of version 0.4.7 of rio.
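If you are unsure which version is installed, something like the following may help. It is only a sketch: it assumes 0.4.7 is the right threshold (as stated above), writes the question's data to a temporary file, and falls back to base R's read.delim when rio is missing or older.

```r
# Recreate the tab-separated file from the question in a temporary location.
df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
path <- tempfile(fileext = ".csv")
write.table(df, file = path, sep = "\t", row.names = FALSE)

# Assumption: separator auto-detection for .csv files works from rio 0.4.7 on.
has_fix <- requireNamespace("rio", quietly = TRUE) &&
  utils::packageVersion("rio") >= "0.4.7"

if (has_fix) {
  imported <- rio::import(path)
} else {
  # Fallback: base R reader whose default separator is a tab.
  imported <- read.delim(path)
}

str(imported)
```

Either branch should give a data frame with three columns instead of a single merged one.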
Upvotes: 0
Reputation: 4366
First, library(data.table); df <- fread('tabbed_file.csv') works just fine.
Second, you "should" either save the data.frame with the "tab separated" extension .tsv or define its format yourself.
Documentation is also helpful.
df <- data.frame(X1 = 1:5, X2 = 101:105, X3 = LETTERS[1:5])
write.table(df, file = "tabbed_file.tsv", sep = "\t", row.names = FALSE)
rio::import("tabbed_file.tsv")
# X1 X2 X3
# 1 1 101 A
# 2 2 102 B
# 3 3 103 C
# 4 4 104 D
# 5 5 105 E
rio::import("tabbed_file.csv", format = "tsv")
# X1 X2 X3
# 1 1 101 A
# 2 2 102 B
# 3 3 103 C
# 4 4 104 D
# 5 5 105 E
Upvotes: 1