Reputation: 274
I know this is a recurrent question, so I apologize in advance for cross-posting.
I am having trouble reading a model output .csv
file that looks more or less like this (the original file has about 14,000 rows, but the columns are as they look here):
time x y z w r s t
1980 1 0.8327 0.3402 0.2021 0 1.1729 0
1980 2 0.7886 0.3399 0.2019 0 2.3014 0
1980 3 0.7909 0.3396 0.2017 0 3.4319 0
1980 4 0.7846 0.3394 0.2016 0 4.5559 0
1980 5 0.8103 0.3392 0.2014 0 5.7053 0
1980 6 0.8207 0.339 0.2013 0 6.865 0
1980 7 0.8263 0.3388 0.2012 0 8.0301 0
1980 14 0.9112 10.3411 20.6821 3.1175 60.4644 3.1175
1980 15 0.9092 8.878 17.756 2.734 70.2517 5.8515
1980 16 0.9001 9.5232 19.0464 2.9655 80.6749 8.817
1980 17 1.0313 7.59 15.18 2.4332 89.2962 11.2502
1980 18 1.0333 6.8859 13.7718 2.266 97.2154 13.5162
For the command:
read.csv("df", header = TRUE, sep = ",", blank.lines.skip = FALSE)
I get the following error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
From what I've understood from answers to similar questions, a possible problem may be that the read.csv
command is not recognizing the zeros in the last column as values, so the program reads it as if the first row contained one fewer field than the number of columns, and hence uses the first column for the row names.
However, when I create a "fake" table with actual zeros, blanks, or "NA" in the same positions as shown in the example above, the program has no trouble recognizing them and reading the file.
e.g.
df <- data.frame(x=c(1,2,3,3,3,4,5,2,2,6,7,3,8,9,10))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7)
df$z <- c(" "," "," ",4,5,6,7,8,9,10,11,12,13,14,15)
OR:
df$z <- c(0,0,0,4,5,6,7,8,9,10,11,12,13,14,15)
OR:
df$z <- c("NA","NA","NA",4,5,6,7,8,9,10,11,12,13,14,15)
Could anyone tell me why is this happening?
I have solved the issue as suggested by other users:
df <- read.csv("df.csv", header = TRUE, row.names = NULL)
colnames(df) <- c(colnames(df)[-1],NULL)
write.table(df, "df.csv", sep = ",", col.names = TRUE, row.names = FALSE)
And start working as normal from here.
It works just fine, but I was wondering if there is a more direct solution to this problem, or if there is something I am missing.
Thank you,
Upvotes: 1
Views: 21950
Reputation: 109
my 'row.names' problem was that i was simply using the wrong 'sep'
Upvotes: 0
Reputation: 76402
Here are two ways.
The first uses an external package, data.table
. Function fread
does the job with a warning. And the column names are messed up, since the first row has less fields than the other rows, fread
discards that row.
data.table::fread("test.csv", sep = ";")
# V1 V2 V3
#1: A 1 6
#2: A 2 7
#3: A 3 8
#4: A 4 9
#5: A 5 10
Warning message:
In data.table::fread("test2.csv", sep = ";") :
Starting data input on line 2 and discarding line 1 because it has too
few or too many items to be column names or data: Col1;Col2
The second way is more complicated. If you don't want to load an extra package, I have written a function that uses readLines
to read in the first row with the column names and then reads the rest of the file with read.table
.
myread <- function(file, sep = ",", ...){
nm <- readLines(file, n = 1)
nm <- unlist(strsplit(nm, sep))
DF <- read.table(file, skip = 1, sep = sep, ...)
if(length(names(DF)) > length(nm)){
names(DF)[(length(names(DF)) - length(nm) + 1):length(names(DF))] <- nm
} else names(DF) <- nm
DF
}
myread("test.csv", sep = ";")
# V1 Col1 Col2
#1 A 1 6
#2 A 2 7
#3 A 3 8
#4 A 4 9
#5 A 5 10
FILE
Here are the contents of the file. Note that the column separator is a semi-colon, in most continental Europe, we use the comma as a decimals marker and so the CSV format separates the columns with a semi-colon.
Col1;Col2
A;1;6
A;2;7
A;3;8
A;4;9
A;5;10
Upvotes: 2
Reputation: 3038
From the help page for read.csv
:
If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if ‘row.names’ is missing, the rows are numbered.
Without seeing the CSV it is difficult to tell, but it does seem that it must fulfil the criteria listed (ie, there is one fewer entry in the first row, possibly due to an empty column name).
Upvotes: 0