ebb
ebb

Reputation: 274

R - error: "duplicate 'row.names' are not allowed"

I know this is a recurrent question, so I apologize in advance for cross-posting.

I am having trouble reading a model output .csv file that looks more or less like this (the original file has about 14,000 rows, but the columns are as they look here):

time    x     y       z          w      r         s         t
1980    1   0.8327  0.3402    0.2021    0       1.1729      0
1980    2   0.7886  0.3399    0.2019    0       2.3014      0
1980    3   0.7909  0.3396    0.2017    0       3.4319      0
1980    4   0.7846  0.3394    0.2016    0       4.5559      0
1980    5   0.8103  0.3392    0.2014    0       5.7053      0
1980    6   0.8207  0.339     0.2013    0       6.865       0
1980    7   0.8263  0.3388    0.2012    0       8.0301      0
1980    14  0.9112  10.3411   20.6821   3.1175  60.4644     3.1175
1980    15  0.9092  8.878     17.756    2.734   70.2517     5.8515
1980    16  0.9001  9.5232    19.0464   2.9655  80.6749     8.817
1980    17  1.0313  7.59      15.18     2.4332  89.2962     11.2502
1980    18  1.0333  6.8859    13.7718   2.266   97.2154     13.5162

For the command:

read.csv("df", header = TRUE, sep = ",", blank.lines.skip = FALSE)

I get the following error message:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed

From what I've understood from answers to similar questions, a possible problem may be that the read.csv command is not recognizing the zeros in the last column as values, so the program reads it as if the first row contained one fewer field than the number of columns, and hence uses the first column for the row names.

However, when I create a "fake" table with actual zeros, blanks, or "NA" in the same positions as shown in the example above, the program has no trouble recognizing them and reading the file.

e.g.

df <- data.frame(x=c(1,2,3,3,3,4,5,2,2,6,7,3,8,9,10))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7)
df$z <- c(" "," "," ",4,5,6,7,8,9,10,11,12,13,14,15)
OR:
df$z <- c(0,0,0,4,5,6,7,8,9,10,11,12,13,14,15)
OR:
df$z <- c("NA","NA","NA",4,5,6,7,8,9,10,11,12,13,14,15)

Could anyone tell me why is this happening?

I have solved the issue as suggested by other users:

df <- read.csv("df.csv", header = TRUE, row.names = NULL)
colnames(df) <- c(colnames(df)[-1],NULL)
write.table(df, "df.csv", sep = ",", col.names = TRUE, row.names = FALSE)

And start working as normal from here.

It works just fine, but I was wondering if there is a more direct solution to this problem, or if there is something I am missing.

Thank you,

Upvotes: 1

Views: 21950

Answers (3)

Barry vT
Barry vT

Reputation: 109

my 'row.names' problem was that i was simply using the wrong 'sep'

Upvotes: 0

Rui Barradas
Rui Barradas

Reputation: 76402

Here are two ways.

The first uses an external package, data.table. Function fread does the job with a warning. And the column names are messed up, since the first row has less fields than the other rows, fread discards that row.

data.table::fread("test.csv", sep = ";")
#   V1 V2 V3
#1:  A  1  6
#2:  A  2  7
#3:  A  3  8
#4:  A  4  9
#5:  A  5 10

Warning message:
In data.table::fread("test2.csv", sep = ";") :
Starting data input on line 2 and discarding line 1 because it has too
few or too many items to be column names or data: Col1;Col2

The second way is more complicated. If you don't want to load an extra package, I have written a function that uses readLines to read in the first row with the column names and then reads the rest of the file with read.table.

myread <- function(file, sep = ",", ...){
    nm <- readLines(file, n = 1)
    nm <- unlist(strsplit(nm, sep))
    DF <- read.table(file, skip = 1, sep = sep, ...)
    if(length(names(DF)) > length(nm)){
        names(DF)[(length(names(DF)) - length(nm) + 1):length(names(DF))] <- nm
    } else names(DF) <- nm
    DF
}

myread("test.csv", sep = ";")
#  V1 Col1 Col2
#1  A    1    6
#2  A    2    7
#3  A    3    8
#4  A    4    9
#5  A    5   10

FILE

Here are the contents of the file. Note that the column separator is a semi-colon, in most continental Europe, we use the comma as a decimals marker and so the CSV format separates the columns with a semi-colon.

Col1;Col2
A;1;6
A;2;7
A;3;8
A;4;9
A;5;10

Upvotes: 2

alan ocallaghan
alan ocallaghan

Reputation: 3038

From the help page for read.csv:

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if ‘row.names’ is missing, the rows are numbered.

Without seeing the CSV it is difficult to tell, but it does seem that it must fulfil the criteria listed (ie, there is one fewer entry in the first row, possibly due to an empty column name).

Upvotes: 0

Related Questions