Reputation: 8064
While reading a data set using fread, I've noticed that sometimes I'm getting duplicated column names, for example (fread doesn't have a check.names argument):
> data.table( x = 1, x = 2)
x x
1: 1 2
The question is: is there any way to remove one of the two columns when they have the same name?
Upvotes: 12
Views: 9954
Reputation: 118879
.SDcols approaches would return a copy of the columns you're selecting. Instead, just remove those duplicated columns by reference, using :=.
dt[, which(duplicated(names(dt))) := NULL]
# x
# 1: 1
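A minimal sketch of the same by-reference removal on a slightly larger table. The extra y column and the dup helper variable are only there for illustration, and the length check is a cautious guard for tables that have no duplicated names.

```r
library(data.table)

# Example table with a duplicated name; y is added for illustration only.
dt <- data.table(x = 1, x = 2, y = 3)

# Positions of every column whose name has already appeared to its left.
dup <- which(duplicated(names(dt)))

# Drop those columns in place; skip the call when nothing is duplicated.
if (length(dup) > 0L) dt[, (dup) := NULL]

names(dt)
```

Because the columns are addressed by position, the first occurrence of each name is the one that survives.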
Upvotes: 16
Reputation: 226801
How about
dt[, .SD, .SDcols = unique(names(dt))]
This selects the first occurrence of each name (I'm not sure how you want to handle this).
As @DavidArenburg suggests in comments above, you could use check.names=TRUE in data.table() or fread()
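A short sketch of the check.names route, assuming a data.table version recent enough that fread() accepts both check.names and the text argument (older releases may not). Instead of dropping anything, it renames the duplicates so they can be handled afterwards.

```r
library(data.table)

# data.table() makes the names unique at construction time.
dt1 <- data.table(x = 1, x = 2, check.names = TRUE)
names(dt1)

# Same idea when reading; the inline text stands in for a real file.
dt2 <- fread(text = "x,x\n1,2", check.names = TRUE)
names(dt2)
```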
Upvotes: 21
Reputation: 10421
Different approaches:
1. Indexing
my.data.table <- my.data.table[ ,-2]
2. Subsetting
my.data.table <- subset(my.data.table, select = -2)
3. Making unique names if 1. and 2. are not ideal (when having hundreds of columns, for instance)
setnames(my.data.table, make.names(names = names(my.data.table), unique=TRUE))
4. Optionally, systematize the deletion of variables whose names meet some criterion. Here, we get rid of all variables whose name ends with ".X", X being a number (starting at 1 when using make.names)
my.data.table <- subset(my.data.table,
select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
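The rename-then-drop idea can be sketched end to end as follows. The example table and the drop variable are made up for illustration, and the pattern uses \\d+ so that suffixes with more than one digit also match. Note that a column that legitimately ends in a dot followed by digits would be dropped as well.

```r
library(data.table)

# Illustrative table with one name repeated three times.
dt <- data.table(x = 1, x = 2, y = 3, x = 4)

# make.names(unique = TRUE) appends .1, .2, ... to the repeated names.
setnames(dt, make.names(names(dt), unique = TRUE))
renamed <- names(dt)

# Names are unique now, so the suffixed columns can be dropped by name.
drop <- grep("\\.\\d+$", names(dt), value = TRUE)
if (length(drop) > 0L) dt[, (drop) := NULL]

names(dt)
```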
Upvotes: 4