Marcin

Reputation: 8064

How to remove duplicated (by name) columns from a data.table in R?

When reading a data set with fread, I've noticed that I sometimes get duplicated column names. For example (fread doesn't have a check.names argument):

> data.table( x = 1, x = 2)
   x x
1: 1 2

The question is: is there a way to remove one of two columns when they have the same name?

Upvotes: 12

Views: 9954

Answers (3)

Arun

Reputation: 118879

.SDcols approaches return a copy of the columns you're selecting. Instead, just remove the duplicated columns by reference using :=.

dt[, which(duplicated(names(dt))) := NULL]
#    x
# 1: 1
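
A slightly expanded sketch of the same idea, on a toy table that is my own assumption (the `y` column and the `dup_idx` name are not from the question). The length check is only a defensive guard for tables that have no duplicates.

```r
library(data.table)

# Toy table with a duplicated column name, as in the question,
# plus one extra column (y) for illustration
dt <- data.table(x = 1, x = 2, y = 3)

# Positions of every column whose name already appeared further left
dup_idx <- which(duplicated(names(dt)))

# Drop those columns by reference; skip the call when nothing is duplicated
if (length(dup_idx) > 0L) {
  dt[, (dup_idx) := NULL]
}

names(dt)
```

Because the columns are addressed by position rather than by name, the first `x` is the one that survives.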

Upvotes: 16

Ben Bolker

Reputation: 226801

How about

dt[, .SD, .SDcols = unique(names(dt))]

This selects the first occurrence of each name (I'm not sure how you want to handle this).

As @DavidArenburg suggests in the comments above, you could also use check.names=TRUE in data.table() or fread(), which makes the duplicated names unique instead of dropping the columns.
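
A minimal sketch of the check.names route, assuming a data.table version in which both data.table() and fread() accept that argument. The inline CSV string is made up for illustration. Note that both columns are kept here; only the names change.

```r
library(data.table)

# data.table(): the second "x" is renamed rather than removed
dt1 <- data.table(x = 1, x = 2, check.names = TRUE)
names(dt1)

# fread(): same idea when reading; the input here is an inline CSV string
dt2 <- fread("x,x\n1,2\n", check.names = TRUE)
names(dt2)
```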

Upvotes: 21

Dominic Comtois

Reputation: 10421

Different approaches:

  1. Indexing

    my.data.table <- my.data.table[ ,-2]

  2. Subsetting

    my.data.table <- subset(my.data.table, select = -2)

  3. Making the names unique, if 1. and 2. are not practical (with hundreds of columns, for instance)

    setnames(my.data.table, make.names(names = names(my.data.table), unique=TRUE))

  4. Optionally, systematize the deletion of variables whose names meet some criterion. Here, we get rid of all variables whose names end with ".X", X being a number (starting at 1 when using make.names). Note that this also drops any column whose original name happens to end that way.

    my.data.table <- subset(my.data.table, select = !grepl(pattern = "\\.\\d+$", x = names(my.data.table)))
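
Steps 3 and 4 can be sketched together on a toy table. The column values and the `keep` helper are my own assumptions. Recording the duplicates before renaming is one way to avoid relying on the suffix pattern, so a column genuinely named something like "v.1" would not be dropped by accident.

```r
library(data.table)

# Toy table: "a" appears three times
my.data.table <- data.table(a = 1, b = 2, a = 3, a = 4)

# Positions of the first occurrence of each name, recorded before renaming
keep <- which(!duplicated(names(my.data.table)))

# Step 3: make the names unique (duplicates get numeric suffixes)
setnames(my.data.table, make.names(names(my.data.table), unique = TRUE))
names(my.data.table)   # "a" "b" "a.1" "a.2"

# Step 4, by position instead of by suffix pattern
my.data.table <- my.data.table[, keep, with = FALSE]
names(my.data.table)   # "a" "b"
```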

Upvotes: 4
