camcecc10

Reputation: 47

R: how to select only the columns with unique names?

My issue is that I have a large data set with 283 columns, some of which share the same name (for example, "uncultured").
Is there a way to select columns while skipping those with repeated names? Those columns (bacteria) normally have a very low abundance, so I don't care about their contribution; I'd just like to keep the columns with unique names.

My data set looks something like this:

    Samples  col1  col2  col3  col4  col2  col1  ...
    S1
    S2
    S3
    ...

and I'd like to select every column except the second col2 and the second col1.

Thanks!

Upvotes: 1

Views: 568

Answers (2)

s__

Reputation: 9485

Something like this should work:

    df[, !duplicated(colnames(df))]
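A minimal sketch of how this behaves, on a small made-up data frame (the values and the names `keep` and `res` are illustrative assumptions, not from the question). `duplicated()` flags the second and later occurrences of each name, so negating it keeps the first occurrence. `drop = FALSE` is added here so the result stays a data frame even if only one column survives.

```r
# Hypothetical data: column names repeat, as in the question.
df <- data.frame(1:3, 4:6, 7:9, 10:12)
colnames(df) <- c("col1", "col2", "col2", "col1")

# TRUE for the first occurrence of each name, FALSE for later repeats.
keep <- !duplicated(colnames(df))

# drop = FALSE keeps the result a data frame even with a single column left.
res <- df[, keep, drop = FALSE]
colnames(res)
```

Only the first `col1` and the first `col2` should remain; the later columns with the same names are dropped.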

Upvotes: 4

Edo

Reputation: 7818

This automatically keeps the first column for each distinct name:

    df[unique(colnames(df))]
    #>   col1 col2 col3 col4 S1 S2 S3
    #> 1    1    2    3    4  7  8  9
    #> 2    1    2    3    4  7  8  9
    #> 3    1    2    3    4  7  8  9
    #> 4    1    2    3    4  7  8  9
    #> 5    1    2    3    4  7  8  9

Reproducible example

df is defined as:

    df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
    colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")
    df
    #>   col1 col2 col3 col4 col2 col1 S1 S2 S3
    #> 1    1    2    3    4    5    6  7  8  9
    #> 2    1    2    3    4    5    6  7  8  9
    #> 3    1    2    3    4    5    6  7  8  9
    #> 4    1    2    3    4    5    6  7  8  9
    #> 5    1    2    3    4    5    6  7  8  9
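As a quick check, the sketch below compares selecting by name with selecting by a logical `!duplicated()` mask on the same example data. The names `by_name` and `by_flag` are illustrative assumptions. Selecting by name works because indexing a data frame with a character vector matches the first column carrying each name.

```r
df <- as.data.frame(matrix(rep(1:9, 5), ncol = 9, byrow = TRUE))
colnames(df) <- c("col1", "col2", "col3", "col4", "col2", "col1", "S1", "S2", "S3")

# Select by name: each name resolves to its first matching column.
by_name <- df[unique(colnames(df))]

# Select by logical mask: keep columns whose name has not appeared before.
by_flag <- df[, !duplicated(colnames(df)), drop = FALSE]

# Both should hold the same columns in the same order.
all.equal(by_name, by_flag)
```

Either form should work for this case; the logical mask can be handy when the same mask is needed elsewhere, for example to subset a separate vector of column annotations.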

Upvotes: 3
