Natasha

Reputation: 1521

Renaming the columns of a data frame with the same name

I am renaming the columns of a data frame (Data) in R, using names stored in a character vector.

If two names in the character vector (Names) are the same, e.g. c("JK", "JK", "test", "hi"), and I rename the columns with:

colnames(Data) <- Names
colnames(Data)

Output:

"JK" "JK.1" "test" "hi"

Desired output:

"JK" "JK" "test" "hi"

I am not able to figure out why .1 is appended to the second name.

Any suggestions on how to avoid this?

Upvotes: 1

Views: 3636

Answers (2)

akrun

Reputation: 887108

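The column names are changed by the make.unique call inside data.frame(): with its default check.names = TRUE, data.frame() passes the names through make.names(..., unique = TRUE), which uses make.unique to rename duplicated column names.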

make.unique(c("JK", "JK", "JK", "test"))
#[1] "JK"   "JK.1" "JK.2" "test"
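To see where that happens, here is a minimal sketch using a small hypothetical data frame in place of the question's Data. Assigning the names directly keeps the duplicates, while passing the result through data.frame() de-duplicates them, which suggests the .1 is introduced wherever the data goes through data.frame() (or a reader such as read.csv(), which also has check.names = TRUE by default) rather than by the assignment itself. The check.names = FALSE line is an addition not in the original answer.

```r
# Hypothetical one-row data frame standing in for the question's Data
Data <- data.frame(a = 1, b = 2, c = 3, d = 4)
Names <- c("JK", "JK", "test", "hi")

# Assigning the names directly keeps the duplicates
colnames(Data) <- Names
colnames(Data)
#> [1] "JK"   "JK"   "test" "hi"

# Rebuilding with data.frame() (check.names = TRUE by default) runs the
# names through make.names(..., unique = TRUE), i.e. make.unique()
colnames(data.frame(Data))
#> [1] "JK"   "JK.1" "test" "hi"

# check.names = FALSE keeps the names exactly as given
colnames(data.frame(Data, check.names = FALSE))
#> [1] "JK"   "JK"   "test" "hi"
```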

We can use sub to match a literal . (. is a regex metacharacter that matches any character, so escape it with \\ to get its literal meaning), followed by one or more digits (\\d+) at the end ($) of the string, and replace the match with an empty string (""):

names(Data) <- sub("\\.\\d+$", "", names(Data))
names(Data)
#[1] "JK"   "JK"   "test" "hi"  
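A self-contained version of the sub() approach, using a hypothetical data frame built with data.frame() so that the second JK has already become JK.1. The last line illustrates a caveat not raised in the answer: the pattern cannot tell an added suffix from a genuine one, so a hypothetical column such as sales.2020 would also lose its suffix.

```r
# Hypothetical data: data.frame() turns the second "JK" into "JK.1"
Data <- data.frame(JK = 1, JK = 2, test = 3, hi = 4)
names(Data)
#> [1] "JK"   "JK.1" "test" "hi"

# Strip a trailing literal "." followed by one or more digits
names(Data) <- sub("\\.\\d+$", "", names(Data))
names(Data)
#> [1] "JK"   "JK"   "test" "hi"

# Caveat: genuine numeric suffixes are stripped as well
sub("\\.\\d+$", "", c("JK.1", "sales.2020", "hi"))
#> [1] "JK"    "sales" "hi"
```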

Another option is str_remove from stringr:

library(stringr)
names(Data) <- str_remove(names(Data), "\\.\\d+$")

NOTE: It is better to keep the column names of a data frame unique rather than duplicated.
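A short sketch of why unique names are preferable, using a hypothetical data frame created with check.names = FALSE so that the duplicate survives: selecting by name quietly returns only the first matching column, and the second one can only be reached by position.

```r
# Hypothetical data frame that keeps a duplicated column name
Data <- data.frame(JK = 1, JK = 2, test = 3, hi = 4, check.names = FALSE)

# Selection by name silently returns only the first "JK" column
Data$JK
#> [1] 1
Data[["JK"]]
#> [1] 1

# Reaching the second "JK" column requires its position
jk_cols <- which(names(Data) == "JK")
jk_cols
#> [1] 1 2
Data[[jk_cols[2]]]
#> [1] 2
```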

Upvotes: 2

someone

Reputation: 149

I am not able to figure out why .1 is appended to the second name.

This is because the column names of a data frame should be unique. How would you select a column if two columns had the same name? To avoid .1 being appended to a column name, make sure your Names vector contains only unique elements. You can write a function that checks for duplicates in the Names vector and replaces them with something meaningful.
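One way to write the check described above, as a sketch: fix_duplicate_names() is a hypothetical helper, not from the original answer. It warns about any repeated names and then makes them unique with make.unique(), using an underscore separator (an arbitrary choice) instead of the default ".", so the result can be assigned with colnames(Data) <- fixed.

```r
# Hypothetical helper (not from the original answer): warn about repeated
# names, then make them unique with a chosen separator
fix_duplicate_names <- function(x, sep = "_") {
  dups <- unique(x[duplicated(x)])
  if (length(dups) > 0) {
    warning("Duplicated names: ", paste(dups, collapse = ", "))
  }
  make.unique(x, sep = sep)
}

Names <- c("JK", "JK", "test", "hi")
fixed <- suppressWarnings(fix_duplicate_names(Names))
fixed
#> [1] "JK"   "JK_1" "test" "hi"
```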

Upvotes: 2
