Reputation: 1521
I am renaming the columns of a data frame (Data) in R with names stored in a character vector (Names).
Two of the names in Names are the same, e.g. c("JK", "JK", "test", "hi").
Using:
colnames(Data) <- Names
colnames(Data)
Output:
"JK" "JK.1" "test" "hi"
Desired output:
"JK" "JK" "test" "hi"
I am not able to figure out why .1 is appended to the second name.
Any suggestions on how to avoid this?
Upvotes: 1
Views: 3636
Reputation: 887108
The column names change because of the make.unique call in data.frame, which alters duplicate column names:
make.unique(c("JK", "JK", "JK", "test"))
#[1] "JK" "JK.1" "JK.2" "test"
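As a minimal sketch of where that renaming kicks in (the toy values and the object names checked and unchecked are made up for illustration): in base R, data.frame() checks names by default, and passing check.names = FALSE should keep the duplicates as given.

```r
# Toy data; the values are placeholders chosen only for illustration.
# Default behaviour: check.names = TRUE makes duplicated names unique.
checked <- data.frame(JK = 1, JK = 2, test = 3, hi = 4)
names(checked)
# [1] "JK"   "JK.1" "test" "hi"

# With check.names = FALSE the duplicated names are left untouched.
unchecked <- data.frame(JK = 1, JK = 2, test = 3, hi = 4, check.names = FALSE)
names(unchecked)
# [1] "JK"   "JK"   "test" "hi"
```

If Data is built or rebuilt somewhere in your pipeline, that step may be the place to look for the added suffix.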
We can use sub to match the . (. is a metacharacter that matches any character, so escape it with \\ to get the literal meaning) followed by one or more digits (\\d+) up to the end ($) of the string, and replace the match with an empty string (""):
names(Data) <- sub("\\.\\d+$", "", names(Data))
names(Data)
#[1] "JK" "JK" "test" "hi"
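One caveat worth checking before applying this: the pattern cannot tell a suffix added by make.unique from a name that legitimately ends in a dot and digits. A small sketch with made-up names (renamed and legit are hypothetical vectors, not from the question):

```r
# Names as they look after make.unique has added a suffix.
renamed <- c("JK", "JK.1", "test", "hi")
cleaned <- sub("\\.\\d+$", "", renamed)
cleaned
# [1] "JK"   "JK"   "test" "hi"

# A name that genuinely ends in ".<digits>" is trimmed as well.
legit <- "rate.2020"
trimmed <- sub("\\.\\d+$", "", legit)
trimmed
# [1] "rate"
```

So this is safest when you know none of the original names end that way.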
Or another option is str_remove
library(stringr)
names(Data) <- str_remove(names(Data), "\\.\\d+$")
NOTE: It is better to have unique column names in a data frame than duplicated ones.
Upvotes: 2
Reputation: 149
I am not able to figure out why .1 is appended to the second name.
This happens because the column names of a data frame are expected to be unique. How would you select a column if two columns had the same name? To avoid .1 being appended to the column name, make sure your Names vector contains only unique elements. You can write a function that checks for duplicates in the names vector and replaces them with something meaningful.
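A rough sketch of such a function, using only base R. The name dedupe_names, the sep argument, and the "_1", "_2" numbering scheme are my own assumptions for illustration; pick whatever replacement is meaningful for your data.

```r
# Hypothetical helper: number every name that occurs more than once,
# leaving names that are already unique untouched.
dedupe_names <- function(nms, sep = "_") {
  # TRUE for every element that has at least one duplicate elsewhere
  dup <- duplicated(nms) | duplicated(nms, fromLast = TRUE)
  # Running count within each group of identical names (1, 2, ...)
  idx <- ave(seq_along(nms), nms, FUN = seq_along)
  nms[dup] <- paste(nms[dup], idx[dup], sep = sep)
  nms
}

Names <- c("JK", "JK", "test", "hi")
fixed <- dedupe_names(Names)
fixed
# [1] "JK_1" "JK_2" "test" "hi"
```

You could then assign the result with colnames(Data) <- dedupe_names(Names), so each column stays selectable by name.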
Upvotes: 2