godot

Reputation: 131

How to create a function that removes a specific first character from column names

R has problems when reading .csv files whose column names begin with a number: it changes these names by putting an "X" in front of them.

I am trying to write a simple function that fixes this (although is this even the easiest way?).

As an example, I added two new (nonsensical) columns to iris:

iris$X12.0 <- iris$Sepal.Length
iris$X18.0 <- iris$Petal.Length


remv.X <- function(x){
  if(substr(colnames(x), 1, 1) == "X"){
    colnames(x) <- substr(colnames(x), 2, 100)
  } 
  else{
    colnames(x) <- substr(colnames(x), 1, 100)
  }
} 

remv.X(iris)

When I run it, I get a warning and nothing changes. What am I doing wrong?

Upvotes: 0

Views: 54

Answers (1)

G. Grothendieck

Reputation: 269481

check.names=FALSE

Use the read.table/read.csv argument check.names = FALSE to turn off column name mangling.

For example,

read.csv(text = "1x,2x\n10,20", check.names = FALSE)

giving:

  1x 2x
1 10 20
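For comparison, here is a minimal sketch of the default behavior. With check.names = TRUE (the default), read.csv checks the header and, where needed, adjusts names with make.names(), which prepends an X to names that are not syntactically valid, such as names starting with a digit:

```r
# Default: check.names = TRUE, so invalid names are fixed up by make.names()
mangled <- read.csv(text = "1x,2x\n10,20")
names(mangled)
# [1] "X1x" "X2x"

# Calling make.names() directly gives the same result
make.names(c("1x", "2x"))
# [1] "X1x" "X2x"

# check.names = FALSE keeps the header exactly as written
kept <- read.csv(text = "1x,2x\n10,20", check.names = FALSE)
names(kept)
# [1] "1x" "2x"
```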

Removing X using sub

If for some reason you did have an unwanted X at the beginning of some column names, you could remove it like this. This only removes an X at the beginning of column names whose next character is a digit. If the next character is not a digit, or if there is no next character, the column name is left unchanged.

names(iris) <- sub("^X(\\d.*)", "\\1", names(iris))

or as a function:

rmX <- function(data) setNames(data, sub("^X(\\d.*)", "\\1", names(data)))

# test
iris <- rmX(iris)
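A quick check of the edge cases described above, using a few made-up column names (hypothetical, not from iris):

```r
rmX <- function(data) setNames(data, sub("^X(\\d.*)", "\\1", names(data)))

# Hypothetical columns: X + digit, a bare X, X + letter, and a normal name
df <- data.frame(X12.0 = 1, X = 2, Xylem = 3, Sepal.Length = 4)

# Only the X followed by a digit is stripped
names(rmX(df))
# returns c("12.0", "X", "Xylem", "Sepal.Length")
```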

Problem with code in question

There are two problems with the code in the question.

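  1. In if (condition) ..., the condition is a logical vector with one element per column name, but it must be a single TRUE or FALSE. (In R 4.2.0 and later this is an error rather than a warning.)

  2. The modified data frame is never returned, and the result is not assigned back to iris, so the change made to the function's local copy is lost.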
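Both problems can be reproduced in isolation. The column names and the helper rename_first() below are hypothetical, made up for illustration:

```r
# Problem 1: the condition has one element per column name, not one overall
nms <- c("Sepal.Length", "X12.0", "X18.0")
cond <- substr(nms, 1, 1) == "X"
cond
# [1] FALSE  TRUE  TRUE

# Problem 2: a function works on its own copy of the data frame,
# so changes are lost unless the copy is returned and assigned back.
# rename_first() is a hypothetical helper for this demonstration.
rename_first <- function(x) {
  colnames(x)[1] <- "changed"
  x  # return the modified copy
}
df <- data.frame(a = 1, b = 2)
rename_first(df)        # returns a modified copy; the caller's df is untouched
names(df)
# [1] "a" "b"
df <- rename_first(df)  # the change sticks only once the result is assigned
names(df)
# [1] "changed" "b"
```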

Here it is, fixed. We have also factored the common left-hand side (the assignment to colnames(x)[i]) out of the two branches of the if.

remv.X2 <- function(x) {
  for (i in seq_along(x)) {
     colnames(x)[i] <- if (substr(colnames(x)[i], 1, 1) == "X") {
        substr(colnames(x)[i], 2, 100)
     } else {
        substr(colnames(x)[i], 1, 100)
     }
  }
  x
} 

iris <- remv.X2(iris)

or maybe even:

remv.X3 <- function(x) {
  setNames(x, substr(colnames(x), (substr(colnames(x), 1, 1) == "X") + 1, 100))
} 

iris <- remv.X3(iris)
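Note that remv.X2 and remv.X3 strip a leading X from every name that starts with one, whereas the sub() version only strips it when a digit follows. A small comparison on made-up column names:

```r
remv.X3 <- function(x) {
  setNames(x, substr(colnames(x), (substr(colnames(x), 1, 1) == "X") + 1, 100))
}
rmX <- function(data) setNames(data, sub("^X(\\d.*)", "\\1", names(data)))

# Hypothetical columns: one mangled numeric name, one genuine name starting with X
df <- data.frame(X12.0 = 1, Xylem = 2, Sepal.Length = 3)

names(remv.X3(df))  # returns c("12.0", "ylem", "Sepal.Length")
names(rmX(df))      # returns c("12.0", "Xylem", "Sepal.Length")
```

The substr() versions also use a fixed stop position of 100, so any column name longer than 100 characters would be truncated.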

Upvotes: 2
