Reputation: 131
R has problems when reading .csv files with column names that begin with a number; it changes these names by putting an "X" as the first character.
I am trying to write a function which simply solves this problem (although: is this the easiest way?)
As an example, I created two new (nonsensical) columns in iris:
iris$X12.0 <- iris$Sepal.Length
iris$X18.0 <- iris$Petal.Length
remv.X <- function(x){
  if(substr(colnames(x), 1, 1) == "X"){
    colnames(x) <- substr(colnames(x), 2, 100)
  }
  else{
    colnames(x) <- substr(colnames(x), 1, 100)
  }
}
remv.X(iris)
When printing, I get a warning and nothing changes. What am I doing wrong?
Upvotes: 0
Views: 54
Reputation: 269481
check.names=FALSE
Use the read.table/read.csv argument check.names = FALSE to turn off column name mangling.
For example,
read.csv(text = "1x,2x\n10,20", check.names = FALSE)
giving:
  1x 2x
1 10 20
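To make the effect of the argument concrete, here is a minimal sketch that reads the same inline text with and without check.names = FALSE. The variable names csv_text, mangled and kept are only illustrative. Note that names which are not syntactically valid, such as 1x, need backticks when used with $.

```r
csv_text <- "1x,2x\n10,20"

# Default (check.names = TRUE): names are made syntactically valid,
# so names starting with a digit get an "X" prefix
mangled <- read.csv(text = csv_text)
names(mangled)

# check.names = FALSE keeps the header exactly as written
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)

# Non-syntactic names must be quoted with backticks when using $
kept$`1x`
```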
Removing X using sub
If for some reason you did have an unwanted X character at the beginning of some column names, it could be removed like this. This only removes an X at the beginning of a column name when the next character is a digit. If the next character is not a digit, or if there is no next character, the column name is left unchanged.
names(iris) <- sub("^X(\\d.*)", "\\1", names(iris))
or as a function:
rmX <- function(data) setNames(data, sub("^X(\\d.*)", "\\1", names(data)))
# test
iris <- rmX(iris)
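As a quick check of which names the pattern touches, the same sub call can be applied to a plain character vector. The vector nms below is made up for illustration and is not part of the question's data.

```r
# Illustrative names: some start with X followed by a digit, some do not
nms <- c("X12.0", "X18.0", "X", "Xylem", "Sepal.Length", "X1")

# Only a leading X that is directly followed by a digit is dropped
cleaned <- sub("^X(\\d.*)", "\\1", nms)
cleaned
```

Here "X" and "Xylem" are left alone because no digit follows the leading X.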
Problem with code in question
There are two problems with the code in the question:
1. In if (condition) ... the condition is a vector, but it must be a scalar.
2. The data frame is never returned.
Here it is fixed up. We have also factored out the LHS of the two legs of the if.
remv.X2 <- function(x) {
  for (i in seq_along(x)) {
    colnames(x)[i] <- if (substr(colnames(x)[i], 1, 1) == "X") {
      substr(colnames(x)[i], 2, 100)
    } else {
      substr(colnames(x)[i], 1, 100)
    }
  }
  x
}
iris <- remv.X2(iris)
or maybe even:
remv.X3 <- function(x) {
  setNames(x, substr(colnames(x), (substr(colnames(x), 1, 1) == "X") + 1, 100))
}
iris <- remv.X3(iris)
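The scalar-versus-vector point can also be seen on a small character vector. This is only a sketch with made-up names (nms, starts_with_X, fixed); it uses the vectorised ifelse in place of if, which is one possible alternative and not the approach taken in the functions above. Like the question's code, it strips any leading X, not just one followed by a digit.

```r
nms <- c("X12.0", "Species")

# The comparison yields one logical value per name, not a single TRUE/FALSE,
# which is why it cannot be used directly as the condition of if ()
starts_with_X <- substr(nms, 1, 1) == "X"
length(starts_with_X)

# ifelse() works element by element, so it accepts the whole vector
fixed <- ifelse(starts_with_X, substr(nms, 2, 100), nms)
fixed
```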
Upvotes: 2