outlier123

Reputation: 227

Remove quotes from Factor variables in R

I have over 500 factor columns in my data frame, many of which contain only "True"/"False". Is there a way to remove the quotes from just these columns in one shot?

Example code:

sample=as.list(dataframe[1,])
for(i in 1:length(sample)){
 if(sample[i]=="false") sample[i]=false
}

The above code doesn't seem to work. Any leads appreciated!

Upvotes: 0

Views: 257

Answers (3)

asachet

Reputation: 6921

This solves your problem:

> as.logical(c("true", "false", "True", "TRUE", "False"))
[1]  TRUE FALSE  TRUE  TRUE FALSE

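I was surprised too: as.logical() accepts "true", "True" and "TRUE" (and likewise for false), not just the upper-case spelling.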
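To see exactly which spellings are accepted, here is a quick check on a made-up vector. As far as I know, as.logical() maps only "T", "TRUE", "true" and "True" to TRUE (and the four matching spellings to FALSE); any other string, including one that still contains literal quote characters, becomes NA.

```r
# Made-up input mixing recognised and unrecognised spellings
x <- c("T", "F", "tRuE", "yes", "0", "\"true\"")
res <- as.logical(x)
print(res)
# Only "T" and "F" are recognised here; the other four become NA
```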

EDIT: I just noticed your code and I figured you could use a complete example.

Your data is in a data.frame, which is basically a list of columns, similar to a spreadsheet if you will.

dataframe[1,] extracts the first row of your dataset. I suspect what you actually want is the first column, dataframe[,1]. That column is a vector, which is easy to operate on; there is no need to put it in a list.
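A small sketch of the difference between the two indexing forms, using a hypothetical two-column data frame in place of the real one:

```r
# Hypothetical stand-in for the asker's data frame
dataframe <- data.frame(
  flag  = factor(c("True", "False", "True")),
  score = c(1.5, 2.0, 3.5)
)

first_row <- dataframe[1, ]  # a one-row data.frame, still holding every column
first_col <- dataframe[, 1]  # the column itself, dropped to a plain factor vector

print(class(first_row))
print(class(first_col))
```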

So you would do:

as.logical(dataframe[,1])

But that only returns the converted values; it does not modify the data frame. To do that, assign the result back to the first column:

dataframe[,1] <- as.logical(dataframe[,1])

There you go: the first column no longer contains strings but logicals, whatever the capitalization was.
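A minimal end-to-end sketch, assuming the first column is a factor whose labels are some accepted capitalization of true/false (the data below is made up). In current R versions, as.logical() on a factor works on its labels rather than its integer codes, so no as.character() step should be needed:

```r
dataframe <- data.frame(
  flag  = factor(c("True", "FALSE", "true")),
  score = c(1.5, 2.0, 3.5)
)

# Replace the factor column with its logical version, in place
dataframe[, 1] <- as.logical(dataframe[, 1])
str(dataframe)
```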

If you really did mean to work on a row, that is unusual and probably means you should transpose your data.frame, i.e. swap rows and columns. This is done with t().
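One caveat worth knowing before transposing: t() on a data.frame goes through as.matrix(), so the result is a matrix, and when the columns have different types everything is coerced to a common type (character in this made-up example):

```r
df <- data.frame(flag = c("TRUE", "FALSE"), score = c(1, 2))
m <- t(df)

print(class(m))   # a matrix, no longer a data.frame
print(typeof(m))  # mixed column types were coerced to character
```

If you do transpose, you will likely need to convert the types again afterwards.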

Upvotes: 0

pcantalupo
pcantalupo

Reputation: 2226

I think this is what you want, assuming the columns you are talking about are factors with two levels, "FALSE" and "TRUE" (in any capitalization), whose values literally include the double quotes.

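# stringsAsFactors = TRUE is needed on R >= 4.0.0, where the default became FALSE
df = data.frame(a=c("\"true\"","\"false\""), b=c("\"FALSE\"","\"TRUE\""), c=c("TRUE","FALSE"), stringsAsFactors = TRUE)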
df
#        a       b     c
# 1  "true" "FALSE"  TRUE
# 2 "false"  "TRUE" FALSE

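# Upper-cased levels that mark a quoted TRUE/FALSE column
ftlev = c("\"FALSE\"", "\"TRUE\"")
df2 = lapply(df, FUN = function(x) {
  # factor levels are sorted by default, so "FALSE" comes before "TRUE"
  if (identical(ftlev, toupper(levels(x)))) {
    x = gsub('"', '', x)  # strip the literal double quotes
  }
  return(x)
})
as.data.frame(df2)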

Output:

      a     b     c
1  true FALSE  TRUE
2 false  TRUE FALSE

The as.logical() function has been proposed in other answers and comments, but it does not produce the expected output when the values literally contain quote characters:

df2 = lapply(df, FUN = function(x) {
  if (identical(ftlev,toupper(levels(x)))) {
    x = as.logical(x)
  }
  return(x)
})
as.data.frame(df2)

Output:

  a  b     c
1 NA NA  TRUE
2 NA NA FALSE
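If the goal is real logical columns, the two steps in this answer can be combined: strip the literal quotes first, then call as.logical(). This is a sketch under the assumption that every column to convert is a factor or character vector; quoted_to_logical() is a hypothetical helper name, and column d is added only to show a column that is left alone.

```r
df <- data.frame(
  a = c("\"true\"", "\"false\""),
  b = c("\"FALSE\"", "\"TRUE\""),
  c = c("TRUE", "FALSE"),
  d = c("x", "y"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: strip literal double quotes, then convert to logical
# only if every remaining value is a spelling as.logical() understands.
quoted_to_logical <- function(x) {
  if (!is.factor(x) && !is.character(x)) return(x)
  stripped <- gsub('"', "", as.character(x))
  converted <- as.logical(stripped)
  if (anyNA(converted[!is.na(stripped)])) return(x)  # leave column unchanged
  converted
}

df[] <- lapply(df, quoted_to_logical)
str(df)
```

Assigning to df[] rather than df keeps the result a data.frame, so no as.data.frame() call is needed afterwards.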

Upvotes: 0

Gregor Thomas
Gregor Thomas

Reputation: 146020

If you give a better example (with some columns to convert and some not), I'm happy to test it. From your description, I think this will work:

# data[] <- keeps the result a data.frame (a bare lapply() returns a list)
data[] <- lapply(data, FUN = function(x) {
    if (is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))) {
        return(as.logical(x))
    }
    return(x)
})

For each column, it tests whether the column is a factor whose levels are all some capitalization of "TRUE"/"FALSE". If so, it converts the column to logical; otherwise it returns the column unchanged.
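Here is a quick test on a made-up data frame with some columns to convert and some not. It assigns to data[] so the result stays a data.frame, and uses && for the scalar if() condition:

```r
data <- data.frame(
  flag1 = factor(c("True", "False", "True")),
  flag2 = factor(c("FALSE", "FALSE", "TRUE")),
  group = factor(c("a", "b", "a")),
  value = c(10, 20, 30)
)

data[] <- lapply(data, FUN = function(x) {
  # Convert only factors whose levels are all some capitalization of TRUE/FALSE
  if (is.factor(x) && all(toupper(levels(x)) %in% c("TRUE", "FALSE"))) {
    return(as.logical(x))
  }
  x
})
str(data)
```

One caveat: toupper() also lets through spellings such as "tRuE", which as.logical() turns into NA, so a stricter check would compare against the exact spellings as.logical() accepts.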

Upvotes: 2
