wanglin

Reputation: 121

An elegant way to change column types in a data frame in R

I have a data.frame which contains columns of different types, such as integer, character, numeric, and factor.

I need to convert the integer columns to numeric for use in the next step of analysis.

Example: test.data has 4 columns (my real data set has thousands): age, gender, work.years, and name. age and work.years are integer, gender is a factor, and name is character. I need to change age and work.years to numeric, and I wrote the following line to do it:

test.data[sapply(test.data, is.integer)] <- lapply(test.data[sapply(test.data, is.integer)], as.numeric)

It works, but it does not look good enough to me. Is there a more elegant way to do this? Any creative method would be appreciated.
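For anyone who wants to reproduce this, here is a minimal sketch of a data set matching the description. The values are invented for illustration, and `int.cols` is a name introduced here only to avoid evaluating the `sapply()` mask twice.

```r
# Hypothetical sample data matching the description; the values are invented.
test.data <- data.frame(
  age        = c(38L, 46L, 60L),
  gender     = factor(c("F", "M", "F")),
  work.years = c(5L, 4L, 4L),
  name       = c("b", "f", "e"),
  stringsAsFactors = FALSE
)

# Compute the logical mask once and reuse it on both sides of the assignment.
# is.integer() returns FALSE for factors, so gender is not selected.
int.cols <- sapply(test.data, is.integer)
test.data[int.cols] <- lapply(test.data[int.cols], as.numeric)

# Inspect the column classes after the conversion.
sapply(test.data, class)
```

After the conversion, age and work.years should report "numeric", while gender and name keep their original classes.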

Upvotes: 10

Views: 75071

Answers (3)

DrBroo

Reputation: 211

This is now very concise in dplyr (with the magrittr %<>% assignment pipe):

test.data %<>% mutate_if(is.integer, as.numeric)
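Since dplyr 1.0.0 the scoped verbs such as mutate_if() are superseded by across(). The following is a sketch of the same conversion in that style, assuming dplyr 1.0.0 or later is installed; the sample data is invented, and plain assignment is used instead of %<>% so that magrittr does not need to be attached.

```r
library(dplyr)

# Hypothetical sample data; the values are invented for illustration.
test.data <- data.frame(
  age        = c(38L, 46L, 60L),
  gender     = factor(c("F", "M", "F")),
  work.years = c(5L, 4L, 4L),
  name       = c("b", "f", "e"),
  stringsAsFactors = FALSE
)

# where(is.integer) selects the integer columns; as.numeric is applied to each.
test.data <- test.data %>%
  mutate(across(where(is.integer), as.numeric))

str(test.data)
```

Columns that fail the is.integer predicate (the factor and the character column here) are passed through untouched.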

Upvotes: 21

akrun

Reputation: 887118

I think elegance in code is somewhat subjective. To me the following is elegant, though it may be less efficient than the OP's code. Since the question is about elegance, it is a reasonable option.

test.data[] <- lapply(test.data, function(x) if(is.integer(x)) as.numeric(x) else x)
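The empty brackets on the left-hand side matter: they assign into the existing data frame instead of replacing it, so the object keeps its data.frame class. A small sketch of the difference, using invented data and a hypothetical helper named `convert`:

```r
# Hypothetical sample data; the values are invented for illustration.
df <- data.frame(age = c(38L, 46L), name = c("b", "f"), stringsAsFactors = FALSE)

# Hypothetical helper: convert integer vectors, pass everything else through.
convert <- function(x) if (is.integer(x)) as.numeric(x) else x

# With []: the columns are replaced in place and the data frame is preserved.
kept <- df
kept[] <- lapply(kept, convert)
class(kept)   # "data.frame"

# Without []: the variable is overwritten by the plain list lapply() returns.
lost <- df
lost <- lapply(lost, convert)
class(lost)   # "list"
```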

Another elegant option is dplyr:

library(dplyr)
library(magrittr)
# Note: mutate_each() and funs() are deprecated in later dplyr releases.
test.data %<>% 
      mutate_each(funs(if(is.integer(.)) as.numeric(.) else .))

Upvotes: 27

bgoldst

Reputation: 35314

I think tasks like this are best accomplished with explicit loops. You gain nothing here by replacing a straightforward for-loop with the hidden loop of a function like lapply(). Example:

## generate data
set.seed(1L);
N <- 3L;
test.data <- data.frame(
    age=sample(20:90,N,TRUE),
    gender=factor(sample(c('M','F'),N,TRUE)),
    work.years=sample(1:5,N,TRUE),
    name=sample(letters,N,TRUE),
    stringsAsFactors=FALSE
);
test.data;
##   age gender work.years name
## 1  38      F          5    b
## 2  46      M          4    f
## 3  60      F          4    e
str(test.data);
## 'data.frame':   3 obs. of  4 variables:
##  $ age       : int  38 46 60
##  $ gender    : Factor w/ 2 levels "F","M": 1 2 1
##  $ work.years: int  5 4 4
##  $ name      : chr  "b" "f" "e"

## solution
for (cn in names(test.data)[sapply(test.data,is.integer)])
    test.data[[cn]] <- as.double(test.data[[cn]]);

## result
test.data;
##   age gender work.years name
## 1  38      F          5    b
## 2  46      M          4    f
## 3  60      F          4    e
str(test.data);
## 'data.frame':   3 obs. of  4 variables:
##  $ age       : num  38 46 60
##  $ gender    : Factor w/ 2 levels "F","M": 1 2 1
##  $ work.years: num  5 4 4
##  $ name      : chr  "b" "f" "e"
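If the conversion is needed in more than one place, the same loop can be wrapped in a function. This is only a sketch: `int_to_double` is a hypothetical name, the sample data is invented, and vapply() is swapped in for sapply() because it always returns a logical vector, including for a data frame with no columns.

```r
# Hypothetical helper wrapping the explicit loop shown above.
int_to_double <- function(df) {
  int.cols <- names(df)[vapply(df, is.integer, logical(1))]
  for (cn in int.cols) {
    df[[cn]] <- as.double(df[[cn]])
  }
  df
}

# Hypothetical sample data; the values are invented for illustration.
before <- data.frame(
  age        = c(38L, 46L, 60L),
  gender     = factor(c("F", "M", "F")),
  work.years = c(5L, 4L, 4L),
  name       = c("b", "f", "e"),
  stringsAsFactors = FALSE
)
after <- int_to_double(before)

typeof(after$age)                         # "double"
all.equal(before$age, after$age)          # TRUE: same values, different storage type
identical(before$gender, after$gender)    # TRUE: the factor column is untouched
```

Because R copies on modification, `before` is left unchanged and only the returned data frame carries the converted columns.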

Upvotes: 2
