ec0n0micus
ec0n0micus

Reputation: 1105

converting multiple columns from character to numeric format in r

What is the most efficient way to convert multiple columns in a data frame from character to numeric format?

I have a dataframe called DF with all character variables.

I would like to do something like

for (i in names(DF){
    DF$i <- as.numeric(DF$i)
}

Thank you

Upvotes: 68

Views: 313134

Answers (16)

Zineb EL GHAZZAL
Zineb EL GHAZZAL

Reputation: 11

DF[,6:11] <- sapply(DF[,6:11], as.numeric)

or

DF[,6:11] <- sapply(DF[,6:11], as.character)

Upvotes: 1

Nick Ooster
Nick Ooster

Reputation: 1

Since we can index a data frame column by it's name, a simple change can be made:

for (i in names(DF)){ DF[i] <- as.data.frame(as.numeric(as.matrix(DF[i]))) }

Upvotes: -1

yuskam
yuskam

Reputation: 410

Use data.table set function

setDT(DF)
for (j in YourColumns)
     set(DF, j=j, value = as.numeric(DF[[j]])

If you need to keep as data.frame then just use setDF(DF)

Upvotes: 2

Ankit Kumar Singh
Ankit Kumar Singh

Reputation: 41

A<- read.csv("Environment_Temperature_change_E_All_Data_NOFLAG.csv",header = F)

Now, convert to character

A<- type.convert(A,as.is=T)

Convert some columns to numeric from character

A[,c(1,3,5,c(8:66))]<- as.numeric(as.character(unlist(A[,c(1,3,5,c(8:66))])))

Upvotes: -2

etrowbridge
etrowbridge

Reputation: 405

Using the across() function from dplyr 1.0

   df <- df %>% mutate(across(, ~as.numeric(.))

Upvotes: 20

M_lix
M_lix

Reputation: 1

for (i in 1:names(DF){
    DF[[i]] <- as.numeric(DF[[i]])
}

I solved this using double brackets [[]]

Upvotes: 0

YodaM
YodaM

Reputation: 291

I used this code to convert all columns to numeric except the first one:

    library(dplyr)
    # check structure, row and column number with: glimpse(df)
    # convert to numeric e.g. from 2nd column to 10th column
    df <- df %>% 
     mutate_at(c(2:10), as.numeric)

Upvotes: 29

Zuooo
Zuooo

Reputation: 337

type.convert()

Convert a data object to logical, integer, numeric, complex, character or factor as appropriate.

Add the as.is argument type.convert(df,as.is = T) to prevent character vectors from becoming factors when there is a non-numeric in the data set.

See.

Upvotes: 7

ARobertson
ARobertson

Reputation: 2907

If you're already using the tidyverse, there are a few solution depending on the exact situation.

Basic if you know it's all numbers and doesn't have NAs

library(dplyr)

# solution
dataset %>% mutate_if(is.character,as.numeric)

Test cases

df <- data.frame(
  x1 = c('1','2','3'),
  x2 = c('4','5','6'),
  x3 = c('1','a','x'), # vector with alpha characters
  x4 = c('1',NA,'6'), # numeric and NA
  x5 = c('1',NA,'x'), # alpha and NA
  stringsAsFactors = F)

# display starting structure
df %>% str()

Convert all character vectors to numeric (could fail if not numeric)

df %>%
  select(-x3) %>% # this removes the alpha column if all your character columns need converted to numeric
  mutate_if(is.character,as.numeric) %>%
  str()

Check if each column can be converted. This can be an anonymous function. It returns FALSE if there is a non-numeric or non-NA character somewhere. It also checks if it's a character vector to ignore factors. na.omit removes original NAs before creating "bad" NAs.

is_all_numeric <- function(x) {
  !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
}
df %>% 
  mutate_if(is_all_numeric,as.numeric) %>%
  str()

If you want to convert specific named columns, then mutate_at is better.

df %>% mutate_at('x1', as.numeric) %>% str()

Upvotes: 70

PvR
PvR

Reputation: 71

Slight adjustment to answers from ARobertson and Kenneth Wilson that worked for me.

Running R 3.6.0, with library(tidyverse) and library(dplyr) in my environment:

library(tidyverse)
library(dplyr)
> df %<>% mutate_if(is.character, as.numeric)
Error in df %<>% mutate_if(is.character, as.numeric) : 
  could not find function "%<>%"

I did some quick research and found this note in Hadley's "The tidyverse style guide".

The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.

# Good x <- x %>%
           abs() %>%    
           sort()

# Bad x %<>%   
          abs() %>%
          sort()

Solution

Based on that style guide:

df_clean <- df %>% mutate_if(is.character, as.numeric)

Working example

> df_clean <- df %>% mutate_if(is.character, as.numeric)
Warning messages:
1: NAs introduced by coercion 
2: NAs introduced by coercion 
3: NAs introduced by coercion 
4: NAs introduced by coercion 
5: NAs introduced by coercion 
6: NAs introduced by coercion 
7: NAs introduced by coercion 
8: NAs introduced by coercion 
9: NAs introduced by coercion 
10: NAs introduced by coercion 
> df_clean
# A tibble: 3,599 x 17
   stack datetime            volume BQT90 DBT90 DRT90 DLT90 FBT90  RT90 HTML90 RFT90 RLPP90 RAT90 SRVR90 SSL90 TCP90 group
   <dbl> <dttm>               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>

Upvotes: 6

tickly potato
tickly potato

Reputation: 122

like this?

DF <- data.frame("a" = as.character(0:5),
             "b" = paste(0:5, ".1", sep = ""),
             "c" = paste(10:15),
             stringsAsFactors = FALSE)

DF <- apply(DF, 2, as.numeric)

If there are "real" characters in dataframe like 'a' 'b' 'c', i would recommend answer from davsjob.

Upvotes: 2

davsjob
davsjob

Reputation: 1960

You could use convert from the hablar package:

library(dplyr)
library(hablar)

# Sample df (stolen from the solution by Luca Braglia)
df <- tibble("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6])

# insert variable names in num()
df %>% convert(num(a, b))

Which gives you:

# A tibble: 6 x 3
      a     b c    
  <dbl> <dbl> <chr>
1    0. 0.100 a    
2    1. 1.10  b    
3    2. 2.10  c    
4    3. 3.10  d    
5    4. 4.10  e    
6    5. 5.10  f   

Or if you are lazy, let retype() from hablar guess the right data type:

df %>% retype()

which gives you:

# A tibble: 6 x 3
      a     b c    
  <int> <dbl> <chr>
1     0 0.100 a    
2     1 1.10  b    
3     2 2.10  c    
4     3 3.10  d    
5     4 4.10  e    
6     5 5.10  f   

Upvotes: 8

Masimi
Masimi

Reputation: 291

You can use index of columns: data_set[,1:9] <- sapply(dataset[,1:9],as.character)

Upvotes: 29

Mark Wagner
Mark Wagner

Reputation: 393

I realize this is an old thread but wanted to post a solution similar to your request for a function (just ran into the similar issue myself trying to format an entire table to percentage labels).

Assume you have a df with 5 character columns you want to convert. First, I create a table containing the names of the columns I want to manipulate:

col_to_convert <- data.frame(nrow = 1:5
                            ,col = c("col1","col2","col3","col4","col5"))

for (i in 1:max(cal_to_convert$row))
  {
    colname <- col_to_convert$col[i]
    colnum <- which(colnames(df) == colname)
        for (j in 1:nrow(df))
          {
           df[j,colnum] <- as.numericdf(df[j,colnum])
          }
  }

This is not ideal for large tables as it goes cell by cell, but it would get the job done.

Upvotes: 3

Luca Braglia
Luca Braglia

Reputation: 3243

You could try

DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# Check columns classes
sapply(DF, class)

#           a           b           c 
# "character" "character" "character" 

cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)

#          a           b           c 
#  "numeric"   "numeric" "character"

Upvotes: 117

ec0n0micus
ec0n0micus

Reputation: 1105

I think I figured it out. Here's what I did (perhaps not the most elegant solution - suggestions on how to imp[rove this are very much welcome)

#names of columns in data frame
cols <- names(DF)
# character variables
cols.char <- c("fx_code","date")
#numeric variables
cols.num <- cols[!cols %in% cols.char]

DF.char <- DF[cols.char]
DF.num <- as.data.frame(lapply(DF[cols.num],as.numeric))
DF2 <- cbind(DF.char, DF.num)

Upvotes: 4

Related Questions