Reputation: 1105
What is the most efficient way to convert multiple columns in a data frame from character to numeric format?
I have a dataframe called DF with all character variables.
I would like to do something like
for (i in names(DF)){
DF$i <- as.numeric(DF$i)
}
Thank you
Upvotes: 68
Views: 313134
Reputation: 11
DF[,6:11] <- sapply(DF[,6:11], as.numeric)
or
DF[,6:11] <- sapply(DF[,6:11], as.character)
Upvotes: 1
Reputation: 1
Since we can index a data frame column by its name, a simple change can be made:
for (i in names(DF)){ DF[i] <- as.data.frame(as.numeric(as.matrix(DF[i]))) }
Upvotes: -1
Reputation: 410
Use data.table's set function:
library(data.table)
setDT(DF)
for (j in YourColumns)
  set(DF, j = j, value = as.numeric(DF[[j]]))
If you need to keep it as a data.frame, then just use setDF(DF) afterwards.
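For reference, here is a minimal self-contained sketch of the same idea. The sample data and the column names in cols are made up for illustration; substitute your own.

```r
library(data.table)

# Hypothetical sample data: every column is stored as character.
DF <- data.frame(id = c("a", "b"),
                 x  = c("1", "2"),
                 y  = c("3.5", "4.5"),
                 stringsAsFactors = FALSE)

setDT(DF)                # convert to data.table by reference (no copy)
cols <- c("x", "y")      # assumed names of the columns to convert
for (j in cols) {
  # set() overwrites column j in place
  set(DF, j = j, value = as.numeric(DF[[j]]))
}
setDF(DF)                # back to a plain data.frame, again by reference

sapply(DF, class)
```

Because set() modifies the table by reference, this tends to be a reasonable choice when the data is large and copying it per column would be costly.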
Upvotes: 2
Reputation: 41
A<- read.csv("Environment_Temperature_change_E_All_Data_NOFLAG.csv",header = F)
Now let type.convert() choose a type for each column; as.is = TRUE keeps text columns as character instead of factor
A<- type.convert(A,as.is=T)
Convert some columns to numeric from character
A[,c(1,3,5,c(8:66))]<- as.numeric(as.character(unlist(A[,c(1,3,5,c(8:66))])))
Upvotes: -2
Reputation: 405
Using the across() function from dplyr 1.0:
df <- df %>% mutate(across(everything(), ~ as.numeric(.)))
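If only some columns should be converted, across() also accepts a column selection. A small sketch, assuming a made-up tibble where id should stay character:

```r
library(dplyr)

# Hypothetical sample data: every column is stored as character.
df <- tibble(id = c("a", "b"),
             x  = c("1", "2"),
             y  = c("3.5", "4.5"))

# Convert only x and y; id is left untouched.
df_num <- df %>% mutate(across(c(x, y), as.numeric))

sapply(df_num, class)
```

Any tidyselect expression should work in place of c(x, y), for example a range such as x:y.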
Upvotes: 20
Reputation: 1
for (i in names(DF)) {
DF[[i]] <- as.numeric(DF[[i]])
}
I solved this using double brackets [[]]
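To see why the double brackets matter, here is a small sketch on made-up data: DF$i looks for a column literally named "i", while DF[[i]] uses the value stored in i.

```r
# Hypothetical sample data: every column is stored as character.
DF <- data.frame(a = c("1", "2"),
                 b = c("3.5", "4.5"),
                 stringsAsFactors = FALSE)

i <- "a"
dollar_result  <- DF$i     # NULL: there is no column named "i"
bracket_result <- DF[[i]]  # the contents of column "a"

# The loop from the question, written with [[ ]]
for (i in names(DF)) {
  DF[[i]] <- as.numeric(DF[[i]])
}

sapply(DF, class)
```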
Upvotes: 0
Reputation: 291
I used this code to convert all columns to numeric except the first one:
library(dplyr)
# check structure, row and column number with: glimpse(df)
# convert to numeric e.g. from 2nd column to 10th column
df <- df %>%
mutate_at(c(2:10), as.numeric)
Upvotes: 29
Reputation: 337
type.convert()
Convert a data object to logical, integer, numeric, complex, character or factor as appropriate.
Add the as.is argument, type.convert(df, as.is = TRUE), to prevent character vectors from becoming factors when there is a non-numeric value in the data set.
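A quick sketch of what that looks like on made-up data. Note that type.convert() picks the type per column, so whole numbers come back as integer rather than double:

```r
# Hypothetical sample data: every column is stored as character.
df <- data.frame(x = c("1", "2", "3"),      # whole numbers
                 y = c("1.5", "2.5", NA),   # decimals with a missing value
                 z = c("1", "a", "3"),      # contains a non-numeric value
                 stringsAsFactors = FALSE)

converted <- type.convert(df, as.is = TRUE)

sapply(converted, class)
```

Here z stays character because "a" cannot be parsed as a number, and as.is = TRUE stops it from being turned into a factor.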
Upvotes: 7
Reputation: 2907
If you're already using the tidyverse, there are a few solutions depending on the exact situation.
Basic if you know it's all numbers and doesn't have NAs
library(dplyr)
# solution
dataset %>% mutate_if(is.character,as.numeric)
Test cases
df <- data.frame(
x1 = c('1','2','3'),
x2 = c('4','5','6'),
x3 = c('1','a','x'), # vector with alpha characters
x4 = c('1',NA,'6'), # numeric and NA
x5 = c('1',NA,'x'), # alpha and NA
stringsAsFactors = F)
# display starting structure
df %>% str()
Convert all character vectors to numeric (could fail if not numeric)
df %>%
select(-x3) %>% # this removes the alpha column if all your character columns need converted to numeric
mutate_if(is.character,as.numeric) %>%
str()
Check if each column can be converted. This can be an anonymous function. It returns FALSE if any non-NA value cannot be parsed as a number. It also checks that the column is a character vector, so factors are ignored. na.omit removes the original NAs before the conversion creates "bad" NAs.
is_all_numeric <- function(x) {
!any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
}
df %>%
mutate_if(is_all_numeric,as.numeric) %>%
str()
If you want to convert specific named columns, then mutate_at is better.
df %>% mutate_at('x1', as.numeric) %>% str()
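When a column fails the check, it can help to see which values are the problem before converting. A base R sketch; non_numeric_values is a hypothetical helper name, not part of dplyr:

```r
# Hypothetical helper: returns the non-NA values that as.numeric() cannot parse.
non_numeric_values <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  # Keep values that were present originally but turned into NA when parsed.
  unique(x[!is.na(x) & is.na(parsed)])
}

x5 <- c("1", NA, "x")   # same values as column x5 in the test data frame
bad <- non_numeric_values(x5)
bad
```

An empty result means every non-NA value in the column can be converted without introducing new NAs.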
Upvotes: 70
Reputation: 71
Slight adjustment to answers from ARobertson and Kenneth Wilson that worked for me.
Running R 3.6.0, with library(tidyverse) and library(dplyr) in my environment:
library(tidyverse)
library(dplyr)
> df %<>% mutate_if(is.character, as.numeric)
Error in df %<>% mutate_if(is.character, as.numeric) :
could not find function "%<>%"
I did some quick research and found this note in Hadley's "The tidyverse style guide".
The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.
# Good
x <- x %>% abs() %>% sort()
# Bad
x %<>% abs() %>% sort()
Solution
Based on that style guide:
df_clean <- df %>% mutate_if(is.character, as.numeric)
Working example
> df_clean <- df %>% mutate_if(is.character, as.numeric)
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
3: NAs introduced by coercion
4: NAs introduced by coercion
5: NAs introduced by coercion
6: NAs introduced by coercion
7: NAs introduced by coercion
8: NAs introduced by coercion
9: NAs introduced by coercion
10: NAs introduced by coercion
> df_clean
# A tibble: 3,599 x 17
stack datetime volume BQT90 DBT90 DRT90 DLT90 FBT90 RT90 HTML90 RFT90 RLPP90 RAT90 SRVR90 SSL90 TCP90 group
<dbl> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Upvotes: 6
Reputation: 122
like this?
DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = paste(10:15),
stringsAsFactors = FALSE)
DF <- apply(DF, 2, as.numeric)
If there are "real" characters in the data frame, like 'a', 'b', 'c', I would recommend the answer from davsjob.
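One thing to be aware of: apply() returns a matrix, not a data frame. If the result needs to stay a data frame, a sketch along these lines (same kind of made-up data) should work:

```r
# Hypothetical sample data: every column is stored as character.
DF <- data.frame(a = as.character(0:5),
                 b = paste0(0:5, ".1"),
                 stringsAsFactors = FALSE)

as_matrix <- apply(DF, 2, as.numeric)   # numeric matrix; data frame class is lost

# Assigning into DF[] keeps the data frame and replaces each column in place.
DF[] <- lapply(DF, as.numeric)

class(as_matrix)
class(DF)
```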
Upvotes: 2
Reputation: 1960
You could use convert from the hablar package:
library(dplyr)
library(hablar)
# Sample df (stolen from the solution by Luca Braglia)
df <- tibble("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6])
# insert variable names in num()
df %>% convert(num(a, b))
Which gives you:
# A tibble: 6 x 3
a b c
<dbl> <dbl> <chr>
1 0. 0.100 a
2 1. 1.10 b
3 2. 2.10 c
4 3. 3.10 d
5 4. 4.10 e
6 5. 5.10 f
Or if you are lazy, let retype() from hablar guess the right data type:
df %>% retype()
which gives you:
# A tibble: 6 x 3
a b c
<int> <dbl> <chr>
1 0 0.100 a
2 1 1.10 b
3 2 2.10 c
4 3 3.10 d
5 4 4.10 e
6 5 5.10 f
Upvotes: 8
Reputation: 291
You can use index of columns:
dataset[,1:9] <- sapply(dataset[,1:9], as.numeric)  # use as.character for the reverse conversion
Upvotes: 29
Reputation: 393
I realize this is an old thread, but I wanted to post a solution similar to your request for a function (I just ran into a similar issue myself while trying to format an entire table as percentage labels).
Assume you have a df with 5 character columns you want to convert. First, I create a table containing the names of the columns I want to manipulate:
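col_to_convert <- data.frame(row = 1:5
,col = c("col1","col2","col3","col4","col5")
,stringsAsFactors = FALSE)
for (i in 1:max(col_to_convert$row))
{
colname <- col_to_convert$col[i]
colnum <- which(colnames(df) == colname)
# collect the values in a numeric vector; writing them straight back into
# the character column would coerce them to character again
converted <- numeric(nrow(df))
for (j in 1:nrow(df))
{
converted[j] <- as.numeric(df[j,colnum])
}
df[[colnum]] <- converted
}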
This is not ideal for large tables as it goes cell by cell, but it would get the job done.
Upvotes: 3
Reputation: 3243
You could try
DF <- data.frame("a" = as.character(0:5),
"b" = paste(0:5, ".1", sep = ""),
"c" = letters[1:6],
stringsAsFactors = FALSE)
# Check columns classes
sapply(DF, class)
# a b c
# "character" "character" "character"
cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)
# a b c
# "numeric" "numeric" "character"
Upvotes: 117
Reputation: 1105
I think I figured it out. Here's what I did (perhaps not the most elegant solution; suggestions on how to improve this are very much welcome).
#names of columns in data frame
cols <- names(DF)
# character variables
cols.char <- c("fx_code","date")
#numeric variables
cols.num <- cols[!cols %in% cols.char]
DF.char <- DF[cols.char]
DF.num <- as.data.frame(lapply(DF[cols.num],as.numeric))
DF2 <- cbind(DF.char, DF.num)
Upvotes: 4