Alvaro Morales

Reputation: 1925

How to deal with multiple types of data in one column in R?

I have several columns with mixed types of data in them. For instance, I have some double values like 1.4, 5.6, etc., and I have values below the detection limit like < 0.01, < 0.0004, etc. Because of that, the Import Text Data dialog detects these columns as character. How can I deal with this?

With a solution to this, I expect to compute statistics on all the values, taking the below-detection-limit values into account.

Upvotes: 0

Views: 1152

Answers (3)

Axel

Reputation: 47

If you want to extract the numbers, you could use gsub("[^0-9.]+","",YourList). This version keeps the decimals. The result is still a character vector, so wrap it in as.numeric() before doing calculations. I tested it with various formats before posting, but you might want to check your results before going further in your code.

> test <- c(1:4,"+65","<5","6>","46-6",6.5,"azer95.5")
> gsub("[^0-9.]+","",test)
[1] "1"    "2"    "3"    "4"    "65"   "5"    "6"    "466"  "6.5"  "95.5"
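Note that "46-6" silently becomes "466" above. As a minimal sketch of a stricter variant, assuming every valid entry is either a plain non-negative number or such a number preceded by <, the values could be validated first and everything else set to NA. The vector x and the names ok and num are hypothetical, chosen only for illustration:

```r
# Hypothetical input: two valid entries and two malformed ones
x <- c("1.4", "< 0.01", "46-6", "azer95.5")

# TRUE only for an optional "<", optional spaces, then a plain number
ok <- grepl("^\\s*<?\\s*[0-9]*\\.?[0-9]+\\s*$", x)

# Convert the valid entries; leave the rest as NA for manual review
num <- rep(NA_real_, length(x))
num[ok] <- as.numeric(gsub("[^0-9.]+", "", x[ok]))
```

Entries that come back as NA can then be inspected by hand instead of being converted into misleading numbers.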

Upvotes: 0

akrun

Reputation: 886998

We can do this with the tidyverse: remove the < and then retype the columns.

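library(tidyverse)
library(hablar)
# Strip the "<" (and any spaces after it) from character columns,
# then let hablar::retype() convert each column to a simpler type
dfN <- df1 %>%
         mutate_if(is.character, list(~ str_remove(., "<\\s*"))) %>%
         retype()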

Upvotes: 1

Cettt

Reputation: 11981

It depends on how you want to handle your data.

  1. If you want to work with numeric values, you first have to decide what to do with values like <0.01. Do you simply want to treat it as 0.01? If yes, you can use sub to delete the < symbol: as.numeric(sub("<", "", mycol))
  2. If you want to work with categorical variables, you can bin them together, i.e. define groups <0.01, <0.1, <1, etc. In R you can do so using the case_when function from dplyr.
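The two options above can be sketched as follows. This is only an illustration: the sample values in mycol, the names censored, value and bin, and the bin boundaries are all assumptions, not something taken from the asker's data. The sketch also keeps a logical flag for censored entries, so that "< 0.01" is not confused with a measured 0.01 later on.

```r
library(dplyr)

# Hypothetical column mimicking the values in the question
mycol <- c("1.4", "5.6", "< 0.01", "< 0.0004", "0.3")

# Option 1: numeric values, plus a flag marking below-limit entries
censored <- grepl("^\\s*<", mycol)
value <- as.numeric(sub("^\\s*<\\s*", "", mycol))

# Option 2: categorical bins; a censored value at the 0.01 limit
# is strictly below 0.01, so it goes into the lowest bin
bin <- case_when(
  censored & value <= 0.01 ~ "<0.01",
  value < 0.01             ~ "<0.01",
  value < 0.1              ~ "<0.1",
  value < 1                ~ "<1",
  TRUE                     ~ ">=1"
)
```

With the flag available, one simple convention is to substitute a fixed fraction of the limit for censored values before computing statistics, for example ifelse(censored, value / 2, value). Whether that is appropriate depends on the analysis.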

Upvotes: 0
