Edi

Reputation: 31

Convert written text in columns to numbers in R

Newbie-ish with R.

CHALLENGE: I have this data frame with a number of variables in columns (see below). I need to convert the text of "$ TIMEPT : chr" to a numeric value and do some math.

$ SUBJ  : chr  "1" "2" "3" "4" ...
$ VISIT   : chr  "0" "12" "34" "84" ...
$ TIMEPT  : chr "Within 15 minutes prior to stopping infusion" "Within 5 minutes prior to stopping infusion" "5 minutes post infusion" "15 minutes post infusion" ...

MY 2 APPROACHES:

1.

df$TIMEPT <- replace(df$TIMEPT, df$TIMEPT == "Within 15 minutes prior to dosing", 0)

This approach only worked for the first set of text. I then tried converting the TIMEPT variable to a factor:

2.

df$TIMEPT <- within(df, TIMEPT <- df$TIMEPT <- factor(TIMEPT, labels = c(0, 1, 2, 3.92, 4.08, 4.25, 4.5, 5, 6, 7, 10)))

This approach (2) created nested tables of all variables (then the df became larger and more complex). Converting these factors to numbers did not work using the following expression:

df$TIMEPT <- as.numeric(as.numeric(df$TIMEPT))   
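Two things probably explain what happened in approach 2. within() returns a whole data frame, so assigning its result to df$TIMEPT nests a data frame inside the column. And as.numeric() on a factor returns the internal integer codes, not the labels. A minimal sketch of the usual factor-to-number idiom, using a toy vector in place of df$TIMEPT and placeholder label values (both are assumptions, not the real data):

```r
# Toy stand-in for df$TIMEPT; the numeric labels below are placeholders.
timept <- c("5 minutes post infusion",
            "15 minutes post infusion",
            "5 minutes post infusion")

# Giving `levels` explicitly pins which text gets which label; without it,
# factor() sorts the levels alphabetically before applying `labels`.
f <- factor(timept,
            levels = c("5 minutes post infusion", "15 minutes post infusion"),
            labels = c("4.08", "4.25"))

codes  <- as.numeric(f)                # internal integer codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels as numbers: 4.08 4.25 4.08
```

Going through as.character() first is what recovers the label values; on a real data frame the result would be assigned straight to the column rather than wrapped in within().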

QUESTION - How could I convert such "long" text TIMEPT into numerical values?

EXPECTED OUTCOME

An oversimplification would be:

  SUBJID VISIT TIMEPT
1      1     0      0
2      2     0      1
3      3     0      2
4      4     0      3
...

NOTE: Different text values in $TIMEPT contain the same numbers. For example, the text specifies "within 5 min prior", "before 5 min", "5 min post" ... As such, numerical parsing might not work (I'll try it as suggested below).
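When the same number shows up in labels that mean different things, one option is to skip parsing and map each distinct label to a value explicitly. A rough sketch with a named vector as the lookup table; the name timept_lookup, the column TIMEPT_NUM, and the numbers assigned are all placeholders chosen for illustration:

```r
# Hypothetical mapping: each distinct TIMEPT label -> a time value.
# The numbers here are placeholders, not values from the original data.
timept_lookup <- c(
  "Within 15 minutes prior to stopping infusion" = 0,
  "Within 5 minutes prior to stopping infusion"  = 1,
  "5 minutes post infusion"                      = 2,
  "15 minutes post infusion"                     = 3
)

# Small stand-in for the real data frame.
df <- data.frame(
  SUBJ   = c("1", "2", "3", "4"),
  TIMEPT = c("Within 15 minutes prior to stopping infusion",
             "Within 5 minutes prior to stopping infusion",
             "5 minutes post infusion",
             "15 minutes post infusion"),
  stringsAsFactors = FALSE
)

# Index the named vector by label; labels missing from the lookup become NA.
df$TIMEPT_NUM <- unname(timept_lookup[df$TIMEPT])
```

Running unique(df$TIMEPT) on the real data first gives the full list of labels the lookup needs to cover, and any NA in the new column points at a label that was missed.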

Upvotes: 2

Views: 742

Answers (1)

Michael Vine

Reputation: 335

Try this:

df$newvariable <- readr::parse_number(df$TIMEPT)

parse_number() drops any non-numeric characters before the first number and everything after it, so each string is reduced to the first number it contains.

Example:

c<-data.frame(x=c(1,2,3,4,5,6,7,8,9), y=c("10 mins", "20 mins", "30 mins", "40 mins", "50 mins", "60 Minutes", "70 mins", "80 mins", "90 minutes"))
c$y<-as.character(c$y)
c$t<-readr::parse_number(c$y)


c
  x          y  t
1 1    10 mins 10
2 2    20 mins 20
3 3    30 mins 30
4 4    40 mins 40
5 5    50 mins 50
6 6 60 Minutes 60
7 7    70 mins 70
8 8    80 mins 80
9 9 90 minutes 90
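One caveat for the labels in the question: "Within 15 minutes prior to stopping infusion" and "15 minutes post infusion" both contain 15, so extracting the number alone cannot tell them apart. A possible sketch that also keeps the direction, assuming "prior" means before the reference point and everything else means after it. It uses a base R regex in place of parse_number so it runs without readr, and it ignores that "prior" and "post" labels may refer to different reference events:

```r
timept <- c("Within 15 minutes prior to stopping infusion",
            "Within 5 minutes prior to stopping infusion",
            "5 minutes post infusion",
            "15 minutes post infusion")

# First run of digits in each string (base R stand-in for parse_number).
# Note: regmatches() drops elements with no match, so every label must
# contain a number for the lengths to line up.
minutes <- as.numeric(regmatches(timept, regexpr("[0-9]+", timept)))

# Assumption: labels containing "prior" are before the reference point,
# so they get a negative sign; everything else is treated as after it.
signed_minutes <- ifelse(grepl("prior", timept, fixed = TRUE), -minutes, minutes)
# -15 -5 5 15
```

If the offsets need to sit on a single time axis, a constant (for example the infusion length) would still have to be added to one group; that value is not given in the question.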

Upvotes: 1
