Reputation: 31
Newbie-ish with R.
CHALLENGE: I have this data frame with a number of variables in columns (see below). I need to convert the text of "$ TIMEPT : chr" to a numeric value and do some math.
$ SUBJ : chr "1" "2" "3" "4" ...
$ VISIT : chr "0" "12" "34" "84" ...
$ TIMEPT : chr "Within 15 minutes prior to stopping infusion" "Within 5 minutes prior to stopping infusion" "5 minutes post infusion" "15 minutes post infusion" ...
MY 2 APPROACHES:
1.
df$TIMEPT <- replace(df$TIMEPT, df$TIMEPT == "Within 15 minutes prior to dosing", 0)
This approach only worked for the first set of text. I then tried converting the TIMEPT variable to a factor:
2.
df$TIMEPT <- within(df, TIMEPT <- df$TIMEPT <- factor(TIMEPT, labels
= c(0, 1,2,3.92,4.08, 4.25, 4.5, 5, 6, 7, 10)))
This approach (2) created nested tables of all variables (then the df became larger and more complex). Converting these factors to numbers did not work using the following expression:
df$TIMEPT <- as.numeric(as.numeric(df$TIMEPT))
QUESTION - How could I convert such "long" text TIMEPT into numerical values?
EXPECTED OUTCOME
An oversimplification would be:
SUBJID VISIT TIMEPT
1 1 0 0
2 2 0 1
3 3 0 2
4 4 0 3
...
NOTE: Different texts in $TIMEPT contain the same number across the df. For example, the text specifies "within 5 min prior", "before 5 min", "5 min post" ... As such, numerical parsing alone might not work (I'll try it as suggested below).
Upvotes: 2
Views: 742
Reputation: 335
Try this:
df$newvariable <- readr::parse_number(df$TIMEPT)
It should extract the first number from each character string and drop the surrounding text.
Example:
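# Small example data frame (named d rather than c, so it does not mask base::c)
d <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9), y = c("10 mins", "20 mins", "30 mins", "40 mins", "50 mins", "60 Minutes", "70 mins", "80 mins", "90 minutes"), stringsAsFactors = FALSE)
d$t <- readr::parse_number(d$y)  # keep only the first number found in each string
d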
x y t
1 1 10 mins 10
2 2 20 mins 20
3 3 30 mins 30
4 4 40 mins 40
5 5 50 mins 50
6 6 60 Minutes 60
7 7 70 mins 70
8 8 80 mins 80
9 9 90 minutes 90
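One caveat for the strings in the question: "Within 5 minutes prior to stopping infusion" and "5 minutes post infusion" contain the same number, so extracting the number alone gives both the value 5. Below is a minimal base-R sketch of two possible ways around that. The helper name `timept_to_minutes`, the sign convention ("prior" is negative) and the placeholder values in `timept_map` are all assumptions for illustration, not something given in the question.

```r
timept <- c("Within 15 minutes prior to stopping infusion",
            "Within 5 minutes prior to stopping infusion",
            "5 minutes post infusion",
            "15 minutes post infusion")

# Option 1 (hypothetical helper): signed minutes.
# Assumption: labels containing "prior" are before the reference event (negative),
# all other labels are after it (positive).
timept_to_minutes <- function(x) {
  # Locate the first run of digits (with an optional decimal part) in each string
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  hit <- !is.na(m) & m != -1
  num <- rep(NA_real_, length(x))
  num[hit] <- as.numeric(regmatches(x, m))
  ifelse(grepl("prior", x, ignore.case = TRUE), -num, num)
}
signed_minutes <- timept_to_minutes(timept)

# Option 2: explicit lookup table as a named vector.
# The numbers on the right are placeholders, not values from the question.
timept_map <- c(
  "Within 15 minutes prior to stopping infusion" = 0,
  "Within 5 minutes prior to stopping infusion"  = 1,
  "5 minutes post infusion"                      = 2,
  "15 minutes post infusion"                     = 3
)
# Index the named vector by the label text; labels missing from the map give NA
looked_up <- unname(timept_map[timept])
```

With a data frame, either result can be assigned to a new column, for example `df$TIMEPT_NUM <- unname(timept_map[df$TIMEPT])`. The lookup is probably the safer choice when the target values do not follow directly from the number in the text.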
Upvotes: 1