Reputation: 83
I need to separate the "value" variable in the following dataset into three variables: estimate, low, high. Note that sometimes there are no confidence intervals, so I just have the value.
country gho year publishstate value
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1980 Published 4.9 [2.5-8.6]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1981 Published 5.1 [2.7-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1982 Published 5.2 [2.9-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1983 Published 5.4 [3.1-8.6]
I have tried this:
Data$estimate <- sub("\\[.*","",Data$value)
but it only creates the estimate variable. I was thinking of using strsplit, but it does not do the trick either...
Could you help with this?
Thank you very much,
N.
Upvotes: 2
Views: 116
Reputation: 173793
Here's another way to do it using only base R. It returns a list with one numeric vector per row:
lapply(strsplit(Data$value, "[^[:digit:].]"), function(x) as.numeric(x[x != ""]))
# [[1]]
# [1] 4.9 2.5 8.6
#
# [[2]]
# [1] 5.1 2.7 8.5
#
# [[3]]
# [1] 5.2 2.9 8.5
#
# [[4]]
# [1] 5.4 3.1 8.6
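The list above still has to be turned into the three columns the question asks for. The following is a minimal sketch of one way to do that in base R. It is not part of the original answer. The sample data frame is hypothetical, and its last row has no confidence interval so that the padding step is exercised.

```r
# Hypothetical sample data; the last value has no confidence interval.
Data <- data.frame(value = c("4.9 [2.5-8.6]", "5.1 [2.7-8.5]", "5.2"),
                   stringsAsFactors = FALSE)

# Split on anything that is not a digit or a dot, then drop empty pieces.
parts <- lapply(strsplit(Data$value, "[^[:digit:].]"),
                function(x) as.numeric(x[x != ""]))

# Pad every vector to length 3 so rows without an interval get NA,
# then transpose so that each input row becomes one matrix row.
padded <- t(vapply(parts, function(x) { length(x) <- 3; x }, numeric(3)))
colnames(padded) <- c("estimate", "low", "high")

# Attach the three numeric columns to the original data frame.
Data <- cbind(Data, padded)
```

Assigning to length(x) pads a shorter vector with NA, which is what lets a row holding only the estimate end up with missing low and high values.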
Upvotes: 0
Reputation: 45
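Using tidyr (extra = "drop" discards the empty piece left after the closing bracket, and fill = "right" pads rows that have no interval with NA):
library(tidyr)
separate(df, value, c("estimate", "low", "high"), sep = "\\s\\[|-|\\]", extra = "drop", fill = "right")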
Hope this helps.
Upvotes: 0
Reputation: 269481
Using the data shown in the Note in reproducible form, we can use separate as shown. The fill="right" argument causes lower and upper to be filled in with NAs if only one subfield is listed in value.
library(dplyr)
library(tidyr)
DF %>%
separate(value, c("value", "lower", "upper", NA), sep = "[^0-9.]+", fill = "right")
Note
Lines <- "country,glucose,year,publishstate,value
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1980,Published,4.9 [2.5-8.6]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1981,Published,5.1 [2.7-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1982,Published,5.2 [2.9-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1983,Published,5.4 [3.1-8.6]"
DF <- read.csv(text = Lines, header = TRUE, as.is = TRUE)
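The question mentions that some values have no confidence interval, and separate returns character columns by default. As a hedged illustration of both points, the sketch below uses a small hypothetical data frame (DF2, not the data above) whose second value has no interval. It uses the column names requested in the question and adds convert = TRUE so the results are numeric. It assumes tidyr is installed.

```r
library(tidyr)

# Hypothetical input: the second value has no confidence interval.
DF2 <- data.frame(year = c(1980, 1981),
                  value = c("4.9 [2.5-8.6]", "5.1"),
                  stringsAsFactors = FALSE)

res <- separate(DF2, value,
                into = c("estimate", "low", "high", NA),  # NA drops the empty trailing piece
                sep = "[^0-9.]+",   # split on runs of anything but digits and dots
                fill = "right",     # pad missing pieces with NA on the right
                convert = TRUE)     # convert the character pieces to numbers
```

In this sketch res should keep the year column and gain numeric estimate, low and high columns, with low and high missing for the 1981 row.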
Upvotes: 5