I have a string
mat <- c("[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]")
I want to split the contents of each pair of parentheses into three columns.
For the first example, the final matrix will contain these three columns:
value-66, > 0.03, 0.1314460
My difficulty is with examples like this one:
'0.01 < value-93 <= 0.03', -0.058007
I haven't found a way to split it into three columns like this:
value-93, 0.01 < <= 0.03, -0.058007
I have tried this, but it doesn't split correctly:
s <- strsplit(mat, ",")
s1 <- lapply(s, function(x) trimws(x,which=c('both')))
s1 <- lapply(s1, function(x) strsplit(x,' '))
Do I have to set conditions in a loop?
You don't need a loop.
Try this:
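library(stringr)  # stringr also provides the %>% pipe
mat <- c("[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]")
mat %>%
  # pull out every "( ... )" tuple
  str_extract_all("\\(.+?\\)") %>%
  # drop the parentheses and single quotes
  sapply(str_remove_all, "\\(|\\)|\\'") %>%
  as.character() %>%
  # split each tuple into its condition and its number
  str_split(",") %>%
  (
    function(i) {
      c12 <- sapply(i, "[[", 1)             # condition text, e.g. "0.01 < value-93 <= 0.03"
      c1 <- str_extract(c12, "value[^ ]+")  # variable name
      c2 <- str_remove(c12, c1)             # condition without the variable name
      c3 <- sapply(i, "[[", 2)              # the number after the comma
      cbind(c1, c2, c3)
    }
  )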
c1 c2 c3
[1,] "value-66" " > 0.03" " 0.1314460"
[2,] "value-222" "0.03 < <= 0.06" " -0.062805"
[3,] "value-93" "0.01 < <= 0.03" " -0.058007"
[4,] "value-141" " > 0.05" " -0.051339234"
[5,] "value-108" " <= 0.01" " -0.0373"
[6,] "value-303" " > 0.02" " 0.037257"
stringr is my favorite package for string manipulation, including regex. It is consistent and its functions are easy to remember. However, you can also do this with base R functions if you prefer.
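The base R route mentioned above can be sketched with a single regular expression that has two capture groups. This is a minimal sketch, not the answer's own code, and it assumes every tuple has the form ('<condition>', <number>) and every variable name looks like value-<digits>. regexec returns the full match plus both groups for each tuple; the sketch also trims the leading spaces and converts c3 to numeric, which the stringr version above leaves as text.

```r
# Input string from the question
mat <- "[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]"

# Assumed tuple shape: ('<condition>', <number>); group 1 = condition, group 2 = number
tuple_re <- "\\('([^']*)', *([^)]*)\\)"

# Every tuple in the string (regmatches returns one vector per input string)
tuples <- regmatches(mat, gregexpr(tuple_re, mat))[[1]]

# regexec gives, per tuple, the full match followed by the two capture groups
parts <- regmatches(tuples, regexec(tuple_re, tuples))
cond <- vapply(parts, `[`, character(1), 2)
c3 <- as.numeric(vapply(parts, `[`, character(1), 3))

# Variable name, assumed to look like value-<digits>
c1 <- regmatches(cond, regexpr("value-[0-9]+", cond))
# Condition without the variable name; collapse the double space left behind
c2 <- trimws(gsub(" +", " ", sub("value-[0-9]+", "", cond)))

res <- data.frame(c1, c2, c3, stringsAsFactors = FALSE)
res
```

If a variable name could contain characters other than digits after "value-", the two "value-[0-9]+" patterns would need to be widened, for example to "value-[^ ]+" as in the stringr answer.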
You won't need loops, just some regex. Here's how to approach this problem using only base R functions. I'd recommend looking into stringr, but I think it's important to learn the base R version if you're starting out. I've also broken each step down for clarity, although the code can be combined into fewer steps.
Notice how the values are organized in sets of parentheses, so it's easiest to split the string up using that pattern.
# Remove brackets
s <- gsub("\\[|\\]", "", mat)
# Extract strings within parentheses
grx <- gregexpr("\\(.+?\\)", s)
rows <- do.call(c, regmatches(s, grx))
# Remove parentheses
rows <- gsub("\\(|\\)", "", rows)
# Remove quotes
rows <- gsub("'", "", rows)
# Split by comma
df <- as.data.frame(do.call(rbind, strsplit(rows, ",")), stringsAsFactors = FALSE)
# Extract the variable name (value-XX) into a new column
grx <- "(?<=value\\-)[0-9.]+"
vals <- gregexpr(grx, df$V1, perl = TRUE)
df$V3 <- paste0("value-", as.numeric(unlist(regmatches(df$V1, vals))))
df
V1 V2 V3
1 value-66 > 0.03 0.1314460 value-66
2 0.03 < value-222 <= 0.06 -0.062805 value-222
3 0.01 < value-93 <= 0.03 -0.058007 value-93
4 value-141 > 0.05 -0.051339234 value-141
5 value-108 <= 0.01 -0.0373 value-108
6 value-303 > 0.02 0.037257 value-303
I didn't do the last step of removing "value-XX" from the string, partly because I don't see why you would want a column like that. I'll leave that one to you; try using gsub for it. You can give the data frame any column names you want.
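The remaining step can be sketched as follows. This is a minimal, self-contained sketch that assumes every variable name matches value-<digits>. It rebuilds df with the steps above, slightly simplified (V3 is taken directly with regexpr instead of the lookbehind), then uses gsub to drop the name from V1 and collapses the double space that leaves in rows such as "0.03 < value-222 <= 0.06". It also converts V2 to numeric. The column names variable, condition and value are illustrative only.

```r
mat <- "[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]"

# Rebuild the data frame from the steps above (base R only)
s <- gsub("\\[|\\]", "", mat)
rows <- unlist(regmatches(s, gregexpr("\\(.+?\\)", s)))
rows <- gsub("[()']", "", rows)  # drop parentheses and quotes in one pass
df <- as.data.frame(do.call(rbind, strsplit(rows, ",")), stringsAsFactors = FALSE)
df$V3 <- regmatches(df$V1, regexpr("value-[0-9]+", df$V1))

# Remaining step: remove "value-XX" from V1, then collapse the leftover double space
df$V1 <- trimws(gsub(" +", " ", gsub("value-[0-9]+", "", df$V1)))
df$V2 <- as.numeric(df$V2)  # as.numeric ignores the leading space

# Reorder the columns and give them (illustrative) names
df <- df[, c("V3", "V1", "V2")]
names(df) <- c("variable", "condition", "value")
df
```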