Noura

Reputation: 474

Split a string into columns in R

I have a string

mat <- c("[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]") 

I want to split the contents of each pair of parentheses into three columns.

For the first example, the row in the final matrix would contain these three columns:

value-66, > 0.03, 0.1314460

My difficulty is with examples like this one:

'0.01 < value-93 <= 0.03', -0.058007

I have not found a way to put it into three columns like this:

value-93, 0.01 <  <= 0.03, -0.058007

I have tried this, but it didn't split the string correctly:

s <- strsplit(mat, ",")
s1 <- lapply(s, function(x) trimws(x,which=c('both')))
s1 <- lapply(s1, function(x) strsplit(x,' '))

Do I have to set conditions in a loop?

Upvotes: 3

Views: 112

Answers (2)

nurandi

Reputation: 1618

You don't need a loop.

Try this:

library(stringr)

mat <- c("[('value-66 > 0.03', 0.1314460), ('0.03 < value-222 <= 0.06', -0.062805), ('0.01 < value-93 <= 0.03', -0.058007), ('value-141 > 0.05', -0.051339234), ('value-108 <= 0.01', -0.0373), ('value-303 > 0.02', 0.037257)]") 

mat %>%
  str_extract_all("\\(.+?\\)") %>%
  sapply(str_remove_all, "\\(|\\)|\\'") %>%
  as.character() %>%
  str_split(",") %>%
  (
    function(i){
      c12 <- sapply(i, "[[", 1)
      c1 <- str_extract(c12, "value[^ ]+")
      c2 <- str_remove(c12, c1)
      c3 <- sapply(i, "[[", 2)
      cbind(c1, c2, c3)
    }
  )
     c1          c2                c3             
[1,] "value-66"  " > 0.03"         " 0.1314460"   
[2,] "value-222" "0.03 <  <= 0.06" " -0.062805"   
[3,] "value-93"  "0.01 <  <= 0.03" " -0.058007"   
[4,] "value-141" " > 0.05"         " -0.051339234"
[5,] "value-108" " <= 0.01"        " -0.0373"     
[6,] "value-303" " > 0.02"         " 0.037257"  

stringr is my favorite package for string manipulation, including regex work. Its interface is consistent and the function names are easy to remember. You can also do this with base R functions if you prefer.
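The matrix printed above still has leading spaces, and the third column is character rather than numeric. As a rough sketch (not part of the original answer), a base R tidy-up could look like the following. The object names `res` and `tidy` and the column names `name`, `condition`, and `weight` are hypothetical, and two rows of the output are retyped by hand so the snippet runs on its own.

```r
# Two rows of the matrix shown above, retyped by hand for a self-contained demo.
res <- cbind(
  c1 = c("value-66", "value-222"),
  c2 = c(" > 0.03", "0.03 <  <= 0.06"),
  c3 = c(" 0.1314460", " -0.062805")
)

# Hypothetical tidy-up: trim padding, collapse doubled spaces, make c3 numeric.
tidy <- data.frame(
  name      = res[, "c1"],
  condition = gsub(" +", " ", trimws(res[, "c2"])),
  weight    = as.numeric(trimws(res[, "c3"])),
  stringsAsFactors = FALSE
)
tidy
```

Collapsing the doubled space is optional. Drop the `gsub()` call if you want to keep the gap where the variable name used to be.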

Upvotes: 3

astrofunkswag

Reputation: 2688

You won't need loops, just some regex. Here's how to approach this problem with only base R functions. I'd recommend looking into stringr, but I think it's important to learn the base R version if you're starting out. I also broke each step down for clarity, but there are ways to combine this code into fewer steps.

Notice how the values are organized in sets of parentheses, so it's easiest to split the string up using that pattern.

# Remove brackets
s <- gsub("\\[|\\]", "", mat)

# Extract strings within parentheses
grx <- gregexpr("\\(.+?\\)",  s)
rows <- do.call(c, regmatches(s, grx))

# Remove parentheses
rows <- gsub("\\(|\\)", "", rows)
# Remove quotes
rows <- gsub("\\'", "", rows)

# Split by comma
df <- as.data.frame(do.call(rbind, strsplit(rows, ",")), stringsAsFactors = FALSE)

# Extract values
grx <- "(?<=value\\-)[0-9.]+"
vals <- gregexpr(grx, df$V1, perl = TRUE)
df$V3 <- paste0("value-", as.numeric(unlist(regmatches(df$V1, vals))))


df
                        V1            V2        V3
1          value-66 > 0.03     0.1314460  value-66
2 0.03 < value-222 <= 0.06     -0.062805 value-222
3  0.01 < value-93 <= 0.03     -0.058007  value-93
4         value-141 > 0.05  -0.051339234 value-141
5        value-108 <= 0.01       -0.0373 value-108
6         value-303 > 0.02      0.037257 value-303

I didn't do the last step of removing "value-XX" from the string, partly because I don't see why you would want a column like that. I'll let you tackle that one; gsub is a good tool for it. You can give the data frame any column names you want.
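For readers who do want that remaining column, one possible sketch uses `sub()` to delete the variable name from the condition string. It assumes every name has the form `value-` followed by digits, as in the question. The data frame `demo` and the column name `cond` are made up for illustration, and the rows are retyped by hand from the output above.

```r
# Three conditions from the output above, retyped by hand for a self-contained demo.
demo <- data.frame(
  V1 = c("value-66 > 0.03", "0.03 < value-222 <= 0.06", "0.01 < value-93 <= 0.03"),
  stringsAsFactors = FALSE
)

# Delete the "value-<digits>" token, then trim any leading or trailing space.
# Assumption: names always match value-[0-9]+ and occur once per string.
demo$cond <- trimws(sub("value-[0-9]+", "", demo$V1))
demo$cond
```

`sub()` is enough here because each string contains the name only once; `gsub()` would give the same result on this data.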

Upvotes: 2
