biviz
biviz

Reputation: 173

Splitting string into multiple columns

I have the following string structure as a column value in my data frame :

Y: 10 ,W: 3 , cp: 0.05

the numeric values at each row differ but the structure remains the same. I want to split this string into 3 columns, each containing only the numbers. So there will be one column for Y with the corresponding numeric value, another for W and the last for cp.

I have tried using str_split in the following way:

str_split(string,pattern = " ,",simplify = TRUE )

which obviously gives me:

     [,1]     [,2]   [,3]       
[1,] "Y: 40 " "W: 2" " cp: 0.05"

Now, I want to keep only the numbers in each of those columns. Still learning this stuff so not sure how to proceed! Any help is highly appreciated!

Upvotes: 1

Views: 764

Answers (6)

Don Span
Don Span

Reputation: 3

I believe this should work:

library(tidyverse)


string <- c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")


dat <- tibble(x = string) %>% 
  separate(x,c("Y","W","cp"), sep = " ,")


dat2 <- dat %>% mutate_all(., ~str_remove(.,"\\D+"))

Upvotes: 0

GKi
GKi

Reputation: 39647

You can remove all unneeded characters e.g. with gsub and then use strsplit or read.csv. In base it would look like:

string <- c("Y: 10 ,W: 3 , cp: 0.05", "Y: 10 ,W: 3 , cp: 0.05")
read.csv(text=gsub("[[:alpha:]: ]", "", string), header=FALSE)
#  V1 V2   V3
#1 10  3 0.05
#2 10  3 0.05

#or with strsplit
strsplit(gsub("[[:alpha:]: ]", "", string), ",")

Upvotes: 1

Nick
Nick

Reputation: 349

Given that your text strings are uniform it should be relatively simple to do, the first part would look like this:

txt <- c(
  "Y: 10 ,W: 3 , cp: 0.05",
  "Y: 6 ,W: 7 , cp: 0.08",
  "Y: 5 ,W: 0 , cp: 0.08"
)

x <- do.call(rbind, strsplit(txt, split = " ,"))

And that would get a matrix of your "label: value"

library(stringr)
y <- matrix(data = str_extract(string = x,
                               pattern = "([0-9.]+)"),
            ncol = ncol(x))

Will get you to text strings that signify your values, if you want, you can just use str_extract() without the matrix call to get your values as a vector, and:

z <- matrix(data = as.numeric(y),
            ncol = ncol(x))

will get you your matrix as numerics, which it sounds like is what you're interested in.

all together it's fairly tidy, and without the intermediate matrix call, if don't need that, it would look like:

library(stringr)
txt <- c(
  "Y: 10 ,W: 3 , cp: 0.05",
  "Y: 6 ,W: 7 , cp: 0.08",
  "Y: 5 ,W: 0 , cp: 0.08"
)

x <- do.call(rbind, strsplit(txt, split = " ,"))
y <- str_extract(string = x,
                 pattern = "([0-9.]+)")
z <- matrix(data = as.numeric(y),
            ncol = ncol(x))

With z giving you a matrix of numerics.

Upvotes: 0

thelatemail
thelatemail

Reputation: 93803

I find sometimes reformatting name: value pair data back to an existing structure helps to take care of complexity. In this case, I've formatted to a JSON object, and then used stream_in from jsonlite to deal with the data.

This is nice because it will automatically name the columns, and also takes care of occasions when not every value is represented in every row, or the order changes. E.g.:

txt <- c(
  "Y: 10 ,W: 3 , cp: 0.05",
  "Y: 6 ,W: 7 , cp: 0.08",
  "cp: 0.08, Y: 6 "
)

library(jsonlite)
proctxt <- paste("{", gsub("([A-Za-z]+?):", '"\\1":', txt), "}")
stream_in(textConnection(proctxt))
# Found 3 records...
# Imported 3 records. Simplifying...
#   Y  W   cp
#1 10  3 0.05
#2  6  7 0.08
#3  6 NA 0.08

Upvotes: 1

TimTeaFan
TimTeaFan

Reputation: 18541

There are definitely nicer ways, but this should do the job:

Now updated for string vector with more than one element and bringing it into a matrix with three named columns. Should work on vectors of any length.

library(stringr)

string <- c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")


vec <- t(str_split(str_split(string, " ,", simplify = TRUE), ": ", simplify = TRUE)[,2])

mtx = matrix( 
  vec, 
nrow = length(vec)/3, 
ncol = 3) 

colnames(mtx) <- c("Y","W","cp")

mtx

Upvotes: 4

MrNetherlands
MrNetherlands

Reputation: 940

Maybe not the most elegant way but it works:

library(dplyr)
library(stringr)
library(tidyr)
tibble(row = c(1,2), 
       col = c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")) %>%
  separate(col, into=c("col1", "col2", "col3"), sep = ",") %>%
  gather(id, col, -row) %>%
  select(-id) %>%
  mutate(col = str_trim(col)) %>%
  separate(col, into=c("letter", "number"), sep=":") %>%
  mutate(number = str_trim(number)) %>%
  spread(letter, number) %>%
  select(-row)

# A tibble: 2 x 3
  cp    W     Y    
  <chr> <chr> <chr>
1 0.05  3     10   
2 2.2   9     4   

Note that I had to add a new column named row to your data frame to make this approach work

Upvotes: 1

Related Questions