Reputation: 11

Splitting and extracting Strings in R

RULES

{Denny Frying Pan} => {Denny C-Size Batteries}

{Denny Scented Tissue} => {Denny Paper Plates}

{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}

{Denny Plastic Forks} => {Golden Frozen Peas}

{Denny Frying Pan} => {Denny D-Size Batteries}

{Denny Plastic Forks} => {Faux Products Apricot Shampoo}

{Golden Frozen Peas} => {Denny Plastic Forks}

{Faux Products Apricot Shampoo} => {Denny Plastic Forks}

{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}

{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}

{Denny D-Size Batteries} => {Denny Frying Pan}

I have a data frame with a single column, as shown above. I want to split each rule into two columns, LHS and RHS.

LHS should contain the characters enclosed in the {} before the =>, and RHS should contain the characters enclosed in the {} after the =>.

How can this be done in R?

Upvotes: 0

Answers (3)

Tyler Rinker

Reputation: 109844

Here's an approach using the qdapRegex package, which I maintain:

RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
           "{Denny Scented Tissue} => {Denny Paper Plates}",
           "{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}",
           "{Denny Plastic Forks} => {Golden Frozen Peas}",
           "{Denny Frying Pan} => {Denny D-Size Batteries}",
           "{Denny Plastic Forks} => {Faux Products Apricot Shampoo}",
           "{Golden Frozen Peas} => {Denny Plastic Forks}",
           "{Faux Products Apricot Shampoo} => {Denny Plastic Forks}",
           "{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}",
           "{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}",
           "{Denny D-Size Batteries} => {Denny Frying Pan}")

library(qdapRegex)
setNames(do.call(rbind.data.frame, rm_curly(RULES, extract=TRUE)), c("LHS", "RHS"))

##                                LHS                             RHS
## 1                 Denny Frying Pan          Denny C-Size Batteries
## 2             Denny Scented Tissue              Denny Paper Plates
## 3    Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
## 4              Denny Plastic Forks              Golden Frozen Peas
## 5                 Denny Frying Pan          Denny D-Size Batteries
## 6              Denny Plastic Forks   Faux Products Apricot Shampoo
## 7               Golden Frozen Peas             Denny Plastic Forks
## 8    Faux Products Apricot Shampoo             Denny Plastic Forks
## 9  Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
## 10  Blue Label Canned String Beans  Faux Products Buffered Aspirin
## 11          Denny D-Size Batteries                Denny Frying Pan

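rm_curly() with extract=TRUE returns, for each rule, the strings found between curly braces. do.call + rbind.data.frame then binds those per-rule results into a data.frame, and setNames labels the two columns LHS and RHS.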
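
For readers who would rather not install a package, the same extract-then-bind idea can be sketched in base R. This is only an illustration, not part of qdapRegex: the lookaround pattern and the shortened two-rule input are assumptions made for the example.

```r
# Shortened sample input (two of the rules from the question)
RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
           "{Golden Frozen Peas} => {Denny Plastic Forks}")

# For each rule, pull out every run of characters that sits between { and }.
# perl = TRUE is needed for the lookbehind/lookahead assertions.
parts <- regmatches(RULES, gregexpr("(?<=\\{)[^}]*(?=\\})", RULES, perl = TRUE))

# Each list element holds two strings; bind them row-wise and name the columns.
res <- setNames(
  as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE),
  c("LHS", "RHS")
)
res
```

This assumes every rule contains exactly two brace-delimited items; rows with a different count would need to be handled before the rbind step.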

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can try one of the following. Both assume that you're starting with a character vector named "rules". If "rules" is already a column in your data.frame, you would need some slight modification.

library(data.table)       # provides data.table()
library(splitstackshape)  # provides cSplit()
library(dplyr)            # provides %>%

data.table(rules = gsub("[{}]", "", gsub("=>", "\t", rules))) %>%
  cSplit("rules", "\t")
#                             rules_1                         rules_2
#  1:                Denny Frying Pan          Denny C-Size Batteries
#  2:            Denny Scented Tissue              Denny Paper Plates
#  3:   Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
#  4:             Denny Plastic Forks              Golden Frozen Peas
#  5:                Denny Frying Pan          Denny D-Size Batteries
#  6:             Denny Plastic Forks   Faux Products Apricot Shampoo
#  7:              Golden Frozen Peas             Denny Plastic Forks
#  8:   Faux Products Apricot Shampoo             Denny Plastic Forks
#  9: Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
# 10:  Blue Label Canned String Beans  Faux Products Buffered Aspirin
# 11:          Denny D-Size Batteries                Denny Frying Pan

library(dplyr)
library(tidyr)

data.frame(rules) %>%
  mutate(rules = gsub("\\s+=>\\s+", "=>", rules)) %>%
  mutate(rules = gsub("[{}]", "", rules)) %>%
  separate(rules, into = c("V1", "V2"), sep = "=>")
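
If the rules are already a column of a data frame, as in the question, one possible adaptation is sketched below in base R. The data frame name df, the column name RULES, and the shortened two-rule input are assumptions for illustration; the steps mirror the ones above (strip the braces, then split on =>).

```r
# Hypothetical data frame with the rules in a single column
df <- data.frame(
  RULES = c("{Denny Frying Pan} => {Denny C-Size Batteries}",
            "{Golden Frozen Peas} => {Denny Plastic Forks}"),
  stringsAsFactors = FALSE
)

# Strip the curly braces, then split on "=>" together with surrounding spaces
cleaned <- gsub("[{}]", "", df$RULES)
parts <- strsplit(cleaned, "\\s*=>\\s*")

# Take the first piece as LHS and the second as RHS
df$LHS <- vapply(parts, `[`, character(1), 1)
df$RHS <- vapply(parts, `[`, character(1), 2)
df[, c("LHS", "RHS")]
```

The original RULES column is kept here; drop it with df$RULES <- NULL if only the two new columns are wanted.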

Upvotes: 0

scoa

Reputation: 19867

RULES <- c("{Denny Frying Pan} => {Denny C-Size Batteries}",
           "{Denny Scented Tissue} => {Denny Paper Plates}",
           "{Blue Label Fancy Canned Clams} => {Blue Label Canned Tuna in Water}",
           "{Denny Plastic Forks} => {Golden Frozen Peas}",
           "{Denny Frying Pan} => {Denny D-Size Batteries}",
           "{Denny Plastic Forks} => {Faux Products Apricot Shampoo}",
           "{Golden Frozen Peas} => {Denny Plastic Forks}",
           "{Faux Products Apricot Shampoo} => {Denny Plastic Forks}",
           "{Blue Label Canned Tuna in Water} => {Blue Label Fancy Canned Clams}",
           "{Blue Label Canned String Beans} => {Faux Products Buffered Aspirin}",
           "{Denny D-Size Batteries} => {Denny Frying Pan}")

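# Split each rule on the literal separator "} => {" and bind the halves row-wise
df <- as.data.frame(do.call(rbind, strsplit(RULES, "} => {", fixed = TRUE)))
# Remove the leftover opening brace from the left side and closing brace from the right
df[, 1] <- gsub("{", "", df[, 1], fixed = TRUE)
df[, 2] <- gsub("}", "", df[, 2], fixed = TRUE)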

df
                                V1                              V2
1                 Denny Frying Pan          Denny C-Size Batteries
2             Denny Scented Tissue              Denny Paper Plates
3    Blue Label Fancy Canned Clams Blue Label Canned Tuna in Water
4              Denny Plastic Forks              Golden Frozen Peas
5                 Denny Frying Pan          Denny D-Size Batteries
6              Denny Plastic Forks   Faux Products Apricot Shampoo
7               Golden Frozen Peas             Denny Plastic Forks
8    Faux Products Apricot Shampoo             Denny Plastic Forks
9  Blue Label Canned Tuna in Water   Blue Label Fancy Canned Clams
10  Blue Label Canned String Beans  Faux Products Buffered Aspirin
11          Denny D-Size Batteries                Denny Frying Pan

Upvotes: 1
