Reputation: 3
I have a dataframe in R as follows.
test <- data.frame("FRUITSTRING" = c("APPLE_PEAR_BANANA",
                                     "TURNIP_CABBAGE_ORANGE_PEAR_BANANA",
                                     "APPLE_CARROT_PEAR_BANANA"),
                   "SPLIT_CHAR" = c("PEAR", "ORANGE", "PEAR"))
I wish to split the column FRUITSTRING into two columns, with the split made row by row according to the value of the second column, SPLIT_CHAR. Is it possible to do this? Note: the string length can change and the position of the split string can change, which is why I want to name the particular string to split on.
The function I have used previously is cSplit, but I have no idea how to pass this dataframe into cSplit and use the value of another column as the separator. Thanks
Upvotes: 0
Views: 58
Reputation: 269905
1) dplyr/stringr/tidyr. Replace the SPLIT_CHAR string and its surrounding underscores with a semicolon, then separate on the semicolon.
library(dplyr)
library(stringr)
library(tidyr)
test %>%
  mutate(FRUITSTRING = str_replace(FRUITSTRING, str_c("_", SPLIT_CHAR, "_"), ";")) %>%
  separate(FRUITSTRING, c("prefix", "suffix"), sep = ";")
##           prefix      suffix SPLIT_CHAR
## 1          APPLE      BANANA       PEAR
## 2 TURNIP_CABBAGE PEAR_BANANA     ORANGE
## 3   APPLE_CARROT      BANANA       PEAR
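To make the replacement step in (1) easier to follow, here is a minimal sketch of the intermediate strings before they are separated. The check that the data contains no semicolon is an added assumption, not part of the answer above; any delimiter that is absent from the data should work the same way.

```r
library(stringr)

# Same data as in the question.
test <- data.frame(
  FRUITSTRING = c("APPLE_PEAR_BANANA",
                  "TURNIP_CABBAGE_ORANGE_PEAR_BANANA",
                  "APPLE_CARROT_PEAR_BANANA"),
  SPLIT_CHAR = c("PEAR", "ORANGE", "PEAR"),
  stringsAsFactors = FALSE
)

# Assumption: the temporary delimiter must not already occur in the data.
stopifnot(!any(grepl(";", test$FRUITSTRING, fixed = TRUE)))

# str_replace() is vectorized over the pattern, so each string is
# paired with the pattern built from its own row.
joined <- str_replace(test$FRUITSTRING,
                      str_c("_", test$SPLIT_CHAR, "_"),
                      ";")
print(joined)
## [1] "APPLE;BANANA"               "TURNIP_CABBAGE;PEAR_BANANA"
## [3] "APPLE_CARROT;BANANA"
```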
2) Base R, transform/sub. Extract the prefix and the suffix separately using sub. Because sub is not vectorized over its pattern argument, first define a vectorized version, vsub. Omit the last argument of transform if FRUITSTRING is to be retained.
vsub <- Vectorize(sub)
transform(test,
  prefix = vsub(paste0("_", SPLIT_CHAR, "_.*"), "", FRUITSTRING),
  suffix = vsub(paste0(".*_", SPLIT_CHAR, "_"), "", FRUITSTRING),
  FRUITSTRING = NULL)
##   SPLIT_CHAR         prefix      suffix
## 1       PEAR          APPLE      BANANA
## 2     ORANGE TURNIP_CABBAGE PEAR_BANANA
## 3       PEAR   APPLE_CARROT      BANANA
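The need for a vectorized sub in (2) can be seen in a small sketch. Plain sub uses only the first element of a pattern vector, so every row would be split on the first row's SPLIT_CHAR. The names plain and rowwise are illustrative only, and the mapply call is shown as roughly what Vectorize(sub) does, not as a replacement for it.

```r
# Same data as in the question.
test <- data.frame(
  FRUITSTRING = c("APPLE_PEAR_BANANA",
                  "TURNIP_CABBAGE_ORANGE_PEAR_BANANA",
                  "APPLE_CARROT_PEAR_BANANA"),
  SPLIT_CHAR = c("PEAR", "ORANGE", "PEAR"),
  stringsAsFactors = FALSE
)

pat <- paste0("_", test$SPLIT_CHAR, "_.*")

# Plain sub() applies only pat[1] ("_PEAR_.*") to every row (with a warning),
# so row 2 is cut at PEAR instead of at ORANGE.
plain <- suppressWarnings(sub(pat, "", test$FRUITSTRING))

# mapply() pairs each pattern with the string from the same row;
# USE.NAMES = FALSE keeps the result an unnamed character vector.
rowwise <- mapply(sub, pat, "", test$FRUITSTRING, USE.NAMES = FALSE)

print(plain)
## [1] "APPLE"                 "TURNIP_CABBAGE_ORANGE" "APPLE_CARROT"
print(rowwise)
## [1] "APPLE"          "TURNIP_CABBAGE" "APPLE_CARROT"
```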
2a) Base R, within/sub. The same as (2), but using within and a slightly different regex pattern with two capture groups, so that the same pattern serves both calls to sub.
vsub <- Vectorize(sub)
within(test, {
  pat <- paste0("(.*)_", SPLIT_CHAR, "_(.*)")
  suffix <- vsub(pat, "\\2", FRUITSTRING)
  prefix <- vsub(pat, "\\1", FRUITSTRING)
  FRUITSTRING <- pat <- NULL
})
##   SPLIT_CHAR         prefix      suffix
## 1       PEAR          APPLE      BANANA
## 2     ORANGE TURNIP_CABBAGE PEAR_BANANA
## 3       PEAR   APPLE_CARROT      BANANA
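To show what the two capture groups in (2a) pick up, the following sketch extracts both groups in one pass with base R's regexec and regmatches instead of two sub calls. This is an illustration of the pattern, not the method used above, and it assumes every row contains its SPLIT_CHAR surrounded by underscores; a row without a match would return nothing and the result would no longer simplify to a matrix.

```r
# Same data as in the question.
test <- data.frame(
  FRUITSTRING = c("APPLE_PEAR_BANANA",
                  "TURNIP_CABBAGE_ORANGE_PEAR_BANANA",
                  "APPLE_CARROT_PEAR_BANANA"),
  SPLIT_CHAR = c("PEAR", "ORANGE", "PEAR"),
  stringsAsFactors = FALSE
)

# One pattern per row, e.g. pat[2] is "(.*)_ORANGE_(.*)".
pat <- paste0("(.*)_", test$SPLIT_CHAR, "_(.*)")

# For each row, regexec() locates the full match and both capture groups;
# regmatches() returns them as c(full match, group 1, group 2).
# mapply() simplifies the three length-3 results to a 3 x 3 matrix,
# one column per row of test.
groups <- mapply(function(p, x) regmatches(x, regexec(p, x))[[1]],
                 pat, test$FRUITSTRING, USE.NAMES = FALSE)

prefix <- groups[2, ]  # what "\\1" refers to
suffix <- groups[3, ]  # what "\\2" refers to

print(data.frame(SPLIT_CHAR = test$SPLIT_CHAR, prefix, suffix))
```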
3) cSplit. As in (1), replace the SPLIT_CHAR string and its surrounding underscores with a semicolon, then split on the semicolon.
library(splitstackshape)
test |>
  transform(FRUITSTRING =
    Vectorize(sub)(paste0("_", SPLIT_CHAR, "_"), ";", FRUITSTRING)) |>
  cSplit("FRUITSTRING", sep = ";", type.convert = FALSE)
##    SPLIT_CHAR  FRUITSTRING_1 FRUITSTRING_2
## 1:       PEAR          APPLE        BANANA
## 2:     ORANGE TURNIP_CABBAGE   PEAR_BANANA
## 3:       PEAR   APPLE_CARROT        BANANA
Upvotes: 1