Reputation: 43
Consider the below sample dataframe;
Sample DataFrame
| outcome
|
----------------------------------------------------------
| Stewart Young, CEO, ABC Corp; Mark Hill, CFO, DCB Corp |
| Hill Man, Executive, FC Bank
|
I would like to perform a string split based on ;
and then split at ,
. For example I would need the outcome provided below:
| Name | Role | Company |
-----------------------------------------------------------
| Stewart Young, Mark Hill| CEO, CFO | ABC Corp, DCB Corp |
| Hill Man | Executive| FC Bank |
I can perform a string split at ;
, but I am unable to obtain exact names, roles and company.
Upvotes: 1
Views: 235
Reputation: 522007
Here is a base R solution which seems to be working:
output <- apply(df, 1, function(r) {
parts <- unlist(strsplit(r, ";\\s*"))
Name <- sapply(parts, function(x) { strsplit(x, ",\\s*")[[1]][1] })
Name <- paste(Name, collapse=", ")
Role <- sapply(parts, function(x) { strsplit(x, ",\\s*")[[1]][2] })
Role <- paste(Role, collapse=", ")
Company <- sapply(parts, function(x) { strsplit(x, ",\\s*")[[1]][3] })
Company <- paste(Company, collapse=",")
c(Name, Role, Company)
})
output <- data.frame(t(output))
output
X1 X2 X3
1 Stewart Young, Mark Hill CEO, CFO ABC Corp,DCB Corp
2 Hill Man Executive FC Bank
Data:
df <- data.frame(outcome=c("Stewart Young, CEO, ABC Corp; Mark Hill, CFO, DCB Corp",
"Hill Man, Executive, FC Bank"),
stringsAsFactors=FALSE)
Upvotes: 3
Reputation: 389135
I would suggest to keep every entry in separate row instead of merging them together. Using dplyr
and tidyr
one way would be
library(dplyr)
library(tidyr)
df %>%
separate_rows(outcome, sep = ";") %>%
separate(outcome, c("Name", "Role", "Company"), sep = ", ")
# Name Role Company
#1 Stewart Young CEO ABC Corp
#2 Mark Hill CFO DCB Corp
#3 Hill Man Executive FC Bank
However, if you need output as shown we can do
df %>%
mutate(row = row_number()) %>%
separate_rows(outcome, sep = ";") %>%
separate(outcome, c("Name", "Role", "Company"), sep = ", ") %>%
group_by(row) %>%
summarise_all(toString) %>%
select(-row)
# Name Role Company
# <chr> <chr> <chr>
#1 Stewart Young, Mark Hill CEO, CFO ABC Corp, DCB Corp
#2 Hill Man Executive FC Bank
data
df <- structure(list(outcome = structure(2:1, .Label = c("Hill Man,
Executive, FC Bank", "Stewart Young, CEO, ABC Corp; Mark Hill, CFO, DCB Corp"),
class = "factor")), class = "data.frame", row.names = c(NA, -2L))
Upvotes: 2