Using the column RelatedToTxt below, I want to create two new columns, Coverage_Type and Name.
If I can extract the content before and after the last "-" sign, I think I should be good. However, in the last case there is also a "-" inside the name itself, between Mayur and Cook.
My question is twofold: first, how do I extract the content before and after the last "-" sign? Second, how do I extract it correctly when the name itself contains a dash, as in the case above?
Desired output:
| RelatedToTxt | Coverage_Type | Name |
|---|---|---|
| Collision - NAWADA REALTY, INC | Collision | NAWADA REALTY, INC |
| Collision - Don Cooks | Collision | Don Cooks |
| Pro Dam - Veh - Spl Lt - Raj Perk | Pro Dam - Veh - Spl Lt | Raj Perk |
| Rental Reimbursement - Mayur-Cook | Rental Reimbursement | Mayur-Cook |
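To see why splitting on the last bare "-" is not enough, here is a small base-R sketch comparing it with splitting on the last " - " (space, hyphen, space). It assumes, based only on these four rows, that the separator between coverage type and name always has whitespace on both sides while a dash inside a name does not; naive_name and spaced_name are illustrative names.

```r
RelatedToTxt <- c("Collision - NAWADA REALTY, INC", "Collision - Don Cooks",
                  "Pro Dam - Veh - Spl Lt - Raj Perk", "Rental Reimbursement - Mayur-Cook")

# Naive: the greedy ".*" runs up to the LAST bare "-", wherever it is,
# so the hyphen inside "Mayur-Cook" is taken as the separator
naive_name <- trimws(sub("^.*-", "", RelatedToTxt))

# Assumption: the real separator is always " - " (whitespace on both sides),
# so requiring the surrounding spaces skips the hyphen in "Mayur-Cook"
spaced_name <- sub("^.*\\s-\\s", "", RelatedToTxt)

print(naive_name)   # last element is "Cook"
print(spaced_name)  # last element is "Mayur-Cook"
```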
Example data:
RelatedToTxt <- c("Collision - NAWADA REALTY, INC", "Collision - Don Cooks",
"Pro Dam - Veh - Spl Lt - Raj Perk", "Rental Reimbursement - Mayur-Cook")
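Try using strsplit to split the text into two columns. You can split on the final " - " using this regex: .+\\K\\s-\\s. The .+\\K part uses the greedy pattern .+ to match as much as it can and then, with \\K, drops what has been matched so far from the reported match, before matching a space-hyphen-space pattern. The greediness of .+ lets it skip over the hyphens in "Pro Dam - Veh - Spl Lt", so only the last " - " is used as the split point. Because the separator requires whitespace on both sides, the hyphen in "Mayur-Cook" is never matched. \\K is a PCRE feature, which is why perl = TRUE is needed.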
strsplit(RelatedToTxt, ".+\\K\\s-\\s", perl = TRUE)
#[[1]]
#[1] "Collision" "NAWADA REALTY, INC"
#
#[[2]]
#[1] "Collision" "Don Cooks"
#
#[[3]]
#[1] "Pro Dam - Veh - Spl Lt" "Raj Perk"
#
#[[4]]
#[1] "Rental Reimbursement" "Mayur-Cook"
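The effect of \\K can be checked with regexpr(), which reports where each match starts and how long it is. The sketch below (base R only; start, len, Coverage_Type and Name are illustrative names) shows that each reported match is just the final " - ", and then uses those positions with substr() as an alternative to strsplit(). It assumes every string contains at least one " - "; for a string without one, regexpr() returns -1 and the substr() calls would not give a sensible split.

```r
RelatedToTxt <- c("Collision - NAWADA REALTY, INC", "Collision - Don Cooks",
                  "Pro Dam - Veh - Spl Lt - Raj Perk", "Rental Reimbursement - Mayur-Cook")

m <- regexpr(".+\\K\\s-\\s", RelatedToTxt, perl = TRUE)
start <- as.integer(m)             # where the reported match begins
len <- attr(m, "match.length")     # how many characters it covers

print(start)  # 10 10 23 21: the position of the final " - " in each string
print(len)    # 3 3 3 3: \K dropped the ".+" part, leaving only " - "

# Assumes every string has at least one " - " (start would be -1 otherwise)
Coverage_Type <- substr(RelatedToTxt, 1, start - 1)
Name <- substr(RelatedToTxt, start + len, nchar(RelatedToTxt))

print(Coverage_Type)
print(Name)
```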
The resulting list can be turned into a two-column matrix with
do.call(rbind, strsplit(RelatedToTxt, ".+\\K\\s-\\s", perl = TRUE))
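To end up with the two named columns from the question, the matrix can be wrapped in a data frame. This is a minimal sketch; the lengths() guard is an addition not in the answer above, included because rbind() would silently recycle a one-element result (from a string with no " - ") into both columns.

```r
RelatedToTxt <- c("Collision - NAWADA REALTY, INC", "Collision - Don Cooks",
                  "Pro Dam - Veh - Spl Lt - Raj Perk", "Rental Reimbursement - Mayur-Cook")

split_parts <- strsplit(RelatedToTxt, ".+\\K\\s-\\s", perl = TRUE)

# Guard: each string must split into exactly two pieces; otherwise
# rbind() would recycle a lone piece into both columns without warning
stopifnot(all(lengths(split_parts) == 2))

parts <- do.call(rbind, split_parts)
result <- data.frame(RelatedToTxt = RelatedToTxt,
                     Coverage_Type = parts[, 1],
                     Name = parts[, 2],
                     stringsAsFactors = FALSE)
print(result)
```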