Reputation: 141
How do I extract only 'Reason for the call' and 'Resolution' key-value pair from the column into two separate column in R
> dput(head(asummy, 10))
structure(list(`arrange_df[, 12]` = c("Customer Name: JOE\r\nReason for the Call: BP Set Up\r\nResolution: \r\nvisual audit on the account\r\nset expectations in BP\r\noffered call back\r\nCenter Location: Clark\r\nCTN (Number Calling About): ************************************\r\nAVAYA (Number Calling From): ************************************\r\nAgent UID: cm***************************u\r\n",
"Name: Kim Putok\r\nCtn: ************************************\r\nReason for calling: lost phone/ ************************************/ follow up on insurance claim/ routed to asurion\r\nResolution:\r\nLocation: Clark\r\nattuid: gb***************************r\r\n",
"Customer Name: Heather \r\nReason for the Call: got suspended / card issue / supposedly paid / complains she just updated online her card then got susp\r\nResolution: \r\nCenter Location: Clark\r\nCTN: ************************************\r\nAlt No: ************************************\r\nCredit Reason: ********* courtesy \r\nAgent UID: rc***************************w/I-EQR******************G\r\n",
"Customer Name: GLORIA ; CTN: ************************************ Affected CTN: ************************************ Alternate Number: none Reason for the Call: change rate plan Resolution: set exp on changing rate plan Agent UID: gt***************************g",
"Customer Name: BRANDY \r\nReason for the Call: payment\r\nRecommendations/Troubleshooting Steps:\r\nResolution: explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx\r\nCenter Location:\r\nCTN: ************************************\r\nCredit Reason (If credit was applied to account):\r\nAgent UID: jq***************************n\r\n",
"Customer Name: ; ;Leahi ; Reason for the Call: ; ; ;billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions ; Resolution: ; ;explained bill / ctn change info provided ; Center Location: ;Clark CTN: ; ;************************************ Alt No: Credit Reason: Agent UID: rc***************************w/I-K*********GMDR",
"Customer Name: MICHAEL\r\nReason for the Call: ACCOUNT BAL INQ\r\n\r\nRecommendations/Troubleshooting Steps:\r\n*Account verified and provided information\r\n\r\nResolution: \r\n*calling about the account status\r\n*suspended due to BBP\r\n*adv about BP policy\r\n*Provided payment options\r\n\r\nCenter Location:Clark\r\nCTN************************************:\r\nCredit Reason (If credit was applied to account):\r\nAgent UID:rc*********\r\n",
"Customer Name: Kimberly\r\nCTN: ************************************\r\nAffected CTN:************************************\r\nAlternate Number: none\r\nReason for the Call: add mhs otc $******************\r\nResolution: added mhs otc $******************\r\nAgent UID: gt***************************g\r\n",
"auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n",
"Customer Name: TERRELL ; Reason for the Call: payment Recommendations/Troubleshooting Steps: Resolution: payment success test and validated CTN: ************************************ Credit Reason (If credit was applied to account): Agent UID: jq***************************n"
)), row.names = c(NA, 10L), class = "data.frame")
Upvotes: 0
Views: 145
Reputation: 160687
I would label this as seemingly-malicious data: there are several inconsistencies that reduce the confidence of parsing it correctly. After spending some time on it, I'm doing what I can. Ultimately, you should go to the source of this and either (a) ask why they hate you, or (b) repair the scraping script so it is less naive about these things. (Perhaps neither of these are possible, but I thought I'd give you some ideas.) Examples of some issues:
many lines appear to be delimited by \r\n
:
Customer Name: JOE\r\nReason for the Call: BP Set Up\r\n...
and some by ;
Customer Name: GLORIA ; CTN: ...
but some of those same ;
-delimited lines have no other ;
Customer Name: GLORIA ; CTN: ************************************ Affected CTN: ...
some appear to have multiple semicolons for no good reason,
Customer Name: ; ;Leahi ; Reason for the Call: ; ; ;billing issue ...
one appears to be isolated (no keys), perhaps it was meant to be concatenated from a field on a previous line?
auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n
there appears to be a missing colon on one line between CTN
and its redacted value:
...\r\nCenter Location:Clark\r\nCTN************************************:\r\n
fun <- function(z) {
z <- trimws(z)
gre <- gregexpr("\\b([A-Z][^:]+:)", z)
lhs <- lapply(regmatches(z, gre), trimws)
rhs <- lapply(regmatches(z, gre, invert = TRUE), trimws)
rhs <- lapply(rhs, function(R) R[if (!nzchar(R[1])) -1 else TRUE])
z <- sapply(Map(paste, lhs, rhs), paste, collapse = "\r\n")
if (!grepl(":", z)) z <- paste("UNK:", z)
with(list(r = strsplit(z, "[\r\n;]+")[[1]]),
sapply(split(r, cumsum(grepl(":", r))), paste, collapse = "\r\n"))
}
library(dplyr)
library(tidyr) # unnest, separate
dat %>%
transmute(row = row_number(), L = lapply(`arrange_df[, 12]`, fun)) %>%
unnest(L) %>%
separate(L, sep = ":", into = c("lhs", "rhs"), fill = "right") %>%
mutate(across(c(lhs, rhs), trimws)) %>%
as.data.frame() # purely so you can see all of the data here, not required
# row lhs rhs
# 1 1 Customer Name JOE
# 2 1 Reason for the Call BP Set Up
# 3 1 Resolution visual audit on the account\r\nset expectations in\r\nBP\r\noffered call back
# 4 1 Center Location Clark
# 5 1 CTN (Number Calling About) ************************************
# 6 1 AVAYA (Number Calling From) ************************************
# 7 1 Agent UID cm***************************u
# 8 2 Name Kim Putok
# 9 2 Ctn ************************************
# 10 2 Reason for calling lost phone/ ************************************/ follow up on insurance claim/ routed to asurion
# 11 2 Resolution
# 12 2 Location Clark
# 13 2 attuid gb***************************r
# 14 3 Customer Name Heather
# 15 3 Reason for the Call got suspended / card issue / supposedly paid / complains she just updated online her card then got susp
# 16 3 Resolution
# 17 3 Center Location Clark
# 18 3 CTN ************************************
# 19 3 Alt No ************************************
# 20 3 Credit Reason ********* courtesy
# 21 3 Agent UID rc***************************w/I-EQR******************G
# 22 4 Customer Name GLORIA
# 23 4 CTN ************************************
# 24 4 Affected CTN ************************************
# 25 4 Alternate Number none
# 26 4 Reason for the Call change rate plan
# 27 4 Resolution set exp on changing rate plan
# 28 4 Agent UID gt***************************g
# 29 5 Customer Name BRANDY
# 30 5 Reason for the Call payment
# 31 5 Recommendations/Troubleshooting Steps
# 32 5 Resolution explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx
# 33 5 Center Location
# 34 5 CTN ************************************
# 35 5 Credit Reason (If credit was applied to account)
# 36 5 Agent UID jq***************************n
# 37 6 Customer Name Leahi
# 38 6 Reason for the Call billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions
# 39 6 Resolution explained bill / ctn change info provided
# 40 6 Center Location
# 41 6 Clark CTN ************************************
# 42 6 Alt No
# 43 6 Credit Reason
# 44 6 Agent UID rc***************************w/I-K*********GMDR
# 45 7 Customer Name MICHAEL
# 46 7 Reason for the Call ACCOUNT BAL INQ
# 47 7 Recommendations/Troubleshooting Steps *\r\nAccount verified and provided information
# 48 7 Resolution *calling about the account status\r\n*suspended due to\r\nBBP\r\n*adv about BP policy\r\n*Provided payment options
# 49 7 Center Location Clark
# 50 7 CTN************************************
# 51 7 Credit Reason (If credit was applied to account)
# 52 7 Agent UID rc*********
# 53 8 Customer Name Kimberly
# 54 8 CTN ************************************
# 55 8 Affected CTN ************************************
# 56 8 Alternate Number none
# 57 8 Reason for the Call add mhs otc $******************
# 58 8 Resolution added mhs otc $******************
# 59 8 Agent UID gt***************************g
# 60 9 UNK auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call
# 61 10 Customer Name TERRELL
# 62 10 Reason for the Call payment
# 63 10 Recommendations/Troubleshooting Steps
# 64 10 Resolution payment success test and validated
# 65 10 CTN ************************************
# 66 10 Credit Reason (If credit was applied to account)
# 67 10 Agent UID jq***************************n
Upvotes: 2