Rich
Rich

Reputation: 141

How do I extract key-value pair from column in a dataframe in R?

How do I extract only 'Reason for the call' and 'Resolution' key-value pair from the column into two separate column in R

> dput(head(asummy, 10))
structure(list(`arrange_df[, 12]` = c("Customer Name: JOE\r\nReason for the Call: BP Set Up\r\nResolution: \r\nvisual audit on the account\r\nset expectations in BP\r\noffered call back\r\nCenter Location: Clark\r\nCTN (Number Calling About): ************************************\r\nAVAYA (Number Calling From): ************************************\r\nAgent UID: cm***************************u\r\n", 
"Name: Kim Putok\r\nCtn: ************************************\r\nReason for calling: lost phone/ ************************************/ follow up on insurance claim/ routed to asurion\r\nResolution:\r\nLocation: Clark\r\nattuid: gb***************************r\r\n", 
"Customer Name: Heather \r\nReason for the Call:    got suspended / card issue / supposedly paid / complains she just updated online her card then got susp\r\nResolution:  \r\nCenter Location: Clark\r\nCTN:  ************************************\r\nAlt No:  ************************************\r\nCredit Reason:  ********* courtesy \r\nAgent UID: rc***************************w/I-EQR******************G\r\n", 
"Customer Name: GLORIA ;      CTN: ************************************      Affected CTN: ************************************      Alternate Number: none      Reason for the Call: change rate plan      Resolution: set exp on changing rate plan      Agent UID: gt***************************g", 
"Customer Name: BRANDY \r\nReason for the Call: payment\r\nRecommendations/Troubleshooting Steps:\r\nResolution: explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx\r\nCenter Location:\r\nCTN: ************************************\r\nCredit Reason (If credit was applied to account):\r\nAgent UID: jq***************************n\r\n", 
"Customer Name: ; ;Leahi ;  Reason for the Call: ; ; ;billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions ;  Resolution: ; ;explained bill / ctn change info provided ;  Center Location: ;Clark  CTN: ; ;************************************  Alt No:  Credit Reason:  Agent UID: rc***************************w/I-K*********GMDR", 
"Customer Name: MICHAEL\r\nReason for the Call: ACCOUNT BAL INQ\r\n\r\nRecommendations/Troubleshooting Steps:\r\n*Account verified and provided information\r\n\r\nResolution: \r\n*calling about the account status\r\n*suspended due to BBP\r\n*adv about BP policy\r\n*Provided payment options\r\n\r\nCenter Location:Clark\r\nCTN************************************:\r\nCredit Reason (If credit was applied to account):\r\nAgent UID:rc*********\r\n", 
"Customer Name: Kimberly\r\nCTN: ************************************\r\nAffected CTN:************************************\r\nAlternate Number: none\r\nReason for the Call: add mhs otc $******************\r\nResolution: added mhs otc  $******************\r\nAgent UID: gt***************************g\r\n", 
"auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n", 
"Customer Name: TERRELL ;  Reason for the Call: payment  Recommendations/Troubleshooting Steps:  Resolution: payment success  test and validated  CTN: ************************************  Credit Reason (If credit was applied to account):  Agent UID: jq***************************n"
)), row.names = c(NA, 10L), class = "data.frame")

Upvotes: 0

Views: 145

Answers (1)

r2evans
r2evans

Reputation: 160687

I would label this as seemingly-malicious data: there are several inconsistencies that reduce the confidence of parsing it correctly. After spending some time on it, I'm doing what I can. Ultimately, you should go to the source of this and either (a) ask why they hate you, or (b) repair the scraping script so it is less naive about these things. (Perhaps neither of these are possible, but I thought I'd give you some ideas.) Examples of some issues:

  • many lines appear to be delimited by \r\n:

    Customer Name: JOE\r\nReason for the Call: BP Set Up\r\n...
    

    and some by ;

    Customer Name: GLORIA ;      CTN: ...
    

    but some of those same ;-delimited lines have no other ;

    Customer Name: GLORIA ;      CTN: ************************************      Affected CTN: ...
    
  • some appear to have multiple semicolons for no good reason,

    Customer Name: ; ;Leahi ;  Reason for the Call: ; ; ;billing issue ...
    
  • one appears to be isolated (no keys), perhaps it was meant to be concatenated from a field on a previous line?

    auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call\r\n
    
  • there appears to be a missing colon on one line between CTN and its redacted value:

    ...\r\nCenter Location:Clark\r\nCTN************************************:\r\n
    
fun <- function(z) {
  z <- trimws(z)
  gre <- gregexpr("\\b([A-Z][^:]+:)", z)
  lhs <- lapply(regmatches(z, gre), trimws)
  rhs <- lapply(regmatches(z, gre, invert = TRUE), trimws)
  rhs <- lapply(rhs, function(R) R[if (!nzchar(R[1])) -1 else TRUE])
  z <- sapply(Map(paste, lhs, rhs), paste, collapse = "\r\n")
  if (!grepl(":", z)) z <- paste("UNK:", z)
  with(list(r = strsplit(z, "[\r\n;]+")[[1]]),
       sapply(split(r, cumsum(grepl(":", r))), paste, collapse = "\r\n"))
}

library(dplyr)
library(tidyr) # unnest, separate

dat %>%
  transmute(row = row_number(), L = lapply(`arrange_df[, 12]`, fun)) %>%
  unnest(L) %>%
  separate(L, sep = ":", into = c("lhs", "rhs"), fill = "right") %>%
  mutate(across(c(lhs, rhs), trimws)) %>%
  as.data.frame()                      # purely so you can see all of the data here, not required
#    row                                              lhs                                                                                                                                                               rhs
# 1    1                                    Customer Name                                                                                                                                                               JOE
# 2    1                              Reason for the Call                                                                                                                                                         BP Set Up
# 3    1                                       Resolution                                                                                     visual audit on the account\r\nset expectations in\r\nBP\r\noffered call back
# 4    1                                  Center Location                                                                                                                                                             Clark
# 5    1                       CTN (Number Calling About)                                                                                                                              ************************************
# 6    1                      AVAYA (Number Calling From)                                                                                                                              ************************************
# 7    1                                        Agent UID                                                                                                                                    cm***************************u
# 8    2                                             Name                                                                                                                                                         Kim Putok
# 9    2                                              Ctn                                                                                                                              ************************************
# 10   2                               Reason for calling                                                                 lost phone/ ************************************/ follow up on insurance claim/ routed to asurion
# 11   2                                       Resolution                                                                                                                                                                  
# 12   2                                         Location                                                                                                                                                             Clark
# 13   2                                           attuid                                                                                                                                    gb***************************r
# 14   3                                    Customer Name                                                                                                                                                           Heather
# 15   3                              Reason for the Call                                                           got suspended / card issue / supposedly paid / complains she just updated online her card then got susp
# 16   3                                       Resolution                                                                                                                                                                  
# 17   3                                  Center Location                                                                                                                                                             Clark
# 18   3                                              CTN                                                                                                                              ************************************
# 19   3                                           Alt No                                                                                                                              ************************************
# 20   3                                    Credit Reason                                                                                                                                                ********* courtesy
# 21   3                                        Agent UID                                                                                                           rc***************************w/I-EQR******************G
# 22   4                                    Customer Name                                                                                                                                                            GLORIA
# 23   4                                              CTN                                                                                                                              ************************************
# 24   4                                     Affected CTN                                                                                                                              ************************************
# 25   4                                 Alternate Number                                                                                                                                                              none
# 26   4                              Reason for the Call                                                                                                                                                  change rate plan
# 27   4                                       Resolution                                                                                                                                     set exp on changing rate plan
# 28   4                                        Agent UID                                                                                                                                    gt***************************g
# 29   5                                    Customer Name                                                                                                                                                            BRANDY
# 30   5                              Reason for the Call                                                                                                                                                           payment
# 31   5            Recommendations/Troubleshooting Steps                                                                                                                                                                  
# 32   5                                       Resolution    explained card errors\r\nasked for alt # but no good\r\noffered to use different card or refill card\r\nsent qp link\r\nshe will go to bank to check as per cx
# 33   5                                  Center Location                                                                                                                                                                  
# 34   5                                              CTN                                                                                                                              ************************************
# 35   5 Credit Reason (If credit was applied to account)                                                                                                                                                                  
# 36   5                                        Agent UID                                                                                                                                    jq***************************n
# 37   6                                    Customer Name                                                                                                                                                             Leahi
# 38   6                              Reason for the Call billing issue / ******************.****************** dollars only orig. bill / due date shld be *********th / why ****************** now? / ctn change questions
# 39   6                                       Resolution                                                                                                                         explained bill / ctn change info provided
# 40   6                                  Center Location                                                                                                                                                                  
# 41   6                                       Clark  CTN                                                                                                                              ************************************
# 42   6                                           Alt No                                                                                                                                                                  
# 43   6                                    Credit Reason                                                                                                                                                                  
# 44   6                                        Agent UID                                                                                                                   rc***************************w/I-K*********GMDR
# 45   7                                    Customer Name                                                                                                                                                           MICHAEL
# 46   7                              Reason for the Call                                                                                                                                                   ACCOUNT BAL INQ
# 47   7            Recommendations/Troubleshooting Steps                                                                                                                    *\r\nAccount verified and provided information
# 48   7                                       Resolution                                                *calling about the account status\r\n*suspended due to\r\nBBP\r\n*adv about BP policy\r\n*Provided payment options
# 49   7                                  Center Location                                                                                                                                                             Clark
# 50   7          CTN************************************                                                                                                                                                                  
# 51   7 Credit Reason (If credit was applied to account)                                                                                                                                                                  
# 52   7                                        Agent UID                                                                                                                                                       rc*********
# 53   8                                    Customer Name                                                                                                                                                          Kimberly
# 54   8                                              CTN                                                                                                                              ************************************
# 55   8                                     Affected CTN                                                                                                                              ************************************
# 56   8                                 Alternate Number                                                                                                                                                              none
# 57   8                              Reason for the Call                                                                                                                                   add mhs otc $******************
# 58   8                                       Resolution                                                                                                                                added mhs otc  $******************
# 59   8                                        Agent UID                                                                                                                                    gt***************************g
# 60   9                                              UNK                                                                                             auto pop / suspended line\r\n-cx speaking spanish\r\n-transfered call
# 61  10                                    Customer Name                                                                                                                                                           TERRELL
# 62  10                              Reason for the Call                                                                                                                                                           payment
# 63  10            Recommendations/Troubleshooting Steps                                                                                                                                                                  
# 64  10                                       Resolution                                                                                                                               payment success  test and validated
# 65  10                                              CTN                                                                                                                              ************************************
# 66  10 Credit Reason (If credit was applied to account)                                                                                                                                                                  
# 67  10                                        Agent UID                                                                                                                                    jq***************************n

Upvotes: 2

Related Questions