Nishant

Reputation: 115

Regular expression to separate a chat between two people in R

I have a string:

"mary g: hello dr ydeen how can i help you dany: hi is there a place within the physician portal to see how many vacation and sick hours i currently have thank you so much for your time mary g: yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary mary g: you can also view it on each pay stub biweekly dany: i thought it might be on the paystub i can never seem to find it there : checking out benefits summary now dany: great i see vacation and sick time here to confirm this is how much i have currently accrued mary g: youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings mary g: great yes thats correct mary g: its your accruals as of the last pay period mary g: so currently its accurate up until dany: perfect thank you so much and noted regarding printview dany: thanks so much for your help mary g: my pleasure anything else i can help you with dany: thats all thanks again mary g: have a great day"

I want to extract all the messages written by mary g. I tried the regex below, but it only returns her first message:

text_agnt=gsub(".*^[[:alpha:]][^:]+:\\s*|\\s[A-Za-z]{3,10}:.*$","",text)

Output: "hello dr ydeen how can i help you "

Expected output: all text written by mary g

Upvotes: 2

Views: 66

Answers (1)

The fourth bird

Reputation: 163427

In the example data, each speaker label (the name and its colon) is preceded by at least 2 whitespace chars. In that case you can capture the message lazily and use a positive lookahead assertion to check that what follows is either 2 whitespace chars and then a run of non-colon chars ending in a : with a letter [A-Za-z] directly before it, or the end of the string.

\bmary\s+g:\s*(.*?)(?=\s{2}[^:]*[A-Za-z]:|$)


library(stringr)

txt <- "mary g: hello dr ydeen how can i help you   dany: hi is there a place within the physician portal to see how many vacation and sick hours i currently have thank you so much for your time    mary g: yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary    mary g: you can also view it on each pay stub biweekly    dany: i thought it might be on the paystub i can never seem to find it there : checking out benefits summary now    dany: great i see vacation and sick time here to confirm this is how much i have currently accrued    mary g: youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings    mary g: great yes thats correct    mary g: its your accruals as of the last pay period    mary g: so currently its accurate up until     dany: perfect thank you so much and noted regarding printview    dany: thanks so much for your help    mary g: my pleasure anything else i can help you with    dany: thats all thanks again    mary g: have a great day"
pattern <- "\\bmary\\s+g:\\s*(.*?)(?=\\s{2}[^:]*[A-Za-z]:|$)"
str_match_all(txt, pattern)[[1]][,2]

Output

[1] "hello dr ydeen how can i help you"                                                                                                                                                                                                                                      
[2] "yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary"
[3] "you can also view it on each pay stub biweekly"                                                                                                                                                                                                                         
[4] "youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings"                                                                                                                                                      
[5] "great yes thats correct"                                                                                                                                                                                                                                                
[6] "its your accruals as of the last pay period"                                                                                                                                                                                                                            
[7] "so currently its accurate up until"                                                                                                                                                                                                                                     
[8] "my pleasure anything else i can help you with"                                                                                                                                                                                                                          
[9] "have a great day"
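
If you would rather avoid the stringr dependency, the same lookahead should also work in base R with perl = TRUE. The following is only a sketch under assumptions: extract_turns is a hypothetical helper name, sample_txt is a shortened made-up string rather than the original chat, and it still relies on each speaker label being preceded by at least 2 whitespace chars.

```r
# Hypothetical helper (not part of the answer above): return every message
# of one speaker. speaker_regex is a regex for the name without the colon.
extract_turns <- function(txt, speaker_regex) {
  # Same idea as the pattern above: lazy match up to the next label or the end
  pattern <- paste0("\\b", speaker_regex,
                    ":\\s*(.*?)(?=\\s{2}[^:]*[A-Za-z]:|$)")
  # gregexpr returns whole matches only, so each match still has the label
  full <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
  # Strip the leading "name: " from every match
  sub(paste0("^", speaker_regex, ":\\s*"), "", full, perl = TRUE)
}

# Shortened illustrative input, with 2+ spaces before each speaker label
sample_txt <- "mary g: hello   dany: hi there    mary g: yes you can   dany: thanks    mary g: bye"

mary_turns <- extract_turns(sample_txt, "mary\\s+g")
dany_turns <- extract_turns(sample_txt, "dany")
print(mary_turns)
print(dany_turns)
```

Because base R's gregexpr does not expose capture groups the way str_match_all does, the sketch removes the label in a second step with sub. Passing the name as a regex also lets you pull out the other speaker's messages with the same helper.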

Upvotes: 3
