I have a string:
"mary g: hello dr ydeen how can i help you dany: hi is there a place within the physician portal to see how many vacation and sick hours i currently have thank you so much for your time mary g: yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary mary g: you can also view it on each pay stub biweekly dany: i thought it might be on the paystub i can never seem to find it there : checking out benefits summary now dany: great i see vacation and sick time here to confirm this is how much i have currently accrued mary g: youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings mary g: great yes thats correct mary g: its your accruals as of the last pay period mary g: so currently its accurate up until dany: perfect thank you so much and noted regarding printview dany: thanks so much for your help mary g: my pleasure anything else i can help you with dany: thats all thanks again mary g: have a great day"
I want to extract all of the chat written by mary g. I tried the regex below, but it only gives me the first line:
text_agnt=gsub(".*^[[:alpha:]][^:]+:\\s*|\\s[A-Za-z]{3,10}:.*$","",text)
Output: "hello dr ydeen how can i help you "
Expected output: all text written by mary g
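In the example data, there are two or more whitespace characters before each speaker name and its colon. In that case you can use a positive lookahead assertion to check that what follows is either two whitespace characters followed by a : with at least one [A-Za-z] letter before it (the next speaker's name), or the end of the string. Because the capture group (.*?) is lazy, it stops at the first position where that lookahead succeeds, so each match holds exactly one of mary g's messages.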
\bmary\s+g:\s*(.*?)(?=\s{2}[^:]*[A-Za-z]:|$)
library(stringr)
txt <- "mary g: hello dr ydeen how can i help you dany: hi is there a place within the physician portal to see how many vacation and sick hours i currently have thank you so much for your time mary g: yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary mary g: you can also view it on each pay stub biweekly dany: i thought it might be on the paystub i can never seem to find it there : checking out benefits summary now dany: great i see vacation and sick time here to confirm this is how much i have currently accrued mary g: youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings mary g: great yes thats correct mary g: its your accruals as of the last pay period mary g: so currently its accurate up until dany: perfect thank you so much and noted regarding printview dany: thanks so much for your help mary g: my pleasure anything else i can help you with dany: thats all thanks again mary g: have a great day"
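# Lazily capture each message after "mary g:", stopping at the next speaker (two spaces + name + colon) or the end of the string
pattern <- "\\bmary\\s+g:\\s*(.*?)(?=\\s{2}[^:]*[A-Za-z]:|$)"
# Column 2 of the match matrix holds the captured message text
str_match_all(txt, pattern)[[1]][,2]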
Output
[1] "hello dr ydeen how can i help you"
[2] "yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary"
[3] "you can also view it on each pay stub biweekly"
[4] "youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings"
[5] "great yes thats correct"
[6] "its your accruals as of the last pay period"
[7] "so currently its accurate up until"
[8] "my pleasure anything else i can help you with"
[9] "have a great day"
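The string as pasted in the question shows only single spaces between messages. On text like that the \s{2} lookahead never fires, so the lazy group runs on to the end of the string and you get one long match instead of nine. If your data really is single-spaced, a minimal base R sketch (no stringr) is to look ahead for the next speaker tag itself. It rests on an assumption about this particular chat format: every tag is a lowercase word, optionally followed by a one-letter initial, and then a colon (dany:, mary g:). Ordinary message text containing something like "a b:" would be misread as a tag.

```r
# Sketch, assuming single spaces between messages and speaker tags of the
# form "word:" or "word x:" (lowercase word plus optional one-letter initial).
# This tag format is an assumption about this chat, not a general rule.
txt <- paste(
  "mary g: hello dr ydeen how can i help you",
  "dany: hi is there a place within the physician portal to see how many vacation and sick hours i currently have thank you so much for your time",
  "mary g: yes you can view that in your secure hr page under benefits summary first youll have to enter the last four of your ssn on the right hand side youll be directed to your secure hr page and on the left hand side there will be a list of links select benefits summary",
  "mary g: you can also view it on each pay stub biweekly",
  "dany: i thought it might be on the paystub i can never seem to find it there : checking out benefits summary now",
  "dany: great i see vacation and sick time here to confirm this is how much i have currently accrued",
  "mary g: youll have to select the quotprintviewquot option on the paycheck and the accruals are listed under your earnings",
  "mary g: great yes thats correct",
  "mary g: its your accruals as of the last pay period",
  "mary g: so currently its accurate up until",
  "dany: perfect thank you so much and noted regarding printview",
  "dany: thanks so much for your help",
  "mary g: my pleasure anything else i can help you with",
  "dany: thats all thanks again",
  "mary g: have a great day"
)  # paste() joins the pieces with single spaces, like the question's string

# Lazy match from "mary g:" up to the next speaker tag or the end of the string
pattern <- "\\bmary\\s+g:\\s*.*?(?=\\s+[a-z]+(?:\\s[a-z])?:|$)"

# gregexpr() reports whole matches, so strip the "mary g:" prefix afterwards
hits <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
mary <- sub("^mary\\s+g:\\s*", "", hits)
mary

# For comparison: the \s{2} lookahead never fires on single-spaced text,
# so only one match is found and it runs to the end of the string
old <- "\\bmary\\s+g:\\s*(.*?)(?=\\s{2}[^:]*[A-Za-z]:|$)"
length(regmatches(txt, gregexpr(old, txt, perl = TRUE))[[1]])  # 1
```

On this single-spaced string, mary should hold the same nine messages listed in the output above. The trade-off is that the lookahead now encodes what a speaker tag looks like rather than relying on the extra whitespace, so it needs adjusting if names can contain digits, capitals, or more than one initial.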