Reputation: 463
Below is a vector I'm working with. I am trying to extract only the ages (including whether the number is in months or years) from each entry in the vector. I know I have to use str/grep functions and regex, but I'm not sure how to combine them to get what I want.
All ages are expressed like this: number, time unit, sex. For example, 18MOM is an 18-month-old male, 18YOF is an 18-year-old female, etc.
[1] "DX LAC CHIN/ABRASION CHEEK/CONTU HAND(S): 6YOF OUT RIDING BIKE, W WOBBLY ON BIKE AND HIT FACE ON ROAD, ABRASION TO L CHEEK, CHIN & R HAND"
[2] "DX LWOBS: 2YOM L PINKY FINGER CAUGHT IN BOWLING BALL, SM AMT BLDG/SWELLING TO PINKY FINGER. CRUSH W BOWLING BALL"
[3] "DX KNEE SPRAIN/CONTU KNEE/HIGH BLD PRESS: 16YOM R KNEE PN AFTER TWISTING KNEE COMING DOWN F JUMP' DUR' BASKETBALL GAME, LANDED ON BENT KNEE"
[4] "DX LBP: 21YOM STRETCHING OUT AFTER WORKOUT (DOING ***) HEARD POP"
[5] "DX FX PHALANX FOOT: 36YOF STUBBED R GREAT TOE ON STAIRS, PN, SWELL' SUROUNDING R GREAT TOE"
[6] "DX ELBOW CONTU/ELBOW ABRASION: 10YOM FELL F BED HAND HIT R ELBOW ON BEDPLAYING W SISTER, BRUSING TO ELBOW"
[7] "DX LWOBS: 3YOM LAC TO SCALP/ S/P PLASTIC LAMP FELL OFF DRESSER TO HEAD,PT W ~1CM LAC"
[8] "DX CONTU FINGER: 55YOM L 5TH FINGER PN AFTER FALL F BICYCLE W TRYING TOBAL AT STOPPED POSITION"
[9] "DX COSTOCHONDRITIS/CHEST PN: 24YOM SUBSTERNAL CHEST PN W WORKING OUT, HAD SHARP SPASM PN TO SUBSTERNAL CHEST TO L CHEST"
[10] "DX 1ST DEG BURN E: 28YOF W BURN TO L HAND, GRABBED HOT PAN UNDER BROILER W/O POTHOLDER; REDNESS TO PLAM & FINGER TIPS, FEW BLISTERS START' G F"
[11] "DX LWOBS LAC HAND: 1YOM W FINGER INJ, CUT FINGER ON A FAN"
Upvotes: 1
Views: 106
Reputation: 3632
You can first extract all the ages from your text and then carry out any further analysis.
This code will do the trick (assuming your string vector is called str):
library(stringr)
ages <- str_extract_all(str, "(\\d{1,2}[MY]O[MF])", simplify = TRUE)
library(stringr)
str <- c("DX LAC CHIN/ABRASION 12YOF CHEEK/CONTU HAND(S): 6YOF OUT RIDING BIKE, W WOBBLY ON BIKE AND HIT FACE ON ROAD, ABRASION TO L CHEEK, CHIN & R HAND",
"DX KNEE SPRAIN/CONTU KNEE/HIGH BLD PRESS: 16YOM R KNEE PN AFTER TWISTING KNEE COMING DOWN F JUMP' DUR' BASKETBALL GAME, LANDED ON BENT KNEE",
"DX FX PHALANX FOOT: 36YOF STUBBED R GREAT TOE ON STAIRS, PN, SWELL' SUROUNDING R GREAT TOE")
# Collapse the vector into a single string, so all matches end up in one row
str <- paste(str, collapse = '')
ages <- str_extract_all(str, "(\\d{1,2}[MY]O[MF])", simplify = TRUE)
Output:
> ages
[,1] [,2] [,3] [,4]
[1,] "12YOF" "6YOF" "16YOM" "36YOF"
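If you need one age per entry rather than one pooled row, a minimal sketch is to skip the paste() step and use str_extract(), which returns the first match for each element. The sample vector notes and the names ages and age_only are made up for illustration, and the sketch assumes each entry contains at most one age you care about.

```r
library(stringr)

# Hypothetical sample, shortened from the kind of entries in the question
notes <- c("DX LWOBS: 2YOM L PINKY FINGER CAUGHT IN BOWLING BALL",
           "DX LBP: 21YOM STRETCHING OUT AFTER WORKOUT",
           "DX CONTU: 18MOF FELL F BED")

# str_extract() returns the first match per element (NA if there is none),
# so the result stays aligned with the input vector
ages <- str_extract(notes, "\\d{1,3}[MY]O[MF]")

# Drop the trailing sex letter to keep only the number and the unit
age_only <- sub("[MF]$", "", ages)
```

Because the result has the same length as the input, it can be added directly as a column next to the original text.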
Hope this helps.
Upvotes: 1
Reputation: 7592
agevector <- gsub(".* (\\d+[MY]O).*", "\\1", vector)
This will create agevector, a character vector that includes things like 19MO and 5YO.
The pattern looks for a space, then one or more digits, followed by M or Y, followed by O, and replaces the whole string with just that captured part. Entries that contain no such pattern are returned unchanged.
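If you then want to work with the ages numerically, here is a rough sketch in base R that splits each value into its number and its unit and converts months to years. The sample agevector and the names value, unit and age_years are assumptions for illustration, not part of the original data.

```r
# Hypothetical result of the gsub() call above
agevector <- c("6YO", "18MO", "21YO")

# Numeric part: strip the trailing unit ("MO" or "YO")
value <- as.numeric(sub("[MY]O$", "", agevector))

# Unit letter: "M" for months, "Y" for years
unit <- sub("^\\d+([MY])O$", "\\1", agevector)

# Express every age in years (months are divided by 12)
age_years <- ifelse(unit == "M", value / 12, value)
```

Keeping value and unit as separate vectors also makes it easy to filter, for example to select only the entries recorded in months.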
Upvotes: 2