ljh2001

Reputation: 463

How to conduct pattern recognition for strings?

Below is a vector I'm working with. I want to extract only the age from each entry, including whether the number is in months or years. I know I need the str/grep functions and a regex, but I'm not sure how to combine them to get what I want.

All ages follow the same pattern: a number, then the time unit (MO for months old, YO for years old), then the sex (M or F). For example, 18MOM is an 18-month-old male and 18YOF is an 18-year-old female.

 [1] "DX LAC CHIN/ABRASION CHEEK/CONTU HAND(S): 6YOF OUT RIDING BIKE, W WOBBLY ON BIKE AND HIT FACE ON ROAD, ABRASION TO L CHEEK, CHIN & R HAND"     
 [2] "DX LWOBS: 2YOM L PINKY FINGER CAUGHT IN BOWLING BALL, SM AMT BLDG/SWELLING TO PINKY FINGER. CRUSH W BOWLING BALL"                              
 [3] "DX KNEE SPRAIN/CONTU KNEE/HIGH BLD PRESS: 16YOM R KNEE PN AFTER TWISTING KNEE COMING DOWN F JUMP' DUR' BASKETBALL GAME, LANDED ON BENT KNEE"   
 [4] "DX LBP: 21YOM STRETCHING OUT AFTER WORKOUT (DOING ***) HEARD POP"                                                                              
 [5] "DX FX PHALANX FOOT: 36YOF STUBBED R GREAT TOE ON STAIRS, PN, SWELL'  SUROUNDING R GREAT TOE"                                                   
 [6] "DX ELBOW CONTU/ELBOW ABRASION: 10YOM FELL F BED HAND HIT R ELBOW ON BEDPLAYING W SISTER, BRUSING TO ELBOW"                                     
 [7] "DX LWOBS: 3YOM LAC TO SCALP/ S/P PLASTIC LAMP FELL OFF DRESSER TO HEAD,PT W ~1CM LAC"                                                          
 [8] "DX CONTU FINGER: 55YOM L 5TH FINGER PN AFTER FALL F BICYCLE W TRYING TOBAL AT STOPPED POSITION"                                                
 [9] "DX COSTOCHONDRITIS/CHEST PN: 24YOM SUBSTERNAL CHEST PN W WORKING OUT, HAD SHARP SPASM PN TO SUBSTERNAL CHEST TO L CHEST"                       
[10] "DX 1ST DEG BURN E: 28YOF W BURN TO L HAND, GRABBED HOT PAN UNDER BROILER W/O POTHOLDER; REDNESS TO PLAM & FINGER TIPS, FEW BLISTERS START' G F"
[11] "DX LWOBS LAC HAND: 1YOM W FINGER INJ, CUT FINGER ON A FAN"   

Upvotes: 1

Views: 106

Answers (2)

Louis

Reputation: 3632

You can use the stringr package.

First extract all the ages from your text, then run any further analysis on the result. The pattern below matches one or two digits, then M or Y, then O, then M or F. This code will do the trick (assuming your character vector is called str):

library(stringr)
ages <- str_extract_all(str, "(\\d{1,2}[MY]O[MF])", simplify = TRUE)

Use case:

library(stringr)

str <- c("DX LAC CHIN/ABRASION 12YOF CHEEK/CONTU HAND(S): 6YOF OUT RIDING BIKE, W WOBBLY ON BIKE AND HIT FACE ON ROAD, ABRASION TO L CHEEK, CHIN & R HAND",
         "DX KNEE SPRAIN/CONTU KNEE/HIGH BLD PRESS: 16YOM R KNEE PN AFTER TWISTING KNEE COMING DOWN F JUMP' DUR' BASKETBALL GAME, LANDED ON BENT KNEE", 
         "DX FX PHALANX FOOT: 36YOF STUBBED R GREAT TOE ON STAIRS, PN, SWELL'  SUROUNDING R GREAT TOE")

# Collapse the entries into one string so all matches come back in a single row
str <- paste(str, collapse = '')
ages <- str_extract_all(str, "(\\d{1,2}[MY]O[MF])", simplify = TRUE)

Output:

> ages
      [,1]    [,2]   [,3]    [,4]   
[1,] "12YOF" "6YOF" "16YOM" "36YOF"
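As a possible next step, the extracted codes can be split into a number, a unit and a sex. The following is a minimal base R sketch rather than part of the stringr approach above; the names codes, parts and age_table are made up for illustration, and the 18MOM entry is taken from the question's description, not from the output above.

```r
# Extracted age codes; "18MOM" is added only to show a months example
codes <- c("12YOF", "6YOF", "16YOM", "36YOF", "18MOM")

# regexec() keeps the capture groups; each list element is
# c(full match, number, unit letter, sex letter)
parts <- regmatches(codes, regexec("^([0-9]+)([MY])O([MF])$", codes))

age_table <- data.frame(
  code  = codes,
  value = as.integer(sapply(parts, `[`, 2)),
  unit  = ifelse(sapply(parts, `[`, 3) == "M", "months", "years"),
  sex   = sapply(parts, `[`, 4),
  stringsAsFactors = FALSE
)

print(age_table)
```

Keeping the unit in its own column makes it straightforward to convert everything to a common scale later, for example months to years.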

Hope this helps.

Upvotes: 1

iod

Reputation: 7592

agevector <- gsub(".* (\\d+[MY]O).*", "\\1", vector)

This creates agevector, a character vector containing values such as 19MO and 5YO. The pattern looks for a space, then one or more digits, then M or Y, then O, and replaces the whole entry with just that captured part. Note that an entry with no match is returned unchanged.
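If entries without an age should become NA instead of being passed through unchanged, one alternative is regexpr() with regmatches(). This is only a sketch under a few assumptions: the vector name notes is hypothetical, the first two entries are shortened from the question's data, the third is invented from the question's 18MOM example, and the fourth is invented to show the no-match case.

```r
# Shortened or made-up sample entries for illustration
notes <- c("DX LWOBS: 2YOM L PINKY FINGER CAUGHT IN BOWLING BALL",
           "DX CONTU FINGER: 55YOM L 5TH FINGER PN AFTER FALL F BICYCLE",
           "DX LWOBS: 18MOM FELL F BED",
           "NO AGE RECORDED")

# Match digits + MO/YO, but only when followed by the sex letter (M or F);
# the lookahead needs perl = TRUE
m <- regexpr("[0-9]+[MY]O(?=[MF]\\b)", notes, perl = TRUE)

# regexpr() returns -1 where nothing matched; leave those entries as NA
agevector <- rep(NA_character_, length(notes))
agevector[m != -1] <- regmatches(notes, m)

print(agevector)
```

Unlike the gsub() call, this keeps the first age found in each entry rather than the last.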

Upvotes: 2
