Reputation: 585
I am extracting multiple types of pattern from a string. For example,
"Listed 03/25/2013 for 25000 and sold for $10,250 on 4/5/2010"
I would like to extract the dates "03/25/2013" and "4/5/2010" into a vector 'dates', and the amounts "25000" and "$10,250" into a vector 'amounts'.
library(stringr)  # provides str_extract_all()
text <- "Listed 03/25/2013 for 25000 and sold for $10,250 on 4/5/2010"
# extract dates
dates <- str_extract_all(text,"\\d{1,2}\\/\\d{1,2}\\/\\d{4}")[[1]]
# extract amounts
text2 <- as.character(gsub("\\d{1,2}\\/\\d{1,2}\\/\\d{4}", " ", text))
amountsdollar <- as.character(str_extract_all(text2,"\\$\\(?[0-9,.]+\\)?"))
text3 <- as.character(gsub("\\$\\(?[0-9,.]+\\)?", " ", text2))
amountsnum <- as.character(str_extract_all(text3,"\\(?[0-9,.]+\\)?"))
amounts <- as.vector(c(amountsdollar, amountsnum))
list(dates, amounts)
But this does not keep the amounts in the order in which they appear in the text. Is there a better way to do it? Thanks.
Upvotes: 0
Views: 645
Reputation: 44614
Base R handles this fine, using regmatches with gregexpr:
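x <- "Listed 03/25/2013 for 25000 and sold for $10,250, on 4/5/2010"
# date: 1-2 digit month and day, 2-4 digit year, separated by slashes
date.pat <- '\\d{1,2}/\\d{1,2}/\\d{2,4}'
# amount: a run of "$", commas and digits that ends in a digit, preceded by the start
# of the string or a space, and followed by a comma, period, space or end of string
amount.pat <- '(?<=^| )[$,0-9]+[0-9](?=,|\\.|$| )'
dates <- regmatches(x, gregexpr(date.pat, x))
amounts <- regmatches(x, gregexpr(amount.pat, x, perl=TRUE))  # lookarounds need perl=TRUE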
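Note that regmatches returns a list with one element per input string, while the question asks for plain vectors. A minimal sketch of getting those, assuming a single input string as in the example; the amount.values step (stripping "$" and "," to get numbers) is my own addition for illustration and was not asked for:

```r
x <- "Listed 03/25/2013 for 25000 and sold for $10,250, on 4/5/2010"
date.pat <- '\\d{1,2}/\\d{1,2}/\\d{2,4}'
amount.pat <- '(?<=^| )[$,0-9]+[0-9](?=,|\\.|$| )'

# [[1]] pulls the character vector for the single string x out of the list
dates <- regmatches(x, gregexpr(date.pat, x))[[1]]
amounts <- regmatches(x, gregexpr(amount.pat, x, perl = TRUE))[[1]]

# hypothetical follow-up: drop "$" and "," and convert to numeric
amount.values <- as.numeric(gsub("[$,]", "", amounts))

print(dates)    # "03/25/2013" "4/5/2010"
print(amounts)  # "25000"   "$10,250"
```

Because each pattern is matched in a single left-to-right pass over the original string, the matches come back in the order they appear in the text, which is what the two-step gsub approach in the question loses.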
Upvotes: 6