Reputation: 55
I have a character vector that looks like this:
"9/14/2007,,,,88.22" "9/21/2007,,,,92.53" "9/28/2007,,,,92" "10/5/2007,,,,92.85"
Now I need to remove everything up to and including the four commas, so that the result looks like this:
"88.22" "92.53" "92" "92.85"
I have tried the following code:
gsub("[^0-9.]", "", x)
where x is my character vector, but this keeps the digits before the commas (which are dates):
"914200788.22" "921200792.53" "928200792" "105200792.85"
Also, the number of characters to remove isn't always the same, but the last one to remove is always the last comma. Maybe this helps with the solution.
Upvotes: 1
Views: 1160
Reputation: 13372
Your regex only removes non-numeric characters, so the digits of the date survive. Instead, replace everything up to and including the four commas with an empty string:
> vec = c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53", "9/28/2007,,,,92", "10/5/2007,,,,92.85")
> sub(".*,,,,", "", vec)
[1] "88.22" "92.53" "92" "92.85"
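Since the question says the number of leading characters varies but the last one to remove is always the last comma, a pattern that does not hard-code four commas may be more robust. A minimal sketch, assuming every element contains at least one comma: because .* is greedy, ".*," matches everything up to and including the last comma. The second input vector below is hypothetical, added only to show a varying comma count.

```r
# Example input from the question
vec <- c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53",
         "9/28/2007,,,,92", "10/5/2007,,,,92.85")

# ".*" is greedy, so ".*," consumes everything up to the LAST comma,
# regardless of how many commas (or other characters) come before it
values <- sub(".*,", "", vec)
print(values)

# Hypothetical input with a different number of commas per element
mixed <- sub(".*,", "", c("1/4/2008,,7.5", "1/11/2008,,,,,,8"))
print(mixed)

# Convert to numbers if numeric values are needed
as.numeric(values)
```

The result is still a character vector; as.numeric() is only needed if you want to compute with the values.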
Upvotes: 2
Reputation: 99331
Read the vector as CSV, then refer to the column you need. To get the last column without knowing how many columns there are, reverse the columns and take the first one. Note that this returns a numeric vector rather than a character vector.
rev(read.table(text = x, sep = ","))[[1]]
# [1] 88.22 92.53 92.00 92.85
Data:
x <- scan(text='"9/14/2007,,,,88.22" "9/21/2007,,,,92.53" "9/28/2007,,,,92" "10/5/2007,,,,92.85"', what="")
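The read.table approach assumes every element has the same number of fields; with the default fill = FALSE, read.table() can stop with an error if the comma count varies between elements, which the question suggests may happen. A sketch of a per-element base R alternative, splitting each string on commas and keeping its last piece. The last sample element, with only two commas, is hypothetical.

```r
# Sample data; the last element is a hypothetical row with only two commas
x <- c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53",
       "9/28/2007,,,,92", "10/5/2007,,92.85")

# Split every string on commas and keep the last piece of each one
last_field <- vapply(strsplit(x, ","),
                     function(parts) parts[length(parts)],
                     character(1))
print(last_field)

# Numeric version, comparable to the read.table() result
as.numeric(last_field)
```

One caveat: strsplit() drops a trailing empty field, so a string that ends in a comma would return the text before that comma rather than an empty string.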
Upvotes: 0
Reputation: 18661
With stringr's str_extract:
library(stringr)
string = c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53", "9/28/2007,,,,92", "10/5/2007,,,,92.85")
str_extract(string, "\\d+(\\.\\d+)?$")
Or
str_extract(string, "(?<=,{4}).*")
Base R equivalents:
unlist(regmatches(string, gregexpr("\\d+(\\.\\d+)?$", string)))
unlist(regmatches(string, gregexpr("(?<=,{4}).*", string, perl = TRUE)))
sapply(strsplit(string, ",,,,"), `[`, 2)
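Since each string contains exactly one match, regexpr() (which returns only the first match per element) could stand in for gregexpr() and makes the unlist() step unnecessary. A sketch using the same lookbehind pattern, and therefore perl = TRUE; keep in mind that regmatches() silently drops elements with no match, so the result can be shorter than the input.

```r
string <- c("9/14/2007,,,,88.22", "9/21/2007,,,,92.53",
            "9/28/2007,,,,92", "10/5/2007,,,,92.85")

# regexpr() records the first match in each element; the lookbehind
# (?<=,{4}) is not supported by the default regex engine, hence perl = TRUE
m <- regexpr("(?<=,{4}).*", string, perl = TRUE)
after_commas <- regmatches(string, m)
print(after_commas)
```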
Notes:
$ matches the end of the string.
(?<=,{4}) is a positive lookbehind which checks whether .* is preceded by four commas. Lookbehinds require Perl-compatible regex, which is why perl = TRUE is needed for the second base R example.
Upvotes: 0