user2968163

Reputation: 55

Delete parts of a character vector

I have a character vector which looks like this

"9/14/2007,,,,88.22"  "9/21/2007,,,,92.53"  "9/28/2007,,,,92" "10/5/2007,,,,92.85"

Now I need to remove everything up to and including the four commas, so that the result looks like this:

"88.22"   "92.53"   "92"      "92.85"

I have tried the following code

gsub("[^0-9.]", "", x)

where x is my character vector, but this keeps the digits before the commas (which belong to the dates):

"914200788.22"   "921200792.53"   "928200792"      "105200792.85"

Also, the number of elements to remove isn't always the same, but the last one to remove is always the last comma. Maybe this helps with a solution.

Upvotes: 1

Views: 1160

Answers (3)

user1981275

Reputation: 13372

Your regex removes every character that is not a digit or a period, so the digits of the dates survive. Try substituting everything up to and including the four commas:

> vec = c("9/14/2007,,,,88.22",   "9/21/2007,,,,92.53",   "9/28/2007,,,,92",      "10/5/2007,,,,92.85")
> sub(".*,,,,", "", vec)
[1] "88.22" "92.53" "92"    "92.85"
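
Since the question mentions that the last thing to remove is always the last comma, a variant that does not depend on there being exactly four commas might look like the sketch below. The vector `vec_mixed` is made-up data for illustration, not from the question. The greedy `.*` runs up to the last comma, so everything through that comma is dropped.

```r
# Hypothetical input with a varying number of commas per element
vec_mixed <- c("9/14/2007,,,,88.22", "9/21/2007,,88.5", "10/5/2007,92")

# ".*," is greedy, so it matches up to and including the LAST comma
last_field <- sub(".*,", "", vec_mixed)
last_field
# [1] "88.22" "88.5"  "92"
```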

Upvotes: 2

Rich Scriven

Reputation: 99331

Read the vector as CSV text, then take the last column. To get the last column without knowing how many columns there are, reverse the columns and take the first.

rev(read.table(text = x, sep = ","))[[1]]
# [1] 88.22 92.53 92.00 92.85

Data:

x <- scan(text='"9/14/2007,,,,88.22"  "9/21/2007,,,,92.53"  "9/28/2007,,,,92" "10/5/2007,,,,92.85"', what="")
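
If the elements do not all contain the same number of commas, a split-based approach may be a more forgiving alternative. This is only a sketch: `x_mixed` is made-up data and `last_piece` is a name introduced here for illustration.

```r
# Hypothetical input where the number of commas differs per element
x_mixed <- c("9/14/2007,,,,88.22", "9/21/2007,,92.53", "9/28/2007,92")

# Split each element on literal commas and keep the last piece
pieces <- strsplit(x_mixed, ",", fixed = TRUE)
last_piece <- vapply(pieces, function(p) p[length(p)], character(1))

# Convert to numeric, as read.table() would do for a numeric column
values <- as.numeric(last_piece)
```

Note that this assumes no element ends in a comma, because `strsplit()` drops a trailing empty field.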

Upvotes: 0

acylam

Reputation: 18661

With stringr str_extract:

string = c("9/14/2007,,,,88.22",  "9/21/2007,,,,92.53",  "9/28/2007,,,,92", "10/5/2007,,,,92.85")

library(stringr)
str_extract(string, "\\d+[.]?\\d+$")

Or

str_extract(string, "(?<=,{4}).*")

Base R equivalent:

unlist(regmatches(string, gregexpr("\\d+[.]?\\d+$", string)))

unlist(regmatches(string, gregexpr("(?<=,{4}).*", string, perl = TRUE)))

sapply(strsplit(string, ",,,,"), `[`, 2)

Notes:

  1. $ matches the end of string
  2. (?<=,{4}) is a positive lookbehind which checks that the text matched by .* comes directly after 4 commas, without including the commas in the match. Lookbehind requires Perl-compatible regex, which is why perl = TRUE is needed in the second base R example.
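
To illustrate note 2, the following sketch compares a plain pattern with the lookbehind version on a shortened copy of the data. The names `with_commas` and `after_commas` are introduced here for illustration only.

```r
string <- c("9/14/2007,,,,88.22", "9/28/2007,,,,92")

# Without lookbehind, the four commas are part of the match
with_commas <- regmatches(string, regexpr(",{4}.*", string, perl = TRUE))
with_commas
# [1] ",,,,88.22" ",,,,92"

# With lookbehind, the commas are only checked, not captured
after_commas <- regmatches(string, regexpr("(?<=,{4}).*", string, perl = TRUE))
after_commas
# [1] "88.22" "92"
```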

Upvotes: 0
