Christopher B. L

Reputation: 245

How to use gsub to keep only the first characters/numbers in a vector in R?

Basically, I'd like to keep only the first character of each element in a vector. I know this can be done easily with substr(), but I'd like to know how to do it with gsub().

For example,

codes <- c("02Q","4E (1)","4S (1)","A0","A2","A4")

I want a result vector like

c("0","4","4","A","A","A")

Thanks

Upvotes: 1

Views: 2571

Answers (2)

Avinash Raj

Reputation: 174796

It seems you'd like to keep only the first character.

gsub("(?<!^).", "", codes, perl=TRUE)
# [1] "0" "4" "4" "A" "A" "A"

(?<!^) is a negative lookbehind which asserts that the match is not immediately preceded by the start of the string. The . therefore matches every character except the first one, and gsub() deletes all of them.
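As a quick sanity check of the lookbehind approach above, the sketch below compares it with the substr() baseline mentioned in the question. The variable names first_gsub and first_substr are only illustrative.

```r
codes <- c("02Q", "4E (1)", "4S (1)", "A0", "A2", "A4")

# Delete every character that is not at the very start of the string.
first_gsub <- gsub("(?<!^).", "", codes, perl = TRUE)

# Baseline for comparison: the substr() approach from the question.
first_substr <- substr(codes, 1, 1)

# Both approaches should agree element by element.
identical(first_gsub, first_substr)
```

Note that perl = TRUE is needed here, since lookbehind assertions are a PCRE feature.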

or

codes <- c("02Q","4E (1)","4S (1)","A0","A2","A4")
sub("(?<!^).*", "", codes, perl=TRUE)
# [1] "0" "4" "4" "A" "A" "A"

A few more:

sub("(?!^.).*", "", codes, perl=TRUE)
# [1] "0" "4" "4" "A" "A" "A"
sub("\\B.*", "", codes, perl=TRUE)
# [1] "0" "4" "4" "A" "A" "A"

Upvotes: 4

Cath

Reputation: 24074

You can do:

sub("^(\\w).*$", "\\1", codes)
#[1] "0" "4" "4" "A" "A" "A"

Explanation:

  • ^: means the start of the string
  • \w: matches a single word character (a letter, a digit, or an underscore), which you capture with the parentheses and then retrieve by putting "\\1" as the replacement argument
  • .*: means any character, 0 or more times
  • $: means the end of the string
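The pieces explained above can be sketched together as follows. The vector y is a hypothetical input (not from the question) added to show what happens when the first character is not a word character: the pattern does not match, so sub() returns the string unchanged. Swapping \\w for . is one possible way to accept any first character.

```r
codes <- c("02Q", "4E (1)", "4S (1)", "A0", "A2", "A4")

# Capture the first word character and replace the whole string with it.
first_word_char <- sub("^(\\w).*$", "\\1", codes)

# Hypothetical input whose second element starts with "(".
y <- c("A0", "(1) B")

# No match for "(1) B", so that element comes back unchanged.
strict <- sub("^(\\w).*$", "\\1", y)

# Using "." instead of "\\w" captures whatever the first character is.
lenient <- sub("^(.).*$", "\\1", y)

strict   # "A"  "(1) B"
lenient  # "A"  "("
```

Because the pattern is anchored at both ends, it can match at most once per string, so gsub() with the same arguments should behave the same as sub() here.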

Upvotes: 6
