Reputation: 1
I have the following data:
Names[]
[1] John Simon is a great player
[2] Chi-Twi is from china
[3] O'Konnor works hard
[4] R.F is a swimmer
I need to extract only the names from all these rows and store them, so that I end up with:
[1] John Simon
[2] Chi-Twi
[3] O'Konnor
[4] R.F
I tried doing it this way:
names = gsub("(^[A-Z|a-z|.|-|']+[ ]+[A-Z|a-z|.|-|]+)[ ]+.*", "\\1",names)
Can someone help me out?
Upvotes: 0
Views: 1693
Reputation: 81683
Based on @nhahtdh's comment, you can use:
sub("(^\\w+\\W\\w+).*", "\\1", Names)
# [1] "John Simon" "Chi-Twi" "O'Konnor" "R.F"
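If you would rather pull out the matched text than delete everything after it, base R's regexpr() and regmatches() can do that with the same pattern. This is a minimal sketch that assumes the sample vector from the question is called Names; note that regmatches() silently drops any string that has no match, so the result can be shorter than the input.

```r
# Sample data from the question
Names <- c("John Simon is a great player",
           "Chi-Twi is from china",
           "O'Konnor works hard",
           "R.F is a swimmer")

# regexpr() locates the first match of "word, one non-word character, word"
# at the start of each string; regmatches() returns just the matched text.
# Strings without a match are dropped from the result, not kept as NA.
extracted <- regmatches(Names, regexpr("^\\w+\\W\\w+", Names))
print(extracted)
```

The sub() version keeps the vector the same length even when a line does not match (the line is returned unchanged), which may be preferable if you need to line results up with the original rows.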
Upvotes: 0
Reputation: 59273
Here's a regex that will work for this sample data:
names = gsub("(^[A-Za-z]+[^A-Za-z][A-Za-z]+).*", "\\1", names)
If digits and underscores are acceptable in a first or last name (\\w matches letters, digits, and underscores), you could shorten it to:
names = gsub("(^\\w+\\W\\w+).*", "\\1", names)
It simply takes one or more letters, a non-letter, and then one or more letters again.
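The trailing .* is what makes the replacement work: the whole line has to be part of the match for "\\1" to replace it with only the captured name. A small sketch comparing the pattern with and without it, assuming the sample vector from the question is called Names:

```r
# Sample data from the question
Names <- c("John Simon is a great player",
           "Chi-Twi is from china",
           "O'Konnor works hard",
           "R.F is a swimmer")

# Without .* the match covers only the name, and "\\1" puts that same text
# back, so every string comes out unchanged.
no_tail <- sub("^([A-Za-z]+[^A-Za-z][A-Za-z]+)", "\\1", Names)

# With .* the rest of the line is inside the match and gets discarded.
with_tail <- sub("^([A-Za-z]+[^A-Za-z][A-Za-z]+).*", "\\1", Names)

print(no_tail)
print(with_tail)
```

Because the pattern is anchored with ^ and can match only once per string, sub() and gsub() behave the same here.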
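Some things I noticed wrong in your regex: the character class [A-Z|a-z|.|-|']+ actually matches A-Z, |, a-z, | (again), ., |-| (that's a range from | to |, so just | a third time), and '. Inside a character class, | is a literal character rather than alternation, and the dash is not matched at all. You really wanted [A-Za-z.'-]+ (a dash placed last in a class is taken literally, so it needs no escaping).
In any case, even the corrected class is wrong here: you don't want dots or dashes in the first-name part, because in names like R.F and Chi-Twi they are the separator between the two parts.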
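You can confirm what the original class accepts by testing single characters against it with grepl(). This sketch anchors each class with ^ and +$ so that it tests whole one-character strings; the variable names are only for illustration.

```r
# Character class copied from the question, anchored for whole-string tests
op_class <- "^[A-Z|a-z|.|-|']+$"
pipe_in_op <- grepl(op_class, "|")  # '|' is a literal member of the class
dash_in_op <- grepl(op_class, "-")  # "|-|" is a range from '|' to '|', so no dash

# Corrected class: dash placed last (literal), no '|' at all
fixed_class <- "^[A-Za-z.'-]+$"
dash_in_fixed <- grepl(fixed_class, "-")
pipe_in_fixed <- grepl(fixed_class, "|")

print(c(pipe_in_op, dash_in_op, dash_in_fixed, pipe_in_fixed))
```

The original class accepts | but rejects -, which is the reverse of what was intended; the corrected class accepts - and rejects |.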
Upvotes: 1