Guy

Reputation: 1

R gsub(), regular expression

I have the following data

Names[]
[1] John Simon is a great player
[2] Chi-Twi is from china
[3] O'Konnor works hard
[4] R.F is a swimmer

I need to extract only the names from all these rows and store them. The expected result is:

[1] John Simon
[2] Chi-Twi
[3] O'Konnor
[4] R.F

I tried doing it this way, but it does not give that result:

names = gsub("(^[A-Z|a-z|.|-|']+[ ]+[A-Z|a-z|.|-|]+)[ ]+.*", "\\1",names)

Can someone help me out?

Upvotes: 0

Views: 1693

Answers (2)

Sven Hohenstein

Reputation: 81683

Based on @nhahtdh's comment, you can use

sub("(^\\w+\\W\\w+).*", "\\1", Names)
# [1] "John Simon" "Chi-Twi"    "O'Konnor"   "R.F"       

Upvotes: 0

tckmn

Reputation: 59273

Here's a regex that will work for this sample data:

names = gsub("(^[A-Za-z]+[^A-Za-z][A-Za-z]+).*", "\\1", names)

If digits and underscores are also valid characters in a first or last name, you could shorten it to:

names = gsub("(^\\w+\\W\\w+).*", "\\1", names)

The group takes one or more letters, a non-letter, and then one or more letters again. The trailing .* matches the rest of the string, so replacing the whole match with \\1 leaves only the name.
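As a minimal sketch of the letters, non-letter, letters idea on the sample data from the question (the variable names `x`, `with_tail`, and `without_tail` are chosen here for illustration), the following compares the pattern with and without a trailing `.*`. It uses `sub()`, which behaves the same as `gsub()` here because the pattern is anchored with `^`.

```r
# Sample data from the question (variable name is illustrative)
x <- c("John Simon is a great player",
       "Chi-Twi is from china",
       "O'Konnor works hard",
       "R.F is a swimmer")

# Letters, one non-letter separator, letters. The trailing .* consumes the
# rest of the string, so only the captured group survives the replacement.
with_tail <- sub("^([A-Za-z]+[^A-Za-z][A-Za-z]+).*", "\\1", x)

# Without the trailing .*, the match is replaced by itself,
# so every string comes back unchanged.
without_tail <- sub("^([A-Za-z]+[^A-Za-z][A-Za-z]+)", "\\1", x)

print(with_tail)
# [1] "John Simon" "Chi-Twi"    "O'Konnor"   "R.F"
print(identical(without_tail, x))
# [1] TRUE
```

Note that this only works because every name in the sample has exactly two parts joined by a single non-letter character; a one-word or three-word name would need a different pattern.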

Some things I noticed wrong in your regex:

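[A-Z|a-z|.|-|']+ actually matches A-Z, |, a-z, | (again), ., |-| (that's a range from | to |, which is just |), and '. Inside a character class, | is a literal character, not alternation, and this class never matches a hyphen. You really wanted [A-Za-z.'-]+, with the hyphen placed last so that it is taken literally.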

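In any case, that class is not what you want here: you don't want to include dots or dashes in the first name, because in this data they act as the separator between the two parts of the name.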
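The character-class behaviour described above can be checked directly with `grepl()`. This is a small sketch using R's default regex engine; the variable names are illustrative only.

```r
# Inside [...], | is an ordinary character, not alternation,
# so this class accepts a literal pipe.
pipe_is_literal <- grepl("^[A-Z|a-z]+$", "a|b")

# In the asker's class, "|-|" is a range from | to |, so the class
# contains no hyphen and "Chi-Twi" is rejected.
asker_class_ok <- grepl("^[A-Z|a-z|.|-|']+$", "Chi-Twi")

# Placing - last in the class makes it a literal hyphen.
fixed_class_ok <- grepl("^[A-Za-z.'-]+$", "Chi-Twi")

print(c(pipe_is_literal, asker_class_ok, fixed_class_ok))
# [1]  TRUE FALSE  TRUE
```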

Upvotes: 1
