Brian McNamara

Reputation: 23

R - Remove everything after the second space in a data frame column

I have a column in a data frame where each record is a list of names.

e.g. John Smith, Jane Smith, Joe Smith, Judy Smith, etc...

For every row, I want to keep only the first name in the list, i.e. delete everything from the first comma onwards, so that the column holds just one name per row.

e.g. John Smith

I've tried playing around with sub, gsub, regex, but I am lost. I just started using R about two days ago and was doing fine until I hit this roadblock.

Any help appreciated.

Upvotes: 1

Views: 2961

Answers (3)

Phillip Perin

Reputation: 61

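The same approach using stringr (with dplyr for mutate and the pipe):

library(dplyr)    # mutate() and %>%
library(stringr)  # str_replace_all()

pattern <- data.frame("colid" = c(1, 2),
                      "text" = c("john smith, jane smith", "jon stewart, steven colbert"))
# Drop the first comma and everything after it
pattern %>%
  mutate(text2 = str_replace_all(text, ",.*", ""))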

Upvotes: 0

R. Schifini

Reputation: 9313

If your data frame is like this:

df = data.frame(names = c("John Smith, Jane Smith, Joe Smith, Judy Smith",
                          "Jane Smith, Joe Smith, Judy Smith",
                          "Joe Smith, Judy Smith",
                          "Judy Smith"))

> df
                                          names
1 John Smith, Jane Smith, Joe Smith, Judy Smith
2             Jane Smith, Joe Smith, Judy Smith
3                         Joe Smith, Judy Smith
4                                    Judy Smith

Then do:

df$first = sub(",.*","",df$names)

Result:

> df
                                          names      first
1 John Smith, Jane Smith, Joe Smith, Judy Smith John Smith
2             Jane Smith, Joe Smith, Judy Smith Jane Smith
3                         Joe Smith, Judy Smith  Joe Smith
4                                    Judy Smith Judy Smith
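
To see why this works, here is a minimal sketch on a single string (the variable names x, matched and kept are only for illustration). The pattern ",.*" matches the first comma and everything after it, and sub() replaces that match with an empty string:

```r
x <- "John Smith, Jane Smith, Joe Smith"

# The part of the string that the pattern ",.*" matches:
# the first comma and everything up to the end of the string.
matched <- regmatches(x, regexpr(",.*", x))
print(matched)

# sub() replaces that match with "", so only the first name is left.
kept <- sub(",.*", "", x)
print(kept)
```

Because the match already runs to the end of the string, there is only one match per value, so sub() and gsub() should give the same result here.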

Upvotes: 0

neilfws

Reputation: 33822

Assuming your names are in a column called Name in a data frame called mydata, try this first. It says "replace the first comma and everything after it, up to the end of the string, with an empty string".

sub(",.+", "", mydata$Name)

If it looks like that worked, assign the result to the column:

mydata$Name <- sub(",.+", "", mydata$Name)
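
A minimal sketch of that preview-then-assign workflow, assuming a small made-up mydata (the values are illustrative and not from the question). It also shows one edge case: ",.+" needs at least one character after the comma, so a value that ends in a bare comma is left unchanged, whereas ",.*" strips it as well.

```r
# Hypothetical example data; the column name Name follows the answer above.
mydata <- data.frame(
  Name = c("John Smith, Jane Smith, Joe Smith", "Judy Smith", "Jon Stewart,"),
  stringsAsFactors = FALSE
)

# Preview first: mydata itself is not modified yet.
preview <- sub(",.+", "", mydata$Name)
print(data.frame(before = mydata$Name, after = preview))

# ",.+" requires at least one character after the comma, so the
# trailing comma in row 3 survives; ",.*" removes it too.
preview_star <- sub(",.*", "", mydata$Name)

# Once the preview looks right, assign it back to the column.
mydata$Name <- preview_star
print(mydata)
```

If your data never has a trailing comma, the two patterns should behave the same.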

Upvotes: 2
