Reputation: 23
I have a column in a data frame where each record is a list of names.
e.g. John Smith, Jane Smith, Joe Smith, Judy Smith, etc...
I want to keep only the first name in each record, i.e. delete everything from the first comma onwards, so that each row of the column holds just one name.
e.g. John Smith
I've tried playing around with sub, gsub, regex, but I am lost. I just started using R about two days ago and was doing fine until I hit this roadblock.
Any help appreciated.
Upvotes: 1
Views: 2961
Reputation: 61
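A stringr answer, although the approach is the same:
library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_replace_all()
pattern <- data.frame("colid" = c(1, 2),
                      "text" = c("john smith, jane smith", "jon stewart, steven colbert"))
# drop everything from the first comma onwards
pattern %>%
  mutate(text2 = str_replace_all(text, ",.*", ""))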
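Because the pattern only deletes text, stringr's str_remove() may read a little more directly; it is shorthand for str_replace() with an empty replacement. A minimal sketch, assuming the stringr package is installed; the vector name people is made up for illustration:

```r
library(stringr)

# Hypothetical example data, not taken from the question
people <- c("john smith, jane smith", "jon stewart, steven colbert", "judy smith")

# ",.*" is greedy: it matches from the first comma to the end of the string,
# so removing the first (and only) match is enough
first_person <- str_remove(people, ",.*")
print(first_person)
```

Inside the pipeline above this would be mutate(text2 = str_remove(text, ",.*")).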
Upvotes: 0
Reputation: 9313
If your data frame is like this:
df <- data.frame(names = c("John Smith, Jane Smith, Joe Smith, Judy Smith",
                           "Jane Smith, Joe Smith, Judy Smith",
                           "Joe Smith, Judy Smith",
                           "Judy Smith"))
> df
                                          names
1 John Smith, Jane Smith, Joe Smith, Judy Smith
2             Jane Smith, Joe Smith, Judy Smith
3                         Joe Smith, Judy Smith
4                                    Judy Smith
Then do:
df$first = sub(",.*","",df$names)
Result:
> df
                                          names      first
1 John Smith, Jane Smith, Joe Smith, Judy Smith John Smith
2             Jane Smith, Joe Smith, Judy Smith Jane Smith
3                         Joe Smith, Judy Smith  Joe Smith
4                                    Judy Smith Judy Smith
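sub() is enough here (no need for gsub()) because ",.*" already runs from the first comma to the end of the string, so there is only ever one match per element. Two details that may be worth checking on real data are stray spaces before the comma and missing values. A small sketch in base R, assuming a character vector called raw_names (a made-up name, not from the question):

```r
# Hypothetical input with a space before a comma, a single name, and an NA
raw_names <- c("John Smith, Jane Smith", "Joe Smith , Judy Smith", "Judy Smith", NA)

# Remove everything from the first comma onwards, then trim leftover whitespace
first_name <- trimws(sub(",.*", "", raw_names))
print(first_name)
```

Elements without a comma are returned unchanged, and NA stays NA.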
Upvotes: 0
Reputation: 33822
Assuming your names are in a column called Name in data frame mydata, try this first. It says "replace a comma followed by anything to the end of the line with an empty string".
sub(",.+", "", mydata$Name)
If it looks like that worked, assign the result to the column:
mydata$Name <- sub(",.+", "", mydata$Name)
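One thing to be aware of: ".+" needs at least one character after the comma, whereas ".*" also matches when nothing follows. The difference only shows up if a record ends in a trailing comma. A quick sketch, using a made-up vector x rather than the real column:

```r
# Hypothetical data: the second element ends with a trailing comma
x <- c("John Smith, Jane Smith", "Judy Smith,")

plus_result <- sub(",.+", "", x)  # trailing comma is left in place
star_result <- sub(",.*", "", x)  # trailing comma is removed too

print(plus_result)
print(star_result)
```

If your data can contain trailing commas, ",.*" is probably the safer pattern.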
Upvotes: 2