user2855907

Reputation: 311

Get the strings before the comma with R

I am a beginner with R. I have a column in a data.frame like this:

city
Kirkland,
Bethesda,
Wellington,
La Jolla,
Berkeley,
Costa, Evie KW172NJ
Miami,
Plano,
Sacramento,
Middletown,
Webster,
Houston,
Denver,
Kirkland,
Pinecrest,
Tarzana,
Boulder,
Westfield,
Fair Haven,
Royal Palm Beach, Fl
Westport,
Encino,
Oak Ridge,

I want to clean it. What I want is all the city names before the comma. How can I get the result in R? Thanks!

Upvotes: 18

Views: 41527

Answers (5)

Jeereddy

Reputation: 1080

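If this is a column in a data frame, you can use separate() from tidyr (part of the tidyverse).

library(dplyr)  # provides the %>% pipe
library(tidyr)  # separate() lives in tidyr, not dplyr
x <- c("London, UK", "Paris, France", "New York, USA")
x <- as.data.frame(x)
# split on the comma plus any following whitespace, so B has no leading space
x %>% separate(x, c("A","B"), sep = ",\\s*")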
        A       B
1   London      UK
2    Paris  France
3 New York     USA

Upvotes: 4

Tyler Rinker

Reputation: 109844

This works as well:

x <- c("London, UK", "Paris, France", "New York, USA")

library(qdap)
beg2char(x, ",")

## > beg2char(x, ",")
## [1] "London"   "Paris"    "New York"

Upvotes: 2

juba

Reputation: 49033

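You can use gsub with a regular expression:

# capture everything up to the first comma (lazy match) and keep only that
cities <- gsub("^(.*?),.*", "\\1", df$city)

This one works, too:

# delete the first comma and everything after it
cities <- gsub(",.*$", "", df$city)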
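To see how the two patterns behave on values like the ones in the question, here is a minimal sketch. The data frame df below is a made-up sample rather than the asker's real data, and the names cities_a, cities_b and no_comma are only for illustration.

```r
# Hypothetical sample modelled on the question's column
df <- data.frame(
  city = c("Kirkland,", "La Jolla,", "Costa, Evie KW172NJ", "Royal Palm Beach, Fl"),
  stringsAsFactors = FALSE
)

# Lazy capture: keep whatever precedes the first comma
cities_a <- gsub("^(.*?),.*", "\\1", df$city)

# Deletion: drop the first comma and everything after it
cities_b <- gsub(",.*$", "", df$city)

# A value without a comma is returned unchanged
no_comma <- gsub(",.*$", "", "Denver")

print(cities_a)
print(identical(cities_a, cities_b))
```

Since each string has at most one match for these patterns, sub would probably do the same job as gsub here.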

Upvotes: 26

Jilber Urbina

Reputation: 61154

Just for fun, you can use strsplit

> x <- c("London, UK", "Paris, France", "New York, USA")
> sapply(strsplit(x, ","), "[", 1)
[1] "London"   "Paris"    "New York"
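The same idea can be sketched on values closer to the question's data, including a trailing comma and an entry with no comma at all. The sample vector is made up, and swapping sapply for vapply plus trimws is my own variation, not part of the answer above.

```r
# Hypothetical sample: trailing comma, text after the comma, no comma
x <- c("Kirkland,", "Costa, Evie KW172NJ", "Denver", "Royal Palm Beach, Fl")

# Split each string on literal commas (fixed = TRUE skips regex matching)
parts <- strsplit(x, ",", fixed = TRUE)

# Take the first piece of each split; vapply checks that each result
# is a single character value
first <- vapply(parts, "[", character(1), 1)

# Remove any stray whitespace around the city name
first <- trimws(first)

print(first)
```

One thing to watch for: an empty string splits into zero pieces, so its first element comes back as NA.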

Upvotes: 7

James

Reputation: 66834

You could use regexpr to find the position of the first comma in each element and then use substr to cut each string just before that position:

x <- c("London, UK", "Paris, France", "New York, USA")

substr(x,1,regexpr(",",x)-1)
[1] "London"   "Paris"    "New York"
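One caveat with this approach: regexpr returns -1 when there is no comma, so substr then yields an empty string for that element. A minimal sketch of a guard follows, assuming you want comma-free values kept as they are. The sample vector and the names pos, naive, cut_at and safe are made up for illustration.

```r
# Hypothetical sample: the last element has no comma
x <- c("London, UK", "Kirkland,", "Denver")

# Position of the first comma, or -1 where there is none
# (as.integer drops the extra attributes regexpr attaches)
pos <- as.integer(regexpr(",", x, fixed = TRUE))

# Without a guard, "Denver" collapses to "" because the stop position is -2
naive <- substr(x, 1, pos - 1)

# With a guard, strings without a comma are kept whole
cut_at <- ifelse(pos == -1, nchar(x), pos - 1)
safe <- substr(x, 1, cut_at)

print(naive)
print(safe)
```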

Upvotes: 4
