Reputation: 1
I have a data frame in which one of the columns is category. The data in the category column looks like this:
Application Platforms|Real Time|Social Network Media
Apps|Games|Mobile
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment|Games|Software
Networking|Real Estate|Web Hosting
Each category value is a list of sub-sectors separated by a pipe (vertical bar, |). I want to extract the primary sector, which is the string before the first vertical bar ("|").
That is, the output should be:
Application Platforms
Apps
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment
Networking
How can I do this with a function? I have tried the stringr package functions.
Upvotes: 0
Views: 102
Reputation: 522762
We can use sub here:
df$category <- sub("^([^|]+).*", "\\1", df$category)
Here is another variation which doesn't use a capture group:
df$category <- sub("\\|.*", "", df$category)
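As a quick check, here is a minimal sketch that runs both sub variants on a few of the values from the question and confirms they agree, including on a value with no pipe at all. The data frame name df matches the answer; primary_capture and primary_delete are illustrative names only.

```r
# A few sample values copied from the question.
df <- data.frame(
  category = c("Application Platforms|Real Time|Social Network Media",
               "Apps|Games|Mobile",
               "Curated Web",
               "Entertainment|Games|Software"),
  stringsAsFactors = FALSE
)

# Variant 1: capture everything before the first pipe and keep only that.
primary_capture <- sub("^([^|]+).*", "\\1", df$category)

# Variant 2: delete the first pipe and everything after it.
primary_delete <- sub("\\|.*", "", df$category)

print(primary_capture)
# [1] "Application Platforms" "Apps" "Curated Web" "Entertainment"

# Both variants give the same result on this data.
stopifnot(identical(primary_capture, primary_delete))
```

Values without a pipe, such as "Curated Web", pass through unchanged in both variants.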
Upvotes: 2
Reputation:
Using strsplit:
category1 <- strsplit(df$category, "|", fixed = TRUE)
df$category <- sapply(category1, `[[`, 1) # or, purrr::map_chr(category1, 1)
This solution makes your intention a bit clearer than using sub, I think. Then again, it requires an extra line.
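Two edge cases may be worth guarding against, although the question does not say whether they occur in the real data: strsplit only accepts character vectors, so a factor column needs converting first, and an empty string splits into a zero-length vector, which makes `[[` fail. A possible sketch, using made-up input values:

```r
# Hypothetical input: a factor column with an empty string and a missing value.
category <- factor(c("Apps|Games|Mobile", "Software", "", NA))

# strsplit() needs a character vector, so convert the factor first.
parts <- strsplit(as.character(category), "|", fixed = TRUE)

# `[` returns NA for a zero-length element, where `[[` would raise an error.
# vapply() also checks that every result is a single string.
primary <- vapply(parts, `[`, character(1), 1)

print(primary)
# [1] "Apps"     "Software" NA         NA
```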
Upvotes: 2
Reputation: 18435
Or using stringr...
str_match("Application Platforms|Real Time|Social Network Media",
"^(.+?)(?:\\||$)")[,2] #match start of string up to first | or end of string
[1] "Application Platforms"
or...
str_replace("Application Platforms|Real Time|Social Network Media",
"\\|.+$","") #replace | and any subsequent characters with ""
[1] "Application Platforms"
or...
str_extract("Application Platforms|Real Time|Social Network Media",
"[^|]+") #extract first sequence of characters that are not a |
[1] "Application Platforms"
or...
str_split_fixed("Application Platforms|Real Time|Social Network Media",
"\\|",2)[,1] #split at first | and take the first section
[1] "Application Platforms"
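The stringr functions above are vectorised, so they can be applied to a whole column rather than one string at a time. A small sketch, assuming the stringr package is installed; the column name primary_sector is made up for illustration:

```r
library(stringr)

# A few sample values copied from the question.
df <- data.frame(
  category = c("Apps|Games|Mobile",
               "Software",
               "Networking|Real Estate|Web Hosting"),
  stringsAsFactors = FALSE
)

# Extract the first run of non-pipe characters from every row at once.
df$primary_sector <- str_extract(df$category, "[^|]+")

print(df$primary_sector)
# [1] "Apps"       "Software"   "Networking"
```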
Upvotes: 1