Suraj Kumar Talreja

Reputation: 1

Extract the first part of a pipe-separated string in R

I have a data frame in which one of the columns is category. The data in the category column is shown below:

Application Platforms|Real Time|Social Network Media
Apps|Games|Mobile
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment|Games|Software
Networking|Real Estate|Web Hosting

Each category value is a list of sub-sectors separated by a pipe (vertical bar, |). I want to extract the primary sector, which is the first string before the vertical bar ("|").

That means the output should be:

Application Platforms
Apps
Curated Web
Software
Games
Biotechnology
Analytics
Mobile
E-Commerce
Entertainment
Networking

How can I do this with a function? I have tried using the stringr package functions.

Upvotes: 0

Views: 102

Answers (3)

Tim Biegeleisen

Reputation: 522762

We can use sub here:

df$category <- sub("^([^|]+).*", "\\1", df$category)

Here is another variation which doesn't use a capture group:

df$category <- sub("\\|.*", "", df$category)
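
As a quick check, here is a minimal sketch that applies both patterns to a few of the values from the question. The data frame df is rebuilt here purely for illustration (the real one may have other columns), and primary_a and primary_b are hypothetical names.

```r
# Hypothetical reconstruction of the question's data (illustration only)
df <- data.frame(
  category = c("Application Platforms|Real Time|Social Network Media",
               "Apps|Games|Mobile",
               "Curated Web",
               "E-Commerce"),
  stringsAsFactors = FALSE
)

# Capture-group version: keep the leading run of non-pipe characters
primary_a <- sub("^([^|]+).*", "\\1", df$category)

# Deletion version: drop the first pipe and everything after it
primary_b <- sub("\\|.*", "", df$category)

print(primary_a)
print(identical(primary_a, primary_b))
```

Values without a pipe, such as "Curated Web", pass through both versions unchanged.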


Upvotes: 2

user3603486

Reputation:

Using strsplit:

category1 <- strsplit(df$category, "|", fixed = TRUE)
df$category <- sapply(category1, `[[`, 1)     # or, purrr::map_chr(category1, 1)

This solution makes your intention a bit clearer than using sub, I think. Then again, it requires an extra line.
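
One caveat: strsplit only accepts character vectors, so it fails if the column is stored as a factor. The sketch below assumes that case and converts first. The sample data and the primary column name are made up for illustration, and vapply is swapped in for sapply only to make the expected result type explicit.

```r
# Hypothetical data; the column is deliberately stored as a factor
df <- data.frame(
  category = factor(c("Apps|Games|Mobile", "Software",
                      "Networking|Real Estate|Web Hosting"))
)

# strsplit() only accepts character vectors, so convert first
parts <- strsplit(as.character(df$category), "|", fixed = TRUE)

# vapply() checks that each result is a single string;
# `[` (rather than `[[`) yields NA instead of an error for an empty split
df$primary <- vapply(parts, `[`, character(1), 1)

print(df$primary)
```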

Upvotes: 2

Andrew Gustar

Reputation: 18435

Or using stringr...

str_match("Application Platforms|Real Time|Social Network Media",
       "^(.+?)(?:\\||$)")[,2] #match start of string up to first | or end of string

[1] "Application Platforms"

or...

str_replace("Application Platforms|Real Time|Social Network Media",
       "\\|.+$","") #replace | and any subsequent characters with ""

[1] "Application Platforms"

or...

str_extract("Application Platforms|Real Time|Social Network Media",
       "[^|]+") #extract first sequence of characters that are not a |

[1] "Application Platforms"

or...

str_split_fixed("Application Platforms|Real Time|Social Network Media",
       "\\|",2)[,1] #split at first | and take the first section

[1] "Application Platforms"
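
All of these functions are vectorised, so they can be applied to a whole column at once. Here is a minimal sketch using str_extract, assuming the stringr package is installed. The category vector is a hypothetical stand-in for df$category.

```r
library(stringr)  # assumes the stringr package is installed

# Hypothetical vector standing in for df$category
category <- c("Entertainment|Games|Software", "Biotechnology", "Mobile")

# str_extract() returns the first match for each element of the vector
primary <- str_extract(category, "[^|]+")

print(primary)
```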

Upvotes: 1
