user122514

Reputation: 407

Separating a string in R into words

I am working on a movie dataset that has genres in the following format: "Animation|Sci-Fi", "Adventure|Animation|Children|Fantasy", etc.

I want to separate them into individual words, like "Animation" and "Sci-Fi".

I've tried using str_split in the stringr package, but it's not giving me what I want. I'm sure I'm using the wrong code. Could somebody give me some advice on how to proceed? Thanks.

Edit: I believe I'm supposed to give str_split a regular expression pattern, so I tried str_extract(test_df$genres[1:20], "\\w+|\\w+") for a test run, but I was not able to get what I need.

Upvotes: 1

Views: 53

Answers (1)

Maurits Evers

Reputation: 50738

s <- "Animation|Sci-Fi|Adventure|Animation|Children|Fantasy"

# "|" means alternation in a regular expression, so escape it as "\\|"
# to split on the literal pipe character.

# In base R
unlist(strsplit(s, "\\|"))
#[1] "Animation" "Sci-Fi"    "Adventure" "Animation" "Children"  "Fantasy"

# Using stringr
library(stringr)
unlist(str_split(s, "\\|"))
#[1] "Animation" "Sci-Fi"    "Adventure" "Animation" "Children"  "Fantasy"
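
To apply the same split to a whole column rather than a single string, something like the following might work. This is only a sketch: the data frame below is a hypothetical stand-in for the test_df from the question, and it swaps the escaped regex for base R's fixed = TRUE argument, which treats "|" as a literal character.

```r
# Hypothetical stand-in for the asker's test_df; only the genres column matters here.
test_df <- data.frame(
  genres = c("Animation|Sci-Fi", "Adventure|Animation|Children|Fantasy"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches "|" literally, so no regex escaping is needed.
# strsplit() is vectorised and returns a list with one character vector per row.
genre_list <- strsplit(test_df$genres, "|", fixed = TRUE)

genre_list[[1]]
#[1] "Animation" "Sci-Fi"

# Flatten the list and drop repeats to get every distinct genre in the column.
all_genres <- unique(unlist(genre_list))
all_genres
#[1] "Animation" "Sci-Fi"    "Adventure" "Children"  "Fantasy"
```

Keeping the result as a list preserves which genres belong to which movie; unlist() is only appropriate when that row association is no longer needed.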

Upvotes: 3
