Reputation: 407
I am working on a movie dataset that has genres in the following format: "Animation|Sci-Fi", "Adventure|Animation|Children|Fantasy", etc.
I want to separate them into individual words, like "Animation" and "Sci-Fi".
I've tried using str_split in the stringr package, but it's not giving me what I want. I'm sure I'm using the wrong code. Could somebody give me some advice on how to proceed? Thanks.
Edit: I believe I'm supposed to give str_split a regular expression pattern, so I tried str_extract(test_df$genres[1:20], "\\w+|\\w+") for a test run, but I was not able to get what I need.
Upvotes: 1
Views: 53
Reputation: 50738
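s <- "Animation|Sci-Fi|Adventure|Animation|Children|Fantasy"
# "|" is a regex metacharacter (alternation), so escape it as "\\|" to split on the literal character
# In base R; strsplit() returns a list, so unlist() flattens it to a character vector
unlist(strsplit(s, "\\|"))
#[1] "Animation" "Sci-Fi" "Adventure" "Animation" "Children" "Fantasy"
# Using stringr
library(stringr)
unlist(str_split(s, "\\|"))
#[1] "Animation" "Sci-Fi" "Adventure" "Animation" "Children" "Fantasy"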
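The attempt in the question most likely failed because an unescaped | means "or" in a regular expression, so "\\w+|\\w+" matches a run of word characters or a run of word characters, and str_extract returns only the first match per string. For a whole column rather than a single string, a minimal sketch might look like the following. The test_df here is a hypothetical stand-in built from the two example strings in the question, since the real data frame is not shown, and fixed = TRUE is used as an alternative to escaping the pipe.

```r
# Hypothetical stand-in for the asker's data; the real test_df is not shown.
test_df <- data.frame(
  genres = c("Animation|Sci-Fi", "Adventure|Animation|Children|Fantasy"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a literal character, so no escaping is needed.
# The result is a list with one character vector of genres per row.
genre_list <- strsplit(test_df$genres, "|", fixed = TRUE)

# Flatten and de-duplicate to get the distinct genres across all rows.
all_genres <- unique(unlist(genre_list))
```

Keeping the result as a list preserves which genres belong to which movie; unlist() is only needed when a single flat vector is wanted. With stringr, str_split(test_df$genres, fixed("|")) should behave the same way.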
Upvotes: 3