Reputation: 575
Every row of my data frame, df, contains text like this example:
[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]
Is there any way to extract the words (Soccer, Basketball, Boxing) from this text? Sorry, I am new to text analysis in R.
Upvotes: 2
Views: 101
Reputation: 50678
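It looks like you have a JSON-like input string. Strictly speaking it is not valid JSON, because JSON requires double quotes, so first replace the single quotes with double quotes. Then parse the string with jsonlite::fromJSON and extract the relevant column, name: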
# Sample string
ss <- "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]"
# Parse JSON (single quotes are swapped for double quotes first);
# the result is stored in `parsed` so your own df is not overwritten
library(jsonlite)
parsed <- fromJSON(txt = gsub("'", "\"", ss))
# Extract words
parsed$name
#[1] "Soccer" "Basketball" "Boxing"
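Since every row of df holds such a string, the same parse can be applied row by row. This is a minimal sketch, assuming a hypothetical data frame whose text column is called sports (the column name and the row contents are made up for illustration). It also assumes that no name contains an apostrophe, because the quote swap would then produce invalid JSON:

```r
library(jsonlite)

# Hypothetical data frame; the column name `sports` is an assumption
df <- data.frame(
  sports = c(
    "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}]",
    "[{'id': 10751, 'name': 'Boxing'}]"
  ),
  stringsAsFactors = FALSE
)

# Parse each row on its own: swap single quotes for double quotes so the
# string is valid JSON, then keep only the `name` column of the result
names_per_row <- lapply(df$sports, function(s) fromJSON(gsub("'", "\"", s))$name)

# Collapse each row's names into one string so they fit in an ordinary column
df$sport_names <- vapply(names_per_row, paste, character(1), collapse = ", ")
df$sport_names
```

Collapsing to one string per row keeps df a plain data frame. If you need the individual words, use the list names_per_row instead.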
Upvotes: 3
Reputation: 76450
Alternatively, in base R you can extract every alphabetic word with a regular expression and then drop the keys id and name:
x <- "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]"
g <- gregexpr("[[:alpha:]]+", x)
y <- unlist(regmatches(x, g))
y[y != "id" & y != "name"]
#[1] "Soccer" "Basketball" "Boxing"
Another possibility for this last instruction would be to use %in%:
y[!y %in% c("id", "name")]
#[1] "Soccer" "Basketball" "Boxing"
This way you can keep a vector of unwanted strings, such as c("id", "name"), and avoid a long chain of & conditions.
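The [[:alpha:]]+ pattern splits multi-word names into separate words, so a hypothetical 'Table Tennis' would come back as "Table" and "Tennis". The following sketch anchors on the name key instead. It assumes every value is written exactly as 'name': '...' (same spacing) and contains no apostrophe. Because gregexpr() is vectorized, it works on a whole column at once; here a plain character vector with made-up rows stands in for a data frame column:

```r
# Hypothetical column: the second row contains a two-word name
sports <- c(
  "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}]",
  "[{'id': 10751, 'name': 'Boxing'}, {'id': 1, 'name': 'Table Tennis'}]"
)

# "(?<='name': ')" is a lookbehind: it requires the literal text 'name': '
# right before the match without including it in the result; [^']+ then
# takes everything up to the closing quote. perl = TRUE enables lookbehind.
m <- gregexpr("(?<='name': ')[^']+", sports, perl = TRUE)

# regmatches() returns one character vector per element of `sports`
names_per_row <- regmatches(sports, m)
names_per_row
```

Since only the text after 'name': ' is captured, there is no need to filter out id and name afterwards.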
Upvotes: 0