Reputation: 575

Extracting words from a string in R

I have the following text as an example in every row of my data frame, df:

[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]

Is there any way to extract words (Soccer, Basketball, Boxing) from this text? Sorry I am new to text analysis in R.

Upvotes: 2

Answers (2)

Maurits Evers

Reputation: 50678

It looks like you have a JSON-like input string. Strict JSON requires double quotes, so replace the single quotes first, then parse the string with jsonlite::fromJSON and extract the name column:

# Sample string
ss <- "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]"

# Parse JSON (single quotes are swapped for double quotes so the string is valid JSON)
library(jsonlite)
df <- fromJSON(txt = gsub("'", "\"", ss))

# Extract words
df$name
#[1] "Soccer"     "Basketball" "Boxing"
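Since the question says every row of the data frame holds a string like this, the same idea can be applied row by row. The following is only a sketch: the column name `sports` and the output column `names` are assumptions made for illustration, and the quote replacement would break if a name itself contained an apostrophe.

```r
library(jsonlite)

# Hypothetical data frame; the column name `sports` is an assumption
df <- data.frame(
  sports = c(
    "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}]",
    "[{'id': 10751, 'name': 'Boxing'}]"
  ),
  stringsAsFactors = FALSE
)

# Parse each row and collapse its `name` values into one comma-separated string
df$names <- vapply(
  df$sports,
  function(s) paste(fromJSON(gsub("'", "\"", s))$name, collapse = ", "),
  character(1),
  USE.NAMES = FALSE
)

df$names
#[1] "Soccer, Basketball" "Boxing"
```

If you would rather keep the names as separate elements, replacing vapply with lapply (and dropping the paste call) should give a list with one character vector per row.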

Upvotes: 3

Rui Barradas

Reputation: 76450

Maybe something like the following: extract every run of letters with a regular expression, then drop the key names.

x <- "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]"
g <- gregexpr("[[:alpha:]]+", x)
y <- unlist(regmatches(x, g))
y[y != "id" & y != "name"]
#[1] "Soccer"     "Basketball" "Boxing"

The last line can also be written with %in%.

y[!y %in% c("id", "name")]
#[1] "Soccer"     "Basketball" "Boxing"

This way you keep the unwanted strings in a single vector, such as c("id", "name"), and avoid a long chain of & conditions.
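One limitation of matching runs of letters is that a multi-word name would be split into separate words, and a value that happened to be "id" or "name" would be dropped. A possible variation, sketched here under the assumption that every value is preceded by exactly `'name': '` (same quoting and spacing as in the sample), is to match only the text that follows that key using a Perl-style lookbehind:

```r
x <- "[{'id': 16, 'name': 'Soccer'}, {'id': 35, 'name': 'Basketball'}, {'id': 10751, 'name': 'Boxing'}]"

# Match everything up to the next single quote, but only right after 'name': '
g <- gregexpr("(?<='name': ')[^']+", x, perl = TRUE)
y <- regmatches(x, g)[[1]]
y
#[1] "Soccer"     "Basketball" "Boxing"
```

Because the pattern depends on the exact spacing and quoting, it is more fragile than parsing the string as JSON if the input format varies.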

Upvotes: 0
