user36729

Reputation: 575

Extracting words from multiple strings in R

One column of my data frame contains strings like the following example:

 A = [{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}, {'name': 'Shopping', 'id': 10201}]

I tried the following code to extract the words ('Bank', 'Cinema', 'Shopping') from this string, but it returns 'character(0)':

 stringr::str_extract_all(A, "\\w+(?='\\})")

How can I do this?

Upvotes: 1

Views: 75

Answers (2)

Gopala

Reputation: 10483

Since that is essentially JSON (only the single quotes need to become double quotes), you can do something like this:

library(jsonlite)

A <- "[{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}, {'name': 'Shopping', 'id': 10201}]"
A <- gsub("'", '"', A) # fromJSON expects double quotes.

l <- fromJSON(A) # Parses the array of objects into a data frame.
l$name           # The 'name' column holds the words.

EDIT: Assuming you have a column with multiple JSON arrays like A and not just one JSON array as you showed above in your question, you will need to do something like this:

df <- data.frame(A = rep("[{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}, {'name': 'Shopping', 'id': 10201}]", 5), stringsAsFactors = FALSE)

df$A <- gsub("'", '"', df$A)
lapply(df$A, function(x) {j <- fromJSON(x); j$name})

I just repeated the same JSON array string you provided five times to create a 5-row data frame. lapply then parses each row's string and returns its names, so the result is a list with one character vector per row.
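
If the extracted names should live next to the original column, one option is to store them as a list column. This is only a minimal sketch: the two-row data frame is made up for illustration, the column names `names` and `names_chr` are hypothetical, and it assumes no apostrophes occur inside the values themselves (the quote replacement would break those).

```r
library(jsonlite)

# Hypothetical 2-row data frame; the rows are made up for illustration.
df <- data.frame(
  A = c("[{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}]",
        "[{'name': 'Shopping', 'id': 10201}]"),
  stringsAsFactors = FALSE
)

# fromJSON expects double quotes; assumes no apostrophes inside the values.
json <- gsub("'", '"', df$A, fixed = TRUE)

# One character vector of names per row, kept as a list column.
df$names <- lapply(json, function(x) fromJSON(x)$name)

# Optionally collapse each row's names into a single string.
df$names_chr <- vapply(df$names, paste, character(1), collapse = ", ")
```

The list column keeps every name separately accessible, while the collapsed string is easier to print or export.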

Upvotes: 2

Jan

Reputation: 43199

Hackish (use a JSON approach!):

A <- c("[{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}, {'name': 'Shopping', 'id': 10201}]")

pattern <- "'name':\\s*['\"]\\K\\w+"
m <- gregexpr(pattern, A, perl = TRUE) # \\K requires the PCRE engine
(words <- unlist(regmatches(A, m)))

This will yield:

[1] "Bank"     "Cinema"   "Shopping"
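
For reference, the stringr attempt from the question returns nothing because its look-ahead `(?='\\})` only accepts a word that is immediately followed by `'}`, and in this string the names are followed by `', 'id'` instead. A hedged sketch of a stringr variant that uses a look-behind, assuming exactly one space after the colon and single-word names:

```r
library(stringr)

A <- "[{'name': 'Bank', 'id': 559}, {'name': 'Cinema', 'id': 2550}, {'name': 'Shopping', 'id': 10201}]"

# Look-behind: match word characters only when preceded by 'name': '
# (assumes exactly one space after the colon and single-word names).
words <- str_extract_all(A, "(?<='name': ')\\w+")[[1]]
words
```

Like the base R pattern above, this is still regex on JSON-like text, so the parsing approach remains the more robust choice.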

Upvotes: 0
