Reputation: 21
I'm very new with coding, and I have to clean a table with string variables. One of the columns I'm trying to clean includes several variables in itself. So if I take one row from my column it looks like this
string<- ("'casual': True,'classy': False,'divey': False,'hipster': False,'intimate': False,'romantic': False,'touristy': False,'trendy': False,'upscale': False")
I'm trying to extract Boolean values for each of the categories into separate columns.So my outcome should have 9 columns(each for every category) and rows should include True/ False values.
What am I supposed to use in this case?
Upvotes: 2
Views: 42
Reputation: 887048
An option is to use str_extract_all
to extract the word (\\w+
) that succeeds a a space followed by a :
library(stringr)
as.logical(str_extract_all(string, "(?<=: )\\w+")[[1]])
#[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
If we need to parse into a data.frame, it would be better to use fromJSON
from jsonlite
library(jsonlite)
lst1 <- fromJSON(paste0("{", gsub("'", "", gsub("\\b(\\w+)\\b",
'"\\1"', string)), "}"))
data.frame(lapply(lst1, as.logical))
# casual classy divey hipster intimate romantic touristy trendy upscale
#1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Or in base R
as.logical(regmatches(string, gregexpr("(?<=: )\\w+", string, perl = TRUE))[[1]])
Upvotes: 1