Hani Ihlayyle

Reputation: 135

Extract information from a string using regex in R

I have data like this, and I want to extract some information from x and y:

x= "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
y= "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

The expected result:

device_codename   brand     percent_incoming_nighttime percent_outgoing_daytime
nikel             Xiaomi    0.88                       9.29

I have tried using grep, but I am getting errors. Any suggestions?

grep("device_codename", x, perl=TRUE, value=TRUE)

Upvotes: 0

Views: 51

Answers (3)

akrun

Reputation: 887118

After removing the braces ({}) and double quotes with gsub, strip each key (the substring before the :) so that only the values remain, and read those values into a data.frame with read.csv. Then set the column names from the keys, which regmatches extracts with a lookahead for the :.

v1 <- gsub('"|[{}]', "", c(x, y))
out <- read.csv(text=paste(gsub("\\w+:\\s+", "", v1), collapse=", "),
       header=FALSE, stringsAsFactors = FALSE)
colnames(out) <- unlist(regmatches(v1, gregexpr("\\w+(?=:)", v1, perl = TRUE)))


out
#  device_codename   brand percent_incoming_nighttime percent_outgoing_daytime
#1           nikel  Xiaomi                       0.88                     9.29

NOTE: No external packages used
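
The same base R idea can be wrapped in a small helper. This is only a sketch: `parse_flat` is a hypothetical name, and it assumes flat, one-level input whose keys and values contain no commas, colons, braces, or escaped quotes. For anything more complex, a real JSON parser is the safer choice.

```r
# Hypothetical helper: parse a flat, one-level JSON-like string
# into a one-row data.frame using base R only.
# Assumption: keys and values contain no commas, colons, braces, or escaped quotes.
parse_flat <- function(s) {
  s <- gsub('[{}"]', "", s)                       # drop braces and double quotes
  pairs <- strsplit(strsplit(s, ",")[[1]], ":")   # split into key/value pairs
  keys <- trimws(vapply(pairs, `[`, character(1), 1))
  vals <- trimws(vapply(pairs, `[`, character(1), 2))
  out <- as.data.frame(setNames(as.list(vals), keys), stringsAsFactors = FALSE)
  # convert numeric-looking columns to numbers, leave the rest as character
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

# one row, with the columns of x followed by the columns of y
res <- cbind(parse_flat(x), parse_flat(y))
res
```

Unlike the read.csv version, this trims the whitespace around each value, so brand is stored as "Xiaomi" without a leading space.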


Or using RJSONIO and tidyverse

library(tidyverse)
library(RJSONIO)
list(x, y) %>%
    map(~ fromJSON(.x) %>% 
            as.list %>%
            as_tibble) %>%
       bind_cols
# A tibble: 1 x 4
#  device_codename brand  percent_incoming_nighttime percent_outgoing_daytime
#  <chr>           <chr>                       <dbl>                    <dbl>
#1 nikel           Xiaomi                       0.88                     9.29

data

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

Upvotes: 0

Selcuk Akbas

Reputation: 711

A completed version of Roman Luštrik's jsonlite solution, reshaped into a single row:

library(jsonlite)
library(dplyr)
library(tidyr)  # provides spread()

xx_x= "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
xx_y= "{\"percent_incoming_nighttime\": 0.88, \"percent_outgoing_daytime\": 9.29}"

c(jsonlite::fromJSON(xx_x), jsonlite::fromJSON(xx_y)) %>% 
  reshape2::melt() %>% mutate(myrow = 1) %>% 
  spread(L1, value)

Result:

  myrow  brand device_codename percent_incoming_nighttime percent_outgoing_daytime
1     1 Xiaomi           nikel                       0.88                     9.29

Upvotes: 0

Roman Luštrik

Reputation: 70643

This looks like JSON. There are tools for parsing it, such as the jsonlite package.

library(jsonlite)

x = "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}" 
y = '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

> unlist(fromJSON(x))
device_codename           brand 
        "nikel"        "Xiaomi" 
> unlist(fromJSON(y))
percent_incoming_nighttime   percent_outgoing_daytime 
                      0.88                       9.29
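
To get the single-row table asked for in the question, the two parsed lists can be concatenated and converted to a data.frame. This is a minimal sketch that assumes each string is a flat JSON object with scalar values only; nested objects or arrays would need different handling.

```r
library(jsonlite)

x <- "{\"device_codename\": \"nikel\", \"brand\": \"Xiaomi\"}"
y <- '{"percent_incoming_nighttime": 0.88, "percent_outgoing_daytime": 9.29}'

# fromJSON() returns a named list for each flat JSON object;
# c() concatenates the two lists and as.data.frame() turns them into one row.
res <- as.data.frame(c(fromJSON(x), fromJSON(y)), stringsAsFactors = FALSE)
res
```

The character and numeric types are preserved, because each list element becomes its own column.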

Upvotes: 3
