markwk

Reputation: 135

Importing JSON File Data into R as Data Frame for NLP

I'm trying to import data from a JSON file into R to experiment with natural language processing. The data was parsed and extracted from a blog written in Markdown. The problem is that R imports it as a nested list rather than a table, and I can't figure out how to get it into a data frame. Is the problem with my JSON file or with my import process?

Sample Data:

{
  "2017-11-17-blog-post-01": {
    "title": "Blog Post 01",
    "layout": "post",
    "categories": [
      "Category1",
      "Category2"
    ],
    "comments": true,
    "published": true,
    "permalink": "/blog-post-01.html",
    "basename": "2017-11-17-blog-post-01"
  },
  "2017-11-30-blog-post-02": {
    "title": "Blog Post 2",
    "layout": "post",
    "categories": [
      "Category2",
      "Category3"
    ],
    "comments": true,
    "published": true,
    "permalink": "/2017-11-30-blog-post-02.html",
    "basename": "2017-11-30-blog-post-02"
  }
}

Command:

library(jsonlite)
import <- fromJSON("test-import.json", flatten=TRUE)

Results:

$`2017-11-17-blog-post-01`
$`2017-11-17-blog-post-01`$title
[1] "Blog Post 01"

$`2017-11-17-blog-post-01`$layout
[1] "post"

$`2017-11-17-blog-post-01`$categories
[1] "Category1" "Category2"

$`2017-11-17-blog-post-01`$comments
[1] TRUE

$`2017-11-17-blog-post-01`$published
[1] TRUE

$`2017-11-17-blog-post-01`$permalink
[1] "/blog-post-01.html"

$`2017-11-17-blog-post-01`$basename
[1] "2017-11-17-blog-post-01"


$`2017-11-30-blog-post-02`
$`2017-11-30-blog-post-02`$title
[1] "Blog Post 2"

$`2017-11-30-blog-post-02`$layout
[1] "post"

$`2017-11-30-blog-post-02`$categories
[1] "Category2" "Category3"

$`2017-11-30-blog-post-02`$comments
[1] TRUE

$`2017-11-30-blog-post-02`$published
[1] TRUE

$`2017-11-30-blog-post-02`$permalink
[1] "/2017-11-30-blog-post-02.html"

$`2017-11-30-blog-post-02`$basename
[1] "2017-11-30-blog-post-02"

Upvotes: 0

Views: 215

Answers (1)

hrbrmstr

Reputation: 78832

library(purrr)

Your data:

jsonlite::fromJSON('{
  "2017-11-17-blog-post-01": {
    "title": "Blog Post 01",
    "layout": "post",
    "categories": [
      "Category1",
      "Category2"
    ],
    "comments": true,
    "published": true,
    "permalink": "/blog-post-01.html",
    "basename": "2017-11-17-blog-post-01"
  },
  "2017-11-30-blog-post-02": {
    "title": "Blog Post 2",
    "layout": "post",
    "categories": [
      "Category2",
      "Category3"
    ],
    "comments": true,
    "published": true,
    "permalink": "/2017-11-30-blog-post-02.html",
    "basename": "2017-11-30-blog-post-02"
  }
}', flatten=TRUE) -> jsdat

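As far as I can tell, flatten=TRUE only flattens nested data frames, and fromJSON only builds a data frame on its own from an array of records. Your top level is an object keyed by post name, so you get a named list back, and categories holds two values per post, so it won't fit in a single cell as-is. We can give it a hand by wrapping categories in a list: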

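# Build one row per post; .id="id" stores each list name in an "id" column
map_df(jsdat, ~{
  .x$categories <- list(.x$categories)  # wrap the vector so it fits in one list-column cell
  .x
}, .id="id")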

## # A tibble: 2 x 8
##                        id        title layout categories comments published                     permalink                basename
##                     <chr>        <chr>  <chr>     <list>    <lgl>     <lgl>                         <chr>                   <chr>
## 1 2017-11-17-blog-post-01 Blog Post 01   post  <chr [2]>     TRUE      TRUE            /blog-post-01.html 2017-11-17-blog-post-01
## 2 2017-11-30-blog-post-02  Blog Post 2   post  <chr [2]>     TRUE      TRUE /2017-11-30-blog-post-02.html 2017-11-30-blog-post-02
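
If a list column is awkward for whatever NLP tooling comes next, one alternative is to collapse categories into a single string and bind the rows with base R. This is only a minimal sketch, assuming the same two-post JSON as above; the names json_text, posts, rows and posts_df are made up for illustration, and the ", " separator is an arbitrary choice.

```r
library(jsonlite)

# Same structure as the sample data: an object keyed by post name
json_text <- '{
  "2017-11-17-blog-post-01": {
    "title": "Blog Post 01",
    "layout": "post",
    "categories": ["Category1", "Category2"],
    "comments": true,
    "published": true,
    "permalink": "/blog-post-01.html",
    "basename": "2017-11-17-blog-post-01"
  },
  "2017-11-30-blog-post-02": {
    "title": "Blog Post 2",
    "layout": "post",
    "categories": ["Category2", "Category3"],
    "comments": true,
    "published": true,
    "permalink": "/2017-11-30-blog-post-02.html",
    "basename": "2017-11-30-blog-post-02"
  }
}'

# Parses to a named list with one element per post
posts <- fromJSON(json_text)

# Turn each post into a one-row data frame
rows <- lapply(names(posts), function(id) {
  post <- posts[[id]]
  # Collapse the category vector so every field has length 1
  post$categories <- paste(post$categories, collapse = ", ")
  data.frame(id = id, post, stringsAsFactors = FALSE)
})

# Stack the one-row data frames into a single data frame
posts_df <- do.call(rbind, rows)
str(posts_df)
```

The trade-off is that categories becomes plain text, so it would need strsplit() again if individual categories matter later; the list-column version above keeps them separate.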

Upvotes: 1
