Davide Lorino
Davide Lorino

Reputation: 915

Edit and Filter JSON List of Lists in R

I am trying to display this dataset -> https://mtgjson.com/json/AllSets.json.zip

However I would like to flatten the data so that it isn't nested as a bunch of JSON data within a list, within a list, within a list.

More specifically, I am trying to display the data as a dataframe, ranked in order of $releaseDate (one of the variables).

Here is my attempt so far:

library(jsonlite)
library(tidyjson)
mtgdata <- fromJSON("~/path/to/file.json")

The result of mtgdata presents this list of lists:

summary(mtgdata)
        Length Class  Mode
UST       9     -none- list
UNH      10     -none- list
UGL      11     -none- list
pWOS      8     -none- list
pWOR      8     -none- list
pWCQ      8     -none- list
pSUS      8     -none- list
pSUM     10     -none- list
pREL      8     -none- list
pPRO      8     -none- list
pPRE      8     -none- list
pPOD      7     -none- list
pMPR      8     -none- list
pMGD      8     -none- list
pMEI      8     -none- list
pLPA      8     -none- list
pLGM      8     -none- list
pJGP     10     -none- list
pHHO     11     -none- list
pWPN      8     -none- list
pGTW      8     -none- list
pGRU     10     -none- list
pGPX      8     -none- list
pFNM     10     -none- list
pELP      8     -none- list
pDRC      7     -none- list
pCMP      8     -none- list
pCEL      8     -none- list
pARL      8     -none- list
pALP     10     -none- list
p2HG      8     -none- list
p15A      8     -none- list
PD3       9     -none- list
PD2       9     -none- list
H09       9     -none- list
PTK      12     -none- list
POR      12     -none- list
PO2      13     -none- list
PCA       7     -none- list
PC2      10     -none- list
HOP      10     -none- list
VMA       9     -none- list
MMA      10     -none- list
MM3       8     -none- list
MM2      11     -none- list
MED       9     -none- list
ME4       9     -none- list
ME3       9     -none- list
ME2       9     -none- list
IMA       8     -none- list
EMA       9     -none- list
A25       8     -none- list
MPS_AKH   8     -none- list
MPS       9     -none- list
EXP       9     -none- list
E02       7     -none- list
V17       8     -none- list
V16       7     -none- list
V15       9     -none- list
V14       9     -none- list
V13       9     -none- list
V12      10     -none- list
V11      10     -none- list
V10       9     -none- list
V09      10     -none- list
DRB       9     -none- list
EVG       9     -none- list
DDT       7     -none- list
DDS       7     -none- list
DDR       7     -none- list
DDQ       8     -none- list
DDP      10     -none- list
DDO      10     -none- list
DDN      10     -none- list
DDM      10     -none- list
DDL      10     -none- list
DDK      10     -none- list
DDJ      10     -none- list
DDI      10     -none- list
DDH      10     -none- list
DDG      10     -none- list
DDF      10     -none- list
DDE      10     -none- list
DDD       9     -none- list
DDC       9     -none- list
DD3_JVC   9     -none- list
DD3_GVL   9     -none- list
DD3_EVG   9     -none- list
DD3_DVD   9     -none- list
DD2      11     -none- list
CNS      11     -none- list
CN2       9     -none- list
CMD      11     -none- list
CMA       7     -none- list
CM1      10     -none- list
C17       6     -none- list
C16       8     -none- list
C15      10     -none- list
C14      10     -none- list
C13      10     -none- list
CEI       9     -none- list
CED       9     -none- list
E01       7     -none- list
ARC       9     -none- list
ZEN      12     -none- list
XLN      12     -none- list
WWK      12     -none- list
WTH      13     -none- list
W17       8     -none- list
W16       8     -none- list
VIS      13     -none- list
VAN       8     -none- list
USG      13     -none- list
ULG      13     -none- list
UDS      13     -none- list
TSP      12     -none- list
TSB      12     -none- list
TPR      11     -none- list
TOR      12     -none- list
TMP      13     -none- list
THS      12     -none- list
STH      13     -none- list
SOM      12     -none- list
SOK      12     -none- list
SOI      10     -none- list
SHM      12     -none- list
SCG      12     -none- list
S99      11     -none- list
S00      11     -none- list
RTR      12     -none- list
RQS       6     -none- list
ROE      12     -none- list
RIX      12     -none- list
RAV      12     -none- list
PLS      13     -none- list
PLC      12     -none- list
PCY      13     -none- list
ORI      11     -none- list
ONS      12     -none- list
OGW      10     -none- list
ODY      13     -none- list
NPH      12     -none- list
NMS      14     -none- list
MRD      12     -none- list
MOR      12     -none- list
MMQ      13     -none- list
MIR      13     -none- list
MGB      10     -none- list
MD1       9     -none- list
MBS      12     -none- list
M15      11     -none- list
M14      11     -none- list
M13      11     -none- list
M12      11     -none- list
M11      11     -none- list
M10      11     -none- list
LRW      12     -none- list
LGN      12     -none- list
LEG      12     -none- list
LEB      11     -none- list
LEA      11     -none- list
KTK      12     -none- list
KLD       9     -none- list
JUD      12     -none- list
JOU      12     -none- list
ITP      11     -none- list
ISD      12     -none- list
INV      13     -none- list
ICE      13     -none- list
HOU       9     -none- list
HML      12     -none- list
GTC      12     -none- list
GPT      12     -none- list
FUT      12     -none- list
FRF_UGIN 10     -none- list
FRF      12     -none- list
FEM      11     -none- list
EXO      13     -none- list
EVE      12     -none- list
EMN       9     -none- list
DTK      12     -none- list
DST      12     -none- list
DRK      12     -none- list
DPA       9     -none- list
DKM       9     -none- list
DKA      12     -none- list
DIS      12     -none- list
DGM      12     -none- list
CST      11     -none- list
CSP      12     -none- list
CP3       7     -none- list
CP2       7     -none- list
CP1       7     -none- list
CON      13     -none- list
CHR      11     -none- list
CHK      12     -none- list
BTD      10     -none- list
BRB      10     -none- list
BOK      12     -none- list
BNG      12     -none- list
BFZ      12     -none- list
AVR      12     -none- list
ATQ      11     -none- list
ATH       9     -none- list
ARN      11     -none- list
ARB      12     -none- list
APC      13     -none- list
ALL      13     -none- list
ALA      12     -none- list
AKH       9     -none- list
AER       9     -none- list
9ED      12     -none- list
8ED      12     -none- list
7ED      12     -none- list
6ED      12     -none- list
5ED      12     -none- list
5DN      12     -none- list
4ED      12     -none- list
3ED      12     -none- list
2ED      11     -none- list
10E      11     -none- list

Within each of these lists are variables that I am interested in analysing to filter and sort this data as though it were a flattened dataframe.

When we inspect the list of variables within one of our lists (taking "mtgdata$UST" for example) we get this group of variables:

names(mtgdata$UST)
[1] "name"        "code"        "releaseDate" "border"      "type"        
"booster"     "mkm_name"   
[8] "mkm_id"      "cards"

Running the same query on another list within mtgdata ("mtgdata$SOI") we get another group of variables, though they are mostly the same.

As I have mentioned above, I am primarily interested in flattening this dataset and ranking by mtgdata$releaseDate - but as it stands, the "$releaseDate" is currently nested within the first group of lists ("$UST" etc.)

Your help with this or how I might better rephrase this question would be greatly appreciated.

Upvotes: 1

Views: 3245

Answers (1)

hrbrmstr
hrbrmstr

Reputation: 78832

You could try something like this on the command-line to convert the array of JSON objects into an file ndjson records then use something like ndjson::stream_in("filename_of the_thing_you_just_converted") but you'll end up with a 14,000+ column, quite useless, "flat" data frame.

Instead, do some spelunking:

library(tidyverse)

as1 <- jsonlite::read_json("~/Downloads/AllSets.json")

str(as1, 1) 
## List of 221
##  $ UST     :List of 9
##  $ UNH     :List of 10
##  $ UGL     :List of 11
##  $ pWOS    :List of 8
##  $ pWOR    :List of 8
##  $ pWCQ    :List of 8
##  $ pSUS    :List of 8
##  $ pSUM    :List of 10
##  $ pREL    :List of 8
##  $ pPRO    :List of 8
##  $ pPRE    :List of 8
##  $ pPOD    :List of 7
##  $ pMPR    :List of 8
##  $ pMGD    :List of 8
##  $ pMEI    :List of 8
##  $ pLPA    :List of 8
##  $ pLGM    :List of 8
##  $ pJGP    :List of 10
##  $ pHHO    :List of 11
## ...

Ugh…one of "those" JSON files that doesn't see fit to have all the elements of each record populated even when the entire file is — in theory — supposed to be consistent.

Let's see what JSON array elements have the most number of fields filled in since that means those likely have them all populated:

map_dbl(as1, length) %>% 
  broom::tidy() %>% 
  arrange(desc(x))
## # A tibble: 221 x 2
##    names     x
##    <chr> <dbl>
##  1 NMS    14.0
##  2 PO2    13.0
##  3 WTH    13.0
##  4 VIS    13.0
##  5 USG    13.0
##  6 ULG    13.0
##  7 UDS    13.0
##  8 TMP    13.0
##  9 STH    13.0
## 10 PLS    13.0
## # ... with 211 more rows

Let's take a look at NMS:

str(as1[["NMS"]], 1)
## List of 14
##  $ name              : chr "Nemesis"
##  $ code              : chr "NMS"
##  $ gathererCode      : chr "NE"
##  $ magicCardsInfoCode: chr "ne"
##  $ oldCode           : chr "NEM"
##  $ releaseDate       : chr "2000-02-14"
##  $ border            : chr "black"
##  $ type              : chr "expansion"
##  $ block             : chr "Masques"
##  $ booster           :List of 15
##  $ translations      :List of 5
##  $ mkm_name          : chr "Nemesis"
##  $ mkm_id            : int 32
##  $ cards             :List of 143

You really don't want to flatten booster, translations or cards and should keep them as list columns and unnest as necessary.

But, since each records has varying fields, we can't simply "data.table::rbindlist()ordplyr::bind_rows()` since it's going to complain about a few of those columns.

We'll have to go record-by-record and convert each to a data frame, handling missing fields and wrapping the list ones in a list(). We'll simplify the functional idiom with a helper function to test for missing values:

`%l0%` <- function(x, y) if (length(x) > 0) x else y

^^ is a bit more robust than %||% that comes along for the ride with purrr.

Finally:

map_df(as1, ~{
  data_frame(
    name = .x$name %l0% NA_character_,
    code = .x$code,
    gathererCode = .x$gathererCode %l0% NA_character_,
    magicCardsInfoCode = .x$magicCardsInfoCode %l0% NA_character_,
    oldCode = .x$oldCode %l0% NA_character_,
    releaseDate = .x$releaseDate %l0% NA_character_,
    border = .x$border,
    type = .x$type,
    block = .x$block %l0% NA_character_,
    booster = list(.x$booster),
    translations = list(.x$translations),
    mkm_name = .x$mkm_name %l0% NA_character_,
    mkm_id = .x$mkm_id %l0% NA_character_,
    cards = list(.x$cards)
  )
}) -> all_sets

And, you can see the result:

all_sets
## # A tibble: 221 x 14
##    name           code  gathererCode magicCardsInfoC… oldCode releaseDate border type  block booster 
##    <chr>          <chr> <chr>        <chr>            <chr>   <chr>       <chr>  <chr> <chr> <list>  
##  1 Unstable       UST   NA           NA               NA      2017-12-08  silver un    NA    <list […
##  2 Unhinged       UNH   NA           uh               NA      2004-11-20  silver un    NA    <list […
##  3 Unglued        UGL   UG           ug               NA      1998-08-11  silver un    NA    <list […
##  4 Wizards of th… pWOS  NA           wotc             NA      1999-09-04  black  promo NA    <NULL>  
##  5 Worlds         pWOR  NA           wrl              NA      1999-08-04  black  promo NA    <NULL>  
##  6 World Magic C… pWCQ  NA           wmcq             NA      2013-04-06  black  promo NA    <NULL>  
##  7 Super Series   pSUS  NA           sus              NA      1999-12-01  black  promo NA    <NULL>  
##  8 Summer of Mag… pSUM  NA           sum              NA      2007-07-21  black  promo NA    <NULL>  
##  9 Release Events pREL  NA           rep              NA      2003-07-26  black  promo NA    <NULL>  
## 10 Pro Tour       pPRO  NA           pro              NA      2007-02-09  black  promo NA    <NULL>  
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## #   cards <list>

glimpse(all_sets)
## Observations: 221
## Variables: 14
## $ name               <chr> "Unstable", "Unhinged", "Unglued", "Wizards of the Coast Online Store"...
## $ code               <chr> "UST", "UNH", "UGL", "pWOS", "pWOR", "pWCQ", "pSUS", "pSUM", "pREL", "...
## $ gathererCode       <chr> NA, NA, "UG", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ magicCardsInfoCode <chr> NA, "uh", "ug", "wotc", "wrl", "wmcq", "sus", "sum", "rep", "pro", "pt...
## $ oldCode            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ releaseDate        <chr> "2017-12-08", "2004-11-20", "1998-08-11", "1999-09-04", "1999-08-04", ...
## $ border             <chr> "silver", "silver", "silver", "black", "black", "black", "black", "bla...
## $ type               <chr> "un", "un", "un", "promo", "promo", "promo", "promo", "promo", "promo"...
## $ block              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ booster            <list> [["rare", "uncommon", "uncommon", "uncommon", "common", "common", "co...
## $ translations       <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...
## $ mkm_name           <chr> "Unstable", "Unhinged", "Unglued", NA, NA, NA, NA, "Summer Magic", NA,...
## $ mkm_id             <int> 1821, 59, 22, NA, NA, NA, NA, 76, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ cards              <list> [[["Andrea Radeck", 1, ["W"], ["White"], "95ebdf85f4ea74d584dfdfb72e3...

And, we can arrange them by releaseDate after converting the column to a proper Date object:

mutate(all_sets, releaseDate = lubridate::ymd(releaseDate)) %>% 
  arrange(desc(releaseDate))
## # A tibble: 221 x 14
##    name        code  gathererCode magicCardsInfoCo… oldCode releaseDate border type     block booster
##    <chr>       <chr> <chr>        <chr>             <chr>   <date>      <chr>  <chr>    <chr> <list> 
##  1 Masters 25  A25   NA           a25               NA      2018-03-16  black  reprint  NA    <NULL> 
##  2 Rivals of … RIX   NA           rix               NA      2018-01-19  black  expansi… Ixal… <list …
##  3 Unstable    UST   NA           NA                NA      2017-12-08  silver un       NA    <list …
##  4 Explorers … E02   NA           e02               NA      2017-11-24  black  board g… NA    <NULL> 
##  5 From the V… V17   NA           v17               NA      2017-11-24  black  from th… NA    <NULL> 
##  6 Iconic Mas… IMA   NA           ima               NA      2017-11-17  black  reprint  NA    <list …
##  7 Duel Decks… DDT   NA           ddt               NA      2017-11-10  black  duel de… NA    <NULL> 
##  8 Ixalan      XLN   NA           xln               NA      2017-09-29  black  expansi… Ixal… <list …
##  9 Commander … C17   NA           NA                NA      2017-08-25  black  command… NA    <NULL> 
## 10 Hour of De… HOU   NA           hou               NA      2017-07-14  black  expansi… Amon… <list …
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## #   cards <list>

Upvotes: 5

Related Questions