Yura

Reputation: 1174

Go reading map from json stream

I need to parse a really long JSON file (more than a million items). I don't want to load it all into memory, so I want to read it chunk by chunk. There's a good example with an array of items here. The problem is that I'm dealing with a map, and when I call Decode I get "not at beginning of value". I can't figure out what should be changed.


const data = `{
  "object1": {"name": "cattle","location": "kitchen"},
  "object2": {"name": "table","location": "office"}
}`

type ReadObject struct {
    Name     string `json:"name"`
    Location string `json:"location"`
}

func ParseJSON() {

    dec := json.NewDecoder(strings.NewReader(data))

    tkn, err := dec.Token()
    if err != nil {
        log.Fatalf("failed to read opening token: %v", err)
    }
    fmt.Printf("opening token: %v\n", tkn)

    objects := make(map[string]*ReadObject)

    for dec.More() {

        var nextSymbol string
        if err := dec.Decode(&nextSymbol); err != nil {
            log.Fatalf("failed to parse next symbol: %v", err)
        }

        nextObject := &ReadObject{}

        if err := dec.Decode(&nextObject); err != nil {
            log.Fatalf("failed to parse next object: %v", err)
        }

        objects[nextSymbol] = nextObject

    }

    tkn, err = dec.Token()
    if err != nil {
        log.Fatalf("failed to read closing token: %v", err)
    }
    fmt.Printf("closing token: %v\n", tkn)

    fmt.Printf("OBJECTS: \n%v\n", objects)
}

Upvotes: 1

Views: 414

Answers (2)

LeGEC

Reputation: 51790

After consuming the initial { with your first call to dec.Token(), you must:

  • use dec.Token() to extract the next key
  • after extracting the key, call dec.Decode(&nextObject) to decode that entry's value

Example code:

    for dec.More() {
        key, err := dec.Token()
        if err != nil {
            // handle error
        }

        var val interface{}
        err = dec.Decode(&val)
        if err != nil {
            // handle error
        }

        fmt.Printf("  %s : %v\n", key, val)
    }

https://play.golang.org/p/5r1d8MsNlKb

Upvotes: 1

Sergei Karpov

Reputation: 126

TL;DR: when you call the Token() method for the first time, you move the offset away from the beginning of a JSON value, and that is why you get the error.

You are working with this struct (link):

type Decoder struct {
    // other fields omitted for simplicity
    tokenState int
}

Pay attention to the tokenState field. Its value is one of (link):

const (
    tokenTopValue = iota
    tokenArrayStart
    tokenArrayValue
    tokenArrayComma
    tokenObjectStart
    tokenObjectKey
    tokenObjectColon
    tokenObjectValue
    tokenObjectComma
)

Let's get back to your code. You call the Token() method. It reads the first valid JSON token, {, and changes tokenState from tokenTopValue to tokenObjectStart (link). Now you are in the "in-an-object" state.
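To see what Token() hands back at each step, here is a small sketch that walks a shortened version of your input token by token. The tokenTrace helper and the sample constant are names made up for this illustration; tokenState itself is unexported, so the trace only shows the tokens, not the internal state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// sample is a shortened, hypothetical version of the question's input.
const sample = `{"object1": {"name": "cattle"}}`

// tokenTrace calls Token() until io.EOF and records the Go type and
// value of every token it gets back.
func tokenTrace(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var trace []string
	for {
		tkn, err := dec.Token()
		if err == io.EOF {
			return trace, nil // end of input
		}
		if err != nil {
			return trace, err
		}
		trace = append(trace, fmt.Sprintf("%T %v", tkn, tkn))
	}
}

func main() {
	trace, err := tokenTrace(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range trace {
		fmt.Println(line)
	}
}
```

Braces come back as json.Delim, while object keys and string values both come back as plain Go strings. Colons and commas never show up as tokens.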

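If you call Decode() at this point you get an error (not at beginning of value). This is because Decode() is only allowed when tokenState is tokenTopValue, tokenArrayStart, tokenArrayValue or tokenObjectValue, i.e. when the decoder is positioned at a "full" value, not at a part of one (link). Before that check, Decode() consumes a pending colon or comma itself (states tokenObjectColon and tokenArrayComma), so calling it right after reading a key with Token() is fine.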
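The two situations can be compared side by side. This is only a sketch: decodeAfterBrace and decodeAfterKey are hypothetical helpers, and sample is a one-entry version of your data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sample is a one-entry, hypothetical version of the question's input.
const sample = `{"object1": {"name": "cattle", "location": "kitchen"}}`

// decodeAfterBrace consumes '{' with Token() and then calls Decode()
// right away, which is the sequence used in the question.
func decodeAfterBrace(input string) error {
	dec := json.NewDecoder(strings.NewReader(input))
	if _, err := dec.Token(); err != nil { // consumes '{'
		return err
	}
	var key string
	return dec.Decode(&key) // the decoder is at a key here, not at a value
}

// decodeAfterKey reads the key with Token() first, so Decode() starts
// at the entry's value.
func decodeAfterKey(input string) error {
	dec := json.NewDecoder(strings.NewReader(input))
	if _, err := dec.Token(); err != nil { // consumes '{'
		return err
	}
	if _, err := dec.Token(); err != nil { // consumes the key "object1"
		return err
	}
	var value map[string]string
	return dec.Decode(&value) // skips ':' and decodes the nested object
}

func main() {
	fmt.Println("Decode right after '{':", decodeAfterBrace(sample))
	fmt.Println("Decode after the key:  ", decodeAfterKey(sample))
}
```

The first call returns an error and the second returns nil.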

To avoid this, you can skip Token() entirely and decode the whole map in one call (note that this loads all entries into memory):

dec := json.NewDecoder(strings.NewReader(dataMapFromJson))

objects := make(map[string]*ReadObject)
if err := dec.Decode(&objects); err != nil {
    log.Fatalf("failed to decode objects: %v", err)
}

fmt.Printf("OBJECTS: \n%v\n", objects)

Or, if you want to read chunk by chunk, you could keep calling Token() until you reach a "full" value, and then call Decode() on that value (I guess this should work).
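A minimal sketch of that chunk-by-chunk idea, assuming the top-level value is an object and every entry fits the ReadObject struct from the question. forEachEntry and collectNames are hypothetical names for this illustration; the callback gets one entry at a time, so nothing forces you to keep the whole map in memory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

const data = `{
  "object1": {"name": "cattle","location": "kitchen"},
  "object2": {"name": "table","location": "office"}
}`

type ReadObject struct {
	Name     string `json:"name"`
	Location string `json:"location"`
}

// forEachEntry streams a top-level JSON object from r and calls fn once
// per entry, in document order.
func forEachEntry(r io.Reader, fn func(key string, obj ReadObject) error) error {
	dec := json.NewDecoder(r)

	// Consume the opening '{'.
	tkn, err := dec.Token()
	if err != nil {
		return fmt.Errorf("read opening token: %w", err)
	}
	if tkn != json.Delim('{') {
		return fmt.Errorf("expected '{', got %v", tkn)
	}

	for dec.More() {
		// Inside an object the next token is the key.
		tkn, err := dec.Token()
		if err != nil {
			return fmt.Errorf("read key: %w", err)
		}
		key, ok := tkn.(string)
		if !ok {
			return fmt.Errorf("expected string key, got %v", tkn)
		}

		// Decode consumes the colon and the complete value.
		var obj ReadObject
		if err := dec.Decode(&obj); err != nil {
			return fmt.Errorf("decode value for %q: %w", key, err)
		}
		if err := fn(key, obj); err != nil {
			return err
		}
	}

	// Consume the closing '}'.
	if _, err := dec.Token(); err != nil {
		return fmt.Errorf("read closing token: %w", err)
	}
	return nil
}

// collectNames is a usage example: it records "key=name" for each entry.
func collectNames(input string) ([]string, error) {
	var names []string
	err := forEachEntry(strings.NewReader(input), func(key string, obj ReadObject) error {
		names = append(names, key+"="+obj.Name)
		return nil
	})
	return names, err
}

func main() {
	names, err := collectNames(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names)
}
```

For a real file you would pass the *os.File (or a bufio.Reader around it) to forEachEntry instead of a strings.Reader.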

Upvotes: 2
