user7425610
user7425610

Reputation: 55

Is it possible to rewrite strings in unknown json structure?

I have a requirement to parse arbitrary json, encode/rewrite any string key or value longer than 85 characters(think of a kind of short URL service) and marshal it back to raw json. e.g. a document like this {"Name": "Ed", "Text": "long characters....85 chars"} should become something like this {"Name": "Ed", "Text": "short://ID=3403"}. This example is simple but I have to deal with complex structures with nested objects and arrays and basically unknown structures.

My question is: Is it possible to do this using a known library or even the standard encoding/json package? What I actually need is the equivalent of an implementation of Marshaler interface on type string but we know that's not possible so I'm wondering what other options I have (beside forking the standard encoding/json package)

Upvotes: 0

Views: 172

Answers (3)

hobbs
hobbs

Reputation: 239871

Here is a an approach (tested) which uses json.Decoder's Token() method, and avoids deserializing the whole data structure into memory. It's mostly done for fun and for the sake illustrating how you can use that method, but it could possibly be of some use if your JSON documents are very large and you would like to avoid the memory overhead.

// For example
r, w := os.Stdin, os.Stdout

dec := json.NewDecoder(r)
// Avoids any round-trip loss by leaving numbers un-parsed
dec.UseNumber()

// Current state (in an object or in an array; following the open
// delim, or a key, or a value). The decoder has its own stack pretty
// like this one, but it's private, so keep our own.
const (
    jsonArrayStart = iota
    jsonArrayVal
    jsonObjectStart
    jsonObjectKey
    jsonObjectVal
)
stack := []byte{}

for {
    t, err := dec.Token()
    if err == io.EOF {
        break
    } else if err != nil {
        log.Fatal(err)
    }

    switch val := t.(type) {
    case json.Delim:
        // Copy delimiters out, and push/pop the state stack as appropriate
        w.WriteString(string([]rune{rune(val)}))
        switch val {
        case '[':
            stack = append(stack, jsonArrayStart)
        case '{':
            stack = append(stack, jsonObjectStart)
        case ']', '}':
            stack = stack[:len(stack)-1]
        }
        // The rest of the cases just copy values out
    case nil:
        w.WriteString("null")
    case json.Number:
        w.WriteString(string(val))
    case bool:
        if val {
            w.WriteString("true")
        } else {
            w.WriteString("false")
        }
    case string:
        // Modify strings if called for (shortenString needs to be provided)
        if len(val) >= 85 {
            val = shortenString(val)
        }
        encoded, err := json.Marshal(val)
        if err != nil {
            log.Fatal(err)
        }
        w.Write(encoded)
    }

    if dec.More() {
        // If there's more in the current array/object, write a colon or comma
        // (if we just wrote a key/value), and set the next state.
        // Arrays start with a value, and follow with more values.
        // Objects start with a key and alternate between key and value.
        switch stack[len(stack)-1] {
        case jsonArrayStart:
            stack[len(stack)-1] = jsonArrayVal
        case jsonArrayVal:
            w.WriteString(",")
            // State remains jsonArrayVal
        case jsonObjectStart:
            stack[len(stack)-1] = jsonObjectKey
        case jsonObjectKey:
            w.WriteString(":")
            stack[len(stack)-1] = jsonObjectVal
        case jsonObjectVal:
            w.WriteString(",")
            stack[len(stack)-1] = jsonObjectKey
        }
    } else {
        if len(stack) == 0 {
            // End after the first complete value (array/object) in the stream
            break
        }
        if stack[len(stack)-1] == jsonObjectKey {
            // Should never happen
            log.Fatal("Object key without a value?")
        }
    }
}

Upvotes: 0

Thundercat
Thundercat

Reputation: 120951

Yes, it is possible using the standard library and a little bit of code.

Unmarshal to interface{}

var v interface{}
if err := json.Unmarshal(data, &v); err != nil {
  // handle error
}

Traverse the unmarshaled value, shortening strings and rewriting as you go:

func shorten(v interface{}) interface{} {
    switch v := v.(type) {
    case []interface{}:
        for i, e := range v {
            v[i] = shorten(e)
        }
        return v
    case map[string]interface{}:
        m := make(map[string]interface{})
        for k, e := range v {
            m[shortenString(k)] = shorten(e)
        }
        return m
    case string:
        return shortenString(v)
    default:
        return v
    }
}

The function shorten calls shortenString(s string) string to convert a long string to short://ID=xxx.

Marshal the value back to JSON:

p, err := json.Marshal(shorten(v))
if err != nil {
    // handle error
}

// p is a []byte
fmt.Printf("%s\n", p)

playground example

Upvotes: 2

6502
6502

Reputation: 114481

If the requirement is to replace all strings longer than 85 characters with a new shortened version in a JSON and if you can assume the JSON is a valid JSON (i.e. if you're not required to give a diagnostic in case of invalid input) then you can just use a single regexp substitution.

A quoted string in a valid JSON can be matched with

/"([^\\"]|\\.)*"/

therefore you could just replace all instances of this pattern with the possibly shortened version.

Upvotes: 1

Related Questions