Reputation: 55
I have a requirement to parse arbitrary json, encode/rewrite any string key or value longer than 85 characters(think of a kind of short URL service) and marshal it back to raw json.
e.g. a document like this {"Name": "Ed", "Text": "long characters....85 chars"}
should become something like this {"Name": "Ed", "Text": "short://ID=3403"}
. This example is simple but I have to deal with complex structures with nested objects and arrays and basically unknown structures.
My question is: Is it possible to do this using a known library or even the standard encoding/json package?
What I actually need is the equivalent of an implementation of Marshaler interface on type string
but we know that's not possible so I'm wondering what other options I have (beside forking the standard encoding/json package)
Upvotes: 0
Views: 172
Reputation: 239871
Here is a an approach (tested) which uses json.Decoder
's Token() method, and avoids deserializing the whole data structure into memory. It's mostly done for fun and for the sake illustrating how you can use that method, but it could possibly be of some use if your JSON documents are very large and you would like to avoid the memory overhead.
// For example
r, w := os.Stdin, os.Stdout
dec := json.NewDecoder(r)
// Avoids any round-trip loss by leaving numbers un-parsed
dec.UseNumber()
// Current state (in an object or in an array; following the open
// delim, or a key, or a value). The decoder has its own stack pretty
// like this one, but it's private, so keep our own.
const (
jsonArrayStart = iota
jsonArrayVal
jsonObjectStart
jsonObjectKey
jsonObjectVal
)
stack := []byte{}
for {
t, err := dec.Token()
if err == io.EOF {
break
} else if err != nil {
log.Fatal(err)
}
switch val := t.(type) {
case json.Delim:
// Copy delimiters out, and push/pop the state stack as appropriate
w.WriteString(string([]rune{rune(val)}))
switch val {
case '[':
stack = append(stack, jsonArrayStart)
case '{':
stack = append(stack, jsonObjectStart)
case ']', '}':
stack = stack[:len(stack)-1]
}
// The rest of the cases just copy values out
case nil:
w.WriteString("null")
case json.Number:
w.WriteString(string(val))
case bool:
if val {
w.WriteString("true")
} else {
w.WriteString("false")
}
case string:
// Modify strings if called for (shortenString needs to be provided)
if len(val) >= 85 {
val = shortenString(val)
}
encoded, err := json.Marshal(val)
if err != nil {
log.Fatal(err)
}
w.Write(encoded)
}
if dec.More() {
// If there's more in the current array/object, write a colon or comma
// (if we just wrote a key/value), and set the next state.
// Arrays start with a value, and follow with more values.
// Objects start with a key and alternate between key and value.
switch stack[len(stack)-1] {
case jsonArrayStart:
stack[len(stack)-1] = jsonArrayVal
case jsonArrayVal:
w.WriteString(",")
// State remains jsonArrayVal
case jsonObjectStart:
stack[len(stack)-1] = jsonObjectKey
case jsonObjectKey:
w.WriteString(":")
stack[len(stack)-1] = jsonObjectVal
case jsonObjectVal:
w.WriteString(",")
stack[len(stack)-1] = jsonObjectKey
}
} else {
if len(stack) == 0 {
// End after the first complete value (array/object) in the stream
break
}
if stack[len(stack)-1] == jsonObjectKey {
// Should never happen
log.Fatal("Object key without a value?")
}
}
}
Upvotes: 0
Reputation: 120951
Yes, it is possible using the standard library and a little bit of code.
Unmarshal to interface{}
var v interface{}
if err := json.Unmarshal(data, &v); err != nil {
// handle error
}
Traverse the unmarshaled value, shortening strings and rewriting as you go:
func shorten(v interface{}) interface{} {
switch v := v.(type) {
case []interface{}:
for i, e := range v {
v[i] = shorten(e)
}
return v
case map[string]interface{}:
m := make(map[string]interface{})
for k, e := range v {
m[shortenString(k)] = shorten(e)
}
return m
case string:
return shortenString(v)
default:
return v
}
}
The function shorten
calls shortenString(s string) string
to convert a long string to short://ID=xxx
.
Marshal the value back to JSON:
p, err := json.Marshal(shorten(v))
if err != nil {
// handle error
}
// p is a []byte
fmt.Printf("%s\n", p)
Upvotes: 2
Reputation: 114481
If the requirement is to replace all strings longer than 85 characters with a new shortened version in a JSON and if you can assume the JSON is a valid JSON (i.e. if you're not required to give a diagnostic in case of invalid input) then you can just use a single regexp substitution.
A quoted string in a valid JSON can be matched with
/"([^\\"]|\\.)*"/
therefore you could just replace all instances of this pattern with the possibly shortened version.
Upvotes: 1