user3230650

Reputation: 65

Should ASCII control characters be stripped from documents before sending to Vespa?

I'm trying to store a document with a string field in Vespa. When I use the Document API HTTP endpoint, the document is rejected with a parsing error. I've validated that the JSON being sent is correct (other documents go through fine).

Here is the error message that I'm seeing:

PARSER_ERROR Error in document 'id:x:y:n=1:1FVzo2l7mMLticB0WMkBKIECMLzAg' - could not parse field 'content' of type 'string': The string field value contains illegal code point 0xB

I can see that there's a check for these sorts of characters (a vertical tab, in my case) in com.yahoo.text.Text, in allowedAsciiChars, but I don't see anywhere in the documentation that I should be stripping these characters before sending to Vespa. In fact, I see sort of the opposite: Vespa will go out of its way to replace certain characters behind the scenes without rejecting them.

Upvotes: 1

Views: 227

Answers (2)

Jon

Reputation: 2339

I see sort of the opposite situation where Vespa will go out of its way to replace certain chars behind the scenes

Where do you see this?

There is a Text.stripInvalidCharacters method, provided as a utility for Java clients that need to strip such characters from non-sanitized text.

Upvotes: 1

Kristian Aune

Reputation: 971

Please strip ASCII control characters from the documents before feeding.
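For clients not written in Java, the stripping step might look roughly like the sketch below. It is not Vespa's implementation: the helper name `strip_control_chars` is hypothetical, and it assumes that tab (0x09), line feed (0x0A) and carriage return (0x0D) are accepted while every other code point below 0x20 is rejected. Check that assumption against your Vespa version.

```python
import re

# Assumption: tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept;
# all other ASCII control characters below 0x20 (including vertical tab, 0x0B)
# are treated as illegal.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text: str, replacement: str = " ") -> str:
    """Replace disallowed ASCII control characters (hypothetical helper)."""
    return _CONTROL_CHARS.sub(replacement, text)


# The vertical tab from the error message is replaced; the newline survives.
cleaned = strip_control_chars("first line\x0bsecond line\n")
print(repr(cleaned))
```

Replacing with a space rather than deleting keeps the words on either side of the control character from being glued together; pass `replacement=""` if deletion is preferred.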

I will update the documentation, although it seems the JSON spec says these control characters must be escaped, so they are implicitly not allowed in the feed.
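A small illustration of the escaping point, using Python's standard `json` module. The `fields` wrapper and the sample text are illustrative only. The serializer escapes the vertical tab, so the payload is valid JSON, but the decoded string still contains code point 0xB, which would explain why a well-formed request like the one in the question is still rejected.

```python
import json

# Illustrative document body; 'content' is the string field from the question.
doc = {"fields": {"content": "first line\x0bsecond line"}}

# json.dumps writes the vertical tab as the escape sequence \u000b,
# so no raw control character appears in the serialized payload.
payload = json.dumps(doc)
print(payload)

# Decoding restores the original code point, so the receiver still sees 0x0B.
decoded = json.loads(payload)
print(hex(ord(decoded["fields"]["content"][10])))
```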

Upvotes: 2
