wildabeast

Reputation: 1832

How to convert existing Elasticsearch data from string to number

I am streaming AWS Cloudwatch logs (from a Node.js Lambda application) to an AWS Elasticsearch cluster, so that I can view metrics in Kibana.

Some of the data I was streaming was numeric but was being logged as strings. I've updated the application code to log these as numeric values; however, I can't use numeric visualizations in Kibana on those fields because the field type is now mixed. In the Kibana settings it says: 13 fields are defined as several types (string, integer, etc) across the indices that match this pattern...

Is there a straightforward way to force ES / Kibana to treat that field as always numeric? Or convert all of the older logged data from string to number?

My searches have indicated I can do this with some kind of mutation using the ES API, but I can't track down what this API call would actually look like. Disclaimer: Elasticsearch noob.

Upvotes: 0

Views: 7984

Answers (2)

wildabeast

Reputation: 1832

Here is the scripted field I created, thanks to Abhishek's answer:

// Name of the field to expose as a number.
String key = 'myfield';

if (doc.containsKey(key + '.keyword')) {
    // Indices where the field was mapped as a string: parse the keyword value.
    key += '.keyword';
    if (doc[key].size() != 0 && doc[key].value instanceof String) {
        return Double.parseDouble(doc[key].value);
    }
} else if (doc.containsKey(key) && doc[key].size() != 0) {
    // Indices where the field is already numeric: return the value unchanged.
    return doc[key].value;
}

Upvotes: 1

Abhishek Jaisingh

Reputation: 1732

There are two approaches here:

  1. Convert all the data from strings to numeric values. Essentially, you'll have to reindex all of the data (the type of an existing field can't be changed with one click), making sure that the strings are converted / typecast to numeric values along the way. The best way to reindex is to use Ingest Node Pipelines.

Pros: Visualizations built on this data will be fast as the data is already in numeric format.

Cons: If the data set is huge, this conversion can take a long time.

  2. Keep all the data in string format as is and use Scripted Fields in Kibana to convert it to numeric format at runtime, i.e. every time you visualize it.

Pros: No need to set up a whole new pipeline to convert the data.

Cons: Visualizations on large timeframes might be too slow / heavy for your infrastructure.
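
For the first approach, the two API calls involved can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it builds the request bodies for an ingest pipeline that uses the `convert` processor and for a `_reindex` call that routes documents through that pipeline. The field name `myfield` comes from the scripted-field answer; the helper names and the `send` function are made up for illustration, and authentication (which an AWS-hosted cluster will usually require) is left out.

```python
import json
import urllib.request

FIELD = "myfield"  # field name taken from the scripted-field answer


def build_pipeline(field, target_type="double"):
    """Body for PUT _ingest/pipeline/<id>: cast `field` with the convert processor."""
    return {
        "description": f"Cast {field} from string to {target_type}",
        "processors": [
            {
                "convert": {
                    "field": field,
                    "type": target_type,
                    # Skip documents that do not contain the field at all.
                    "ignore_missing": True,
                }
            }
        ],
    }


def build_reindex(source_index, dest_index, pipeline_id):
    """Body for POST _reindex: copy documents through the ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


def send(base_url, method, path, body):
    """Hypothetical helper: send a JSON body to the cluster (no auth handling)."""
    req = urllib.request.Request(
        url=f"{base_url}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage would be something like `send(base_url, "PUT", "_ingest/pipeline/to-number", build_pipeline(FIELD))` followed by `send(base_url, "POST", "_reindex", build_reindex("logs-old", "logs-old-v2", "to-number"))`, where the pipeline id and both index names are placeholders. The destination index should be a new one, so that the field gets a numeric mapping instead of inheriting the old string mapping.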

Upvotes: 1
