As someone who demos Elasticsearch and Kibana quite a bit, I find the new “http_poller” input to Logstash [1] probably the most useful tool I have yet run into for quickly cooking up real-time Elasticsearch demos that use live data.
Here’s a quick outline of what I’m going to do in Logstash:
We start with the poller. This is the part I love: no more writing cron jobs and tiny Python scripts just to get data from the web on a schedule. For this to work, you have to install the plugin using Logstash’s new RubyGems-based plugin install feature. In this case I’m grabbing the Capital Bikeshare station availability XML endpoint every 60 seconds.
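For comparison, the kind of cron-job-or-script loop that http_poller replaces can be sketched in a few lines of Python. This is a minimal illustration, not how the plugin works internally: `poll`, `fetch`, and `interval_s` are hypothetical names, and `fetch` stands in for whatever HTTP GET you would issue against the feed.

```python
import time
from typing import Callable, List


def poll(fetch: Callable[[], str], interval_s: float, iterations: int,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call fetch() `iterations` times, waiting interval_s seconds between calls.

    Hypothetical helper: a stand-in for the scheduling the http_poller input automates.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    payloads: List[str] = []
    for i in range(iterations):
        payloads.append(fetch())      # one HTTP pull per tick
        if i < iterations - 1:
            sleep(interval_s)         # e.g. 60 seconds, as in this post
    return payloads
```

In real use `fetch` might be something like `lambda: urllib.request.urlopen(url).read().decode("utf-8")`, with `url` being the feed address. The poller input gives you this scheduling, plus each response as a Logstash event, without any of this code.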
Each pull returns a large XML file. Inside it is a list of stations, each enclosed in its own <station> element. You can do a one-time pull with your web browser by hitting the following link:
At this point in our Logstash pipeline, the XML payload is entirely in the “message” field as a string. The first step is to tell Logstash to interpret that string as XML and put the deserialized data into a field called “parsed”.
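To make the “message” to “parsed” step concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The payload is a made-up, heavily trimmed stand-in for the real feed, and the child element names (`id`, `name`, `lat`, `long`, `nbBikes`, `nbEmptyDocks`, `lastCommWithServer`) are assumptions about its layout. Unlike Logstash's xml filter, which by default wraps values in arrays (its `force_array` option), this sketch keeps each leaf as a plain string.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed stand-in for the feed; element names are assumed.
SAMPLE = """<stations>
  <station><id>1</id><name>Station A</name><lat>38.90</lat><long>-77.03</long>
    <nbBikes>8</nbBikes><nbEmptyDocks>7</nbEmptyDocks>
    <lastCommWithServer>1431016950000</lastCommWithServer></station>
  <station><id>2</id><name>Station B</name><lat>38.91</lat><long>-77.04</long>
    <nbBikes>3</nbBikes><nbEmptyDocks>8</nbEmptyDocks>
    <lastCommWithServer>1431016900000</lastCommWithServer></station>
</stations>"""


def parse_message(event: dict) -> dict:
    """Deserialize the XML string in event["message"] into event["parsed"]."""
    root = ET.fromstring(event["message"])
    # One dict per <station>, mapping each child tag to its text content.
    stations = [{child.tag: child.text for child in st} for st in root.findall("station")]
    event["parsed"] = {"station": stations}
    return event
```

The original string stays in “message”, so nothing is lost until a later step decides what to keep.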
Next we split the big XML event into a separate event per station with the split filter. Rather than keep everything, we pull out just the few specific values we want.
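The split-then-trim step amounts to fanning one event out into one event per station and copying across only a whitelist of keys. A minimal Python sketch, where the `KEEP` field names are assumptions about the feed rather than the exact fields used in this post's config:

```python
from typing import Dict, List

# Hypothetical whitelist of per-station fields to keep.
KEEP = ("id", "name", "lat", "long", "nbBikes", "nbEmptyDocks", "lastCommWithServer")


def split_stations(parsed: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Turn one parsed payload into one small event per station."""
    events = []
    for station in parsed["station"]:
        # Copy only whitelisted keys; anything else in the feed is dropped.
        events.append({k: station[k] for k in KEEP if k in station})
    return events
```

Each returned dict plays the role of one Logstash event after the split.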
Next we correct some field types, format the geospatial point properly, and interpret the source date.
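The three fixes can be sketched in Python as below. Assumptions, all hypothetical: counts arrive as strings, latitude and longitude arrive as separate `lat` and `long` strings, and `lastCommWithServer` is epoch milliseconds. The point is written in Elasticsearch's `{"lat": ..., "lon": ...}` object form, one of the formats a `geo_point` field accepts; note the key is `lon`, not `long`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize(event: dict) -> dict:
    """Return a copy with integer counts, a geo point, and an ISO 8601 @timestamp."""
    out = dict(event)
    # Type correction: counts come out of XML as strings.
    for key in ("nbBikes", "nbEmptyDocks"):
        if key in out:
            out[key] = int(out[key])
    # Geospatial point: merge the two string fields into one object.
    out["location"] = {"lat": float(out.pop("lat")), "lon": float(out.pop("long"))}
    # Source date: epoch milliseconds (assumed) -> UTC timestamp string.
    # timedelta keeps the millisecond arithmetic exact (no float rounding).
    ts = EPOCH + timedelta(milliseconds=int(out.pop("lastCommWithServer")))
    out["@timestamp"] = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return out
```

In Logstash itself these jobs would typically fall to filters such as mutate (its `convert` option) and date (whose `UNIX_MS` pattern handles epoch milliseconds).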
Finally, we insert the result into Elasticsearch.
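The elasticsearch output sends events to Elasticsearch in batches through the bulk API. A rough Python sketch of building such a request body, assuming daily indices named `bikestatus-YYYY.MM.DD` (a hypothetical prefix, echoing Logstash's default `logstash-%{+YYYY.MM.dd}` naming) and a `@timestamp` field already in ISO 8601 form:

```python
import json
from datetime import datetime
from typing import Iterable, Optional


def daily_index(doc: dict, prefix: str = "bikestatus") -> str:
    """Pick a per-day index name from the document's @timestamp (hypothetical prefix)."""
    ts = datetime.strptime(doc["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{prefix}-{ts:%Y.%m.%d}"


def bulk_body(docs: Iterable[dict], doc_type: Optional[str] = None) -> str:
    """Build a newline-delimited _bulk request body: an action line, then the source."""
    lines = []
    for doc in docs:
        action = {"_index": daily_index(doc)}
        if doc_type is not None:
            action["_type"] = doc_type  # Elasticsearch 1.x needs a document type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline
```

Routing each document by its own timestamp is what produces one index per day, which the mapping template and Kibana's index pattern both build on.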
It’s super important to set up the Elasticsearch mapping before indexing any data (a.k.a. running Logstash with this config file). The following mapping template tells Elasticsearch what mapping to apply every time Logstash creates an index for a new date. Note the special handling of the geospatial value, which makes sure we use the latest and greatest features of Elasticsearch 1.5.2+. The setting “geohash”: true is especially important; without it we won’t really be doing any geo-indexing.
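For reference, the shape of such a template for Elasticsearch 1.x can be sketched as a Python dict. The index pattern `bikestatus-*`, the field names, and the extra integer and `not_analyzed` mappings are assumptions for illustration; the `geo_point` type with `"geohash": true` is the part the text above calls out. (`geohash` is a 1.x-era mapping option that later Elasticsearch versions dropped.)

```python
import json


def bike_template(index_pattern: str = "bikestatus-*") -> dict:
    """Index template (Elasticsearch 1.x syntax) applied to every new matching index."""
    return {
        "template": index_pattern,  # matched against each newly created index name
        "mappings": {
            "_default_": {  # applies to every document type in the index
                "properties": {
                    "@timestamp": {"type": "date"},
                    "location": {"type": "geo_point", "geohash": True},
                    "nbBikes": {"type": "integer"},
                    "nbEmptyDocks": {"type": "integer"},
                    "name": {"type": "string", "index": "not_analyzed"},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(bike_template(), indent=2))
```

Serialized with `json.dumps`, this is the kind of body you would register with `PUT /_template/<name>`, or hand to Logstash's elasticsearch output through its `template` setting.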
Finally, to make Kibana smarter about querying recent data, we’ll choose a time-based index pattern when we create our index pattern in Kibana.
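The payoff of a time-based pattern is that Kibana can work out which daily indices overlap the selected time range instead of querying every index. A minimal Python sketch of that expansion, assuming daily indices with a hypothetical `bikestatus-` prefix (in Kibana 4 this corresponds to a pattern along the lines of `[bikestatus-]YYYY.MM.DD` with a daily interval):

```python
from datetime import date, timedelta
from typing import List


def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Daily index names covering start..end inclusive (empty if end < start)."""
    days = (end - start).days
    return [f"{prefix}-{start + timedelta(days=i):%Y.%m.%d}" for i in range(days + 1)]
```

A "last 24 hours" view therefore touches at most two daily indices, no matter how many days of history the cluster holds.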