Indexing content from Drupal 8 using Elasticsearch

Elasticsearch offers a powerful search API for decoupled architectures. Drupal pushes content changes to an Elasticsearch index, while the front-end queries Elasticsearch directly.

Last week, a client asked me to investigate the state of the Elasticsearch support in Drupal 8. They're using a decoupled architecture and wanted to know how—using only core and contrib modules—Drupal data could be exposed to Elasticsearch. Elasticsearch would then index that data and make it available to the site's presentation layer via the Elasticsearch  Search API

During my research, I was impressed by the results. Thanks to Typed Data API plus a couple of contributed modules, an administrator can browse the structure of the content in Drupal and select what and how it should be indexed by Elasticsearch. All of this can be done using Drupal's admin interface.

In this article, we will take a vanilla Drupal 8 installation and configure it so that Elasticsearch receives any content changes. Let’s get started!

Downloading and starting Elasticsearch

We will begin by downloading and starting Elasticsearch 5, which is the latest stable release. Open https://www.elastic.co/downloads/elasticsearch and follow the installation instructions. Once you start the process, open your browser and enter http://127.0.0.1:9200. You should see something like the following screenshot:

 








Elastic home

Now let’s setup our Drupal site so it can talk to Elasticsearch.

Setting up Search API

High five to Thomas Seidl for the Search API module and Nikolay Ignatov for the Elasticsearch Connector module. Thanks to them, pushing content to Elasticsearch is a matter of a few clicks.

At the time of this writing there is no available release for Elasticsearch Connector, so you will have to clone the repository and checkout the 8.x-5.x branch and follow the installation instructions. As for Search API, just download and install the latest stable version.

Connecting Drupal to Elasticsearch

Next, let’s connect Drupal to the Elasticsearch server that we configured in the previous section. Navigate to Configuration > Search and Metadata > Elasticsearch Connector and then fill out the form to add a cluster:

 








Add cluster

Click 'Save' and check that the connection to the server was successful:

 








Cluster added

That’s it for Elasticsearch Connector. The rest of the configuration will be done using the Search API module.

Configuring a search index

Search API provides an abstraction layer that allows Drupal to push content changes to different servers, whether that's Elasticsearch, Apache Solr, or any other provider that has a Search API compatible module. Within each server, search API can create indexes, which are like buckets where you can push data that can be searched in different ways. Here is a drawing to illustrate the setup:

 








Architecture

Now navigate to Configuration > Search and Metadata > Search API and click on Add server:

 








Search API home

Fill out the form to let Search API manage the Elasticsearch server:

 








Connect Elasticsearch to Search API

Click Save, then check that the connection was successful:

 








Elasticsearch server added

Next, we will create an index in the Elasticsearch server where we will specify that we want to push all of the content in Drupal. Go back to Configuration > Search and Metadata > Search API and click on Add index:

 








Elastic add index

Fill out the form to create an index where content will be pushed by Drupal:

 








Index form

 








Index form 2

 








Index form 3

Click Save and verify that the index creation was successful:

 








Index added

Verify the index creation at the Elasticsearch server by opening http://127.0.0.1:9200/_cat/indices?v in a new browser tab:

 








Index verified

That’s it! We will now test whether Drupal can properly update Elasticsearch when the index should reflect content changes.

Indexing content

Create a node and then run cron. Verify that the node has been pushed to Elasticsearch by opening the URL http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, where elasticsearch_index_draco_elastic_index is obtained from the above screenshot:

 








Content indexed

Success! The node has been pushed but only it’s identifier is there. We need to select which fields we want to push to Elasticsearch via the Search API interface at Configuration > Search and Metadata > Search API > Our Elasticsearch index > Fields:

 








Add fields

Click on Add fields and select the fields that you want to push to Elasticsearch:

 








Browse fields

Add the fields and click Save. This time we will use Drush to reset the index and index the content again:

 








Re-index

After reloading http://127.0.0.1:9200/elasticsearch_index_draco_elastic_index/_search, we can see the added(s) field(s):

 








Content indexed extended

Processing the data prior to indexing it

This is the extra ball: Search API provides a list of processors that will alter the data to be indexed to Elasticsearch. Things like transliteration, filtering out unpublished content, or case insensitive searching, are available via the web interface. Here is the list, which you can find by clicking Processors when you are viewing the server at Search API :

 








Field processors

When you need more, extend from the APIs

Now that you have an Elasticsearch engine, it’s time to start hooking it up with your front-end applications. We have seen that the web interface of the Search API module saves a ton of development time, but if you ever need to go the extra mile, there are hooks, events, and plugins that you can use in order to fit your requirements. A good place to start is the Search API’s project homepage. Happy searching!

Acknowledgements

Thanks to:

Get in touch with us

Tell us about your project or drop us a line. We'd love to hear from you!