Comparing Elasticsearch with Solr

Solr and Elasticsearch have emerged as the two leading open source search engines. Both are built on Lucene, but they differ in plenty of ways, so one may be a better choice than the other depending on what you intend to use it for. Check out the following differences to determine which solution may be the best for your project.

Global vs. Segmented Caching

It would take a significant amount of time to explain the details of each search engine’s use of caching. The important thing to note is that there is one significant difference between the methods each uses. In Lucene, an index is made up of data files that are largely immutable. Each index is divided into segments. New segments can be created during the indexing process, and Lucene can also merge smaller segments into larger ones.

The developers of Elasticsearch opted for a one-to-one relationship between caches and segments. This means that if a segment changes, only the cache for that small bit of data needs to be refreshed. Solr, on the other hand, uses a global cache. This means that when anything changes, the entire cache must be refreshed. That process can take a significant amount of time, and it is also hardware intensive.
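To make the difference concrete, here is a minimal sketch of the two invalidation strategies. It is a toy model only: the segment names, the entry counts, and the function entries_to_rebuild are all made up for illustration and do not correspond to real cache classes in either engine.

```python
# Toy model only: it does not reflect the real cache classes in either engine.

def entries_to_rebuild(segment_sizes, changed_segment, per_segment):
    """Return how many cached entries must be rebuilt after one segment changes.

    segment_sizes: dict mapping a hypothetical segment name to its entry count.
    per_segment:   True for one cache per segment, False for one global cache.
    """
    if per_segment:
        # Only the cache tied to the changed segment is discarded.
        return segment_sizes[changed_segment]
    # A global cache spans every segment, so all of it is rebuilt.
    return sum(segment_sizes.values())


# Hypothetical index: two large, older segments and one small, fresh one.
segments = {"seg_0": 500_000, "seg_1": 120_000, "seg_2": 3_000}

print(entries_to_rebuild(segments, "seg_2", per_segment=True))   # 3000
print(entries_to_rebuild(segments, "seg_2", per_segment=False))  # 623000
```

In this model, a change to the small, new segment costs 3,000 entries with per-segment caches and 623,000 with a global cache, which is the gap the paragraph above describes.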

Query Execution

Range queries are often expensive. A large chunk of data has to be processed to determine whether or not the search criteria have been met. Doc values can reduce that processing time. In Elasticsearch, doc values are enabled automatically, so the search engine can decide whether it needs to search all documents or can focus on iterating through a subset. At the time of writing, Solr does not appear to apply this optimization in the same way, so range-based queries may take longer.
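For readers who have not written a range query in either engine, the sketch below builds the same request in both syntaxes: a JSON body using the Elasticsearch range query and a q parameter using Solr's bracketed range syntax. The field name price and the bounds are assumptions chosen for illustration, and the helper function names are ours.

```python
import json
from urllib.parse import urlencode


def elasticsearch_range_body(field, low, high):
    """Build a Query DSL body for an inclusive range query."""
    return {"query": {"range": {field: {"gte": low, "lte": high}}}}


def solr_range_params(field, low, high):
    """Build URL-encoded Solr parameters for the same inclusive range."""
    # Square brackets mean both bounds are inclusive in Solr's query syntax.
    return urlencode({"q": f"{field}:[{low} TO {high}]"})


# "price" is a hypothetical numeric field.
print(json.dumps(elasticsearch_range_body("price", 10, 50)))
print(solr_range_params("price", 10, 50))
```

Neither snippet talks to a server. They only show the shape of the request, so you can paste the output into whichever client you prefer.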

Node Discovery

Each of these search engines takes a significantly different approach to node discovery. Node discovery is the process that is responsible for determining which action to take when one of the following occurs:

● A new cluster is formed
● A node is damaged
● A node is added to a cluster

For this process, Solr has opted to use Apache ZooKeeper, which handles both leader election and discovery. This requires an external ZooKeeper ensemble with a minimum of three ZooKeeper instances. Elasticsearch, on the other hand, has its own mechanism, Zen. To achieve full fault tolerance, it needs three dedicated master nodes.
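The number three in both cases comes from majority voting. The sketch below works through the arithmetic, assuming that both a ZooKeeper ensemble and a set of master-eligible Elasticsearch nodes need a strict majority of voting members to keep operating. The function names are ours.

```python
def quorum(voters):
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can be lost while a majority is still reachable."""
    return voters - quorum(voters)


for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

With one or two voters, losing a single node stops the cluster from electing a leader. Three is the smallest size that survives one failure, which is why both setups start there.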

High Performing and Precise Analysis Engines

If your data is largely immutable, and high performance along with accuracy is a priority, Solr may be the better choice for you. In tests, Solr outperformed Elasticsearch in this area. Solr shows little to no degradation because there is no loss of precision.

Development and Support

Here, things are starkly different. To start, Elasticsearch is developed by Elastic, a commercial company. Solr is a part of the Apache Software Foundation, a community-based, open source organization. Developers from all over contribute to the code, and anyone who proves their skills and commitment can become a committer. Here, it is the community that is in charge of the code base. Unsurprisingly, there is a significant amount of collaboration. Anyone who has worked on Apache development projects before is familiar with the Apache philosophy and approach.

Then we have Elasticsearch. The code is open source and available for developers to use and modify, and it is possible to submit pull requests to have new code and modifications implemented. However, unlike with Solr, the final decision on whether or not changes are made to the code base rests with Elastic. Community members cannot become committers. That role is limited to those who are employed by Elastic.

Configuring and Installing The Products

Currently, Elasticsearch has a distribution package that is less than ⅙ the size of the one offered by Solr. The installation process is simpler for Elasticsearch as well. In fact, it can be installed and ready to run within minutes. Its configuration is simple and JSON-based. However, that simplicity brings limitations. For example, JSON has no comment syntax, so developers cannot annotate the settings they change within the file.
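The comment limitation is a property of JSON itself rather than of Elasticsearch. As a quick illustration, the sketch below feeds two small documents to Python's standard JSON parser. The setting name and value are only examples, and the helper parses is ours.

```python
import json


def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Illustrative settings fragment, with and without an inline comment.
plain = '{"number_of_shards": 3}'
commented = '{"number_of_shards": 3 // raised for the new cluster\n}'

print(parses(plain))      # True
print(parses(commented))  # False
```

A strict JSON parser rejects the commented version, so any notes about why a value was changed have to live somewhere else, such as version control history or separate documentation.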

Developers of Solr have done a good job of cleaning up some troublesome complications that existed in previous iterations. This has been accomplished largely with the use of REST APIs. Complexities addressed include documentation of clustering algorithms and custom sharded collections. Ultimately, your best choice will depend largely on your current project. If you are not using JSON, you may be better off bypassing Elasticsearch and going with Solr.

Index and Collection Leader Control

Those who go with Elasticsearch will have to accept that they have little control over whether particular shards become primaries or replicas. This is in spite of the fact that Elasticsearch is highly dynamic when it comes to placing shards around the cluster. Solr, by contrast, gives users much more control. Developers have the ability to rebalance leaders, which makes it possible to balance the load across the cluster.
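In Solr, leader rebalancing is exposed through the Collections API: a replica is marked with the preferredLeader property using the ADDREPLICAPROP action, and the REBALANCELEADERS action then asks Solr to move leadership to the preferred replicas. The sketch below only builds the two request URLs. The host, the collection name products, the shard name, and the replica name are all assumptions for illustration, so check them against your own cluster before sending anything.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr node


def collections_api_url(action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# Step 1: mark a (hypothetical) replica as the preferred leader of its shard.
mark_preferred = collections_api_url(
    "ADDREPLICAPROP",
    {
        "collection": "products",
        "shard": "shard1",
        "replica": "core_node1",
        "property": "preferredLeader",
        "property.value": "true",
    },
)

# Step 2: ask Solr to make preferred replicas the leaders where possible.
rebalance = collections_api_url("REBALANCELEADERS", {"collection": "products"})

print(mark_preferred)
print(rebalance)
```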

Machine Learning Capabilities

For machine learning, Elasticsearch has gone with X-Pack. This is a commercial plugin that works with Kibana. It provides support for machine learning algorithms. These are largely focused on detecting outliers and anomalies that may be found in time series data. The good news is that there are some very nice tools along with professional services. The bad news is that one has to open their wallet and shell out quite a bit of cash. Elasticsearch users might consider alternatives. There are solutions available via cloud services, commercial software houses, and open source tools.

Solr provides machine learning support for free. This comes by way of a contrib module as well as a streaming aggregations framework. By making use of other libraries, developers can use machine learning-based feature extraction and ranking models. Here, machine learning is largely focused on classification using logistic regression.

Tools and Features

Here, Elasticsearch takes the lead. That said, there are tools that can be used with Solr. If you are interested in running SQL, for example, you can do that with the combination of Solr and Apache Zeppelin. There are other tools as well. However, we feel that not much has been done on the Solr side that’s really groundbreaking or innovative.

Conversely, the Elasticsearch ecosystem is more up to date, and we believe there is much more to speak of here. It works with Kibana, which is constantly being updated with new features. If that isn’t your preference, definitely check out Grafana. It holds its own and offers multiple features. These aren’t the only two tools that work with Elasticsearch either. Several tools, including multiple data shippers, have been created that use Elasticsearch as a data source. It’s also important to mention that Elasticsearch isn’t just popular among open source enthusiasts. Several businesses have adopted it as well.

DevOps Experience

One undeniably important consideration is how DevOps folks feel about these two products. Here, Solr falls a bit behind, but there are still positives and hope for the future. The information that DevOps people need is often fragmented and incomplete. Some of it can be gleaned through JMX MBeans. There’s also a new Solr Metrics API. There’s a lot of work to be done here, but progress is being made.

Elasticsearch has come quite a bit further at this stage. Troubleshooting Elasticsearch is easier because developers can readily get information such as work statistics, disk usage, memory usage, thread pool usage, caching and buffer information, and more. This is on top of the more capable API, the easy installation process, and overall manageability.

Speed

It isn’t possible to label one of these search engines as being faster than the other. Speed of performance largely depends on the situation. We discovered that Elasticsearch works very well in environments where data is fluid and changes rapidly. On the other hand, thanks to the way it uses caching, and the use of an uninverted reader for the process of sorting and faceting, Solr is a top performer when data is largely static.

Full Text Search Capabilities

Here, Solr is full of features relating to full-text search, and these features are impressively rich. They include the ability to correct spelling mistakes, a variety of request parsers, configurable highlighting support, and suggester implementations. Just check out the Solr code base to learn more.

With Elasticsearch, there is a single, dedicated suggester API. The details of the implementation are hidden from users and developers, which makes it simple to use. However, there is little opportunity to customize or configure the implementation, and this results in lower flexibility. Both products’ highlighting is based on Lucene. However, Solr offers many more configuration options here.
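To give a feel for how little the caller has to specify, here is a minimal sketch of a request body for the Elasticsearch term suggester, which would be sent as part of a search request. The suggestion name title_fix, the field title, and the misspelled input are assumptions for illustration, and the helper function is ours.

```python
import json


def term_suggest_body(name, text, field):
    """Build a search body that asks the term suggester for corrections."""
    return {"suggest": {name: {"text": text, "term": {"field": field}}}}


# Hypothetical suggestion name, input text, and field.
body = term_suggest_body("title_fix", "serch engne", "title")
print(json.dumps(body, indent=2))
```

The caller names a field and supplies text. How candidates are generated and scored is left to the engine, which is the trade-off between simplicity and flexibility described above.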

Try Them Out

You can best learn about both Solr and Elasticsearch by actually trying them out and comparing them first-hand. You can spin up a cloud server at Vultr (with hourly billing, so you don’t have to pay for a full month), and try both of them. You can also use any other cloud hosting provider with root access.

Solr vs Elasticsearch – Conclusion

It is really impossible to declare either one of these products a winner or a loser. Each has areas in which it clearly comes out ahead and areas where it falls behind. Then there are areas where the two are pretty much neck and neck. Ultimately, the best choice for developers depends on several factors. What kind of development community and environment do you prefer? How important is ease of installation and configuration? Most importantly, what kind of data will the search engine need to process? Hopefully, the comparisons above will make this decision easier.

About the Author

This article was submitted to us by a third-party writer. The views and opinions expressed in this article are those of the author and do not reflect the views and opinions of ThisHosting.Rocks.