Releasing the tiger: how to profit from Solr in your Tridion CM

Every Tridion installation comes with a powerful search engine that drives full-text search in the Content Manager Explorer: Solr. The way Tridion uses Solr reminds me of a caged tiger. After all, Solr is one of the most popular search engines in the world, and it’s immensely powerful. Yet neither the small search box nor the advanced search screen comes close to exposing its full capabilities. In this blog post, I will explain how you can release the tiger!

Many of you are familiar with this screen: it is Tridion’s “advanced search” screen.

At first sight, the possibilities on this screen are quite impressive. To name a few: you can search by modification date, author, keyword, and item type. You can look for items that are published or unpublished, and you can limit the results to items that are shared, local, or localized. You can also search for items based on a particular schema, and even search within that schema’s fields.

And yet, there is a lot you cannot do:

  • Search for components based on schema A or schema B
  • Search for a combination of field values (say: a component which has ‘termA’ in the headline and ‘termB’ in the description)
  • Search inside a field without specifying a schema
  • Search for items based on a given schema across ALL publications

What’s more, you cannot use Solr-specific features such as field boosting or sorting by a specific field.

Fortunately, you don’t have to let Tridion limit you. On any Tridion installation you can access Solr directly, normally by pointing your browser to port 8983 of the hostname you use for the Tridion CME (e.g. http://my.tridion:8983). You will be prompted for a username and password; the credentials of the “MTSUser” (Tridion’s system user) will work.

This brings up the Solr Admin page. You can run queries here (first select the tridion core, then click Query), which makes it a great place to experiment with any query you would like to run.

At first, the biggest challenge is understanding how to query the Solr index effectively. There is a neat little trick for that. First, stop the Tridion Content Manager Search Host Windows service (before Tridion 9.1 it was called ‘SDL Web Content Manager Search Host’). Next, open a command prompt in the solr-tomcat\bin folder within your Tridion base folder, and start TcmSearchHost without any parameters. From then on, every time you search for something in the Tridion CME, the exact Solr query appears in the console output.

A quick example: I used the advanced search to find items based on the schema with ID tcm:5-2525-8 that have the word ‘Generation’ in the field ‘headline’. The TcmSearchHost process then writes this to the console:

[http-nio-8983-exec-3] INFO org.apache.solr.core.SolrCore - [tridion] webapp= path=/select params={q=(RepositoryId:tcm\:0\-3\-1+OR+RepositoryId:tcm\:0\-5\-1+OR+RepositoryId:tcm\:0\-7\-1+OR+RepositoryId:tcm\:0\-109\-1+OR+RepositoryId:tcm\:0\-148\-1)+AND+OrganizationalItemAncestorIds:tcm\:*\-4\-2+AND+(SchemaId:tcm\:*\-2525\-8+AND+CatchAllXml:"headline+Generation+headline"~1000000)&indent=true&wt=xml&_=1575463948751} hits=1 status=0 QTime=4

The value of the q parameter (everything between q= and &indent) is the actual Solr query. If you replace the ‘+’ signs with spaces, you have a working query that you can, for instance, paste into the Solr Admin interface:

(RepositoryId:tcm\:0\-3\-1 OR RepositoryId:tcm\:0\-5\-1 OR RepositoryId:tcm\:0\-7\-1 OR RepositoryId:tcm\:0\-109\-1 OR RepositoryId:tcm\:0\-148\-1) AND OrganizationalItemAncestorIds:tcm\:*\-4\-2 AND (SchemaId:tcm\:*\-2525\-8 AND CatchAllXml:"headline Generation headline"~1000000)
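Copying the query out of the log by hand gets tedious after a while. The sketch below, which uses only the Python standard library, pulls the params={...} section out of a log line like the one above and decodes it with urllib.parse.parse_qs, which turns the ‘+’ signs into spaces. It assumes the log format shown above; other Solr versions may write the line differently, and the function name is hypothetical.

```python
import re
from urllib.parse import parse_qs


def extract_query(log_line: str) -> str:
    """Return the decoded Solr 'q' parameter from a TcmSearchHost log line."""
    # Assumes the request parameters are logged as params={...}, as in the
    # sample line; the greedy match runs up to the last '}' on the line.
    match = re.search(r"params=\{(.*)\}", log_line)
    if match is None:
        raise ValueError("no params={...} section found in log line")
    # parse_qs splits on '&' and decodes '+' as a space (and any %XX escapes).
    params = parse_qs(match.group(1))
    return params["q"][0]
```

Paste a log line into extract_query and it returns the query in the same form as the one above, ready for the Solr Admin interface.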

Now comes the fun part: we can extend or completely change this query. For example, this is how you search for items that have the word ‘Generation’ in the field ‘headline’ AND the word ‘Catering’ in the field ‘title’:

(RepositoryId:tcm\:0\-3\-1 OR RepositoryId:tcm\:0\-5\-1 OR RepositoryId:tcm\:0\-7\-1 OR RepositoryId:tcm\:0\-109\-1 OR RepositoryId:tcm\:0\-148\-1) AND OrganizationalItemAncestorIds:tcm\:*\-4\-2 AND (SchemaId:tcm\:*\-2525\-8 AND CatchAllXml:"headline Generation headline"~1000000 AND CatchAllXml:"title Catering title"~1000000)
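Because each field condition is just another CatchAllXml phrase clause, a custom tool can generate them. The hypothetical helpers below copy the pattern from the advanced-search output: field name, term, field name again, with a slop of 1000000. That slop value comes from the logged query, not from any documentation. The helpers do not escape quotes or other special characters inside the term, so treat them as a sketch for simple single-word terms.

```python
def field_clause(field: str, term: str, slop: int = 1000000) -> str:
    """Return a CatchAllXml clause for `term` in the field `field`.

    Mirrors the "field term field"~slop pattern that Tridion's advanced
    search writes to the log; `term` is inserted without any escaping.
    """
    return f'CatchAllXml:"{field} {term} {field}"~{slop}'


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)


# Example: 'Generation' in the headline field AND 'Catering' in the title field.
fields_query = all_of(
    field_clause("headline", "Generation"),
    field_clause("title", "Catering"),
)
```

fields_query holds the two CatchAllXml clauses from the query above, joined with AND.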

Another example: the query contains many references to the RepositoryId field. They refer to the ID of the publication you are searching in, plus all its parent publications in the BluePrint chain. Often, though, you want to search for content in a certain field regardless of the publication it’s in. You can do that by simply removing the RepositoryId clauses from the query altogether:

OrganizationalItemAncestorIds:tcm\:*\-4\-2 AND (SchemaId:tcm\:*\-2525\-8 AND CatchAllXml:"headline Generation headline"~1000000 AND CatchAllXml:"title Catering title"~1000000)
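The TCM URIs in these queries follow a fixed pattern: ‘:’ and ‘-’ are escaped with a backslash, because both are special characters in Solr’s query syntax, and in the SchemaId and OrganizationalItemAncestorIds clauses the publication ID is replaced by the wildcard ‘*’, so any publication ID matches. A minimal sketch of a helper that writes URIs this way, plus an OR group for the “schema A or schema B” case from the list at the start of this post (both function names are hypothetical; item type 8 is a schema and item type 2 a folder):

```python
from typing import Union


def tcm_value(publication: Union[int, str], item_id: int, item_type: int) -> str:
    """Write a TCM URI the way the logged queries do.

    ':' and '-' are escaped with a backslash. Pass publication="*" for the
    wildcard form, which matches the item ID in any publication.
    """
    return rf"tcm\:{publication}\-{item_id}\-{item_type}"


def any_schema(*schema_ids: int) -> str:
    """Match items based on any of the given schemas (item type 8)."""
    clauses = [f"SchemaId:{tcm_value('*', schema_id, 8)}" for schema_id in schema_ids]
    return "(" + " OR ".join(clauses) + ")"


# Folder clause from the query above: folder 4 (item type 2) in any publication.
folder_clause = "OrganizationalItemAncestorIds:" + tcm_value("*", 4, 2)
```

For example, any_schema(2525, 2526) yields (SchemaId:tcm\:*\-2525\-8 OR SchemaId:tcm\:*\-2526\-8), where 2526 stands in for a hypothetical second schema.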

You can even search within a certain field without specifying a schema:

CatchAllXml:"headline Generation headline"

For anything else, let the console output of the search host be your source of inspiration: run a search in the CME, then adapt the query it logs.

Of course, I’m not suggesting that you give everybody the password of the MTSUser. Once you understand how these queries work, you can build your own tool (a custom page, for instance) based on the Solr data, and call the Solr REST service from it.
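A minimal sketch of the core of such a tool, using only the Python standard library. Several details are assumptions to check against your own installation: the base URL http://my.tridion:8983/solr, the core name tridion (taken from the [tridion] in the log line above), and HTTP Basic authentication for the login prompt. If your installation uses another scheme, such as Windows authentication, you will need a different handler. The q, rows, sort and wt parameters are standard Solr query parameters. Solr can only sort on suitable single-valued fields, so check the index schema before sorting on a Tridion field. Keep the credentials on the server side of your tool.

```python
import base64
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: adjust host, port, path and core name to your installation.
SOLR_BASE = "http://my.tridion:8983/solr"
CORE = "tridion"


def build_select_url(query: str, rows: int = 10, sort: Optional[str] = None,
                     base: str = SOLR_BASE, core: str = CORE) -> str:
    """Build a Solr /select URL; urlencode takes care of escaping the query."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if sort:
        params["sort"] = sort  # e.g. "score desc" or "<field> asc"
    return f"{base}/{core}/select?{urlencode(params)}"


def search(query: str, user: str, password: str, **kwargs) -> dict:
    """Run the query and return Solr's decoded JSON response.

    Assumes the endpoint accepts HTTP Basic authentication.
    """
    request = Request(build_select_url(query, **kwargs))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urlopen(request) as response:
        return json.load(response)
```

A call such as search('CatchAllXml:"headline Generation headline"', user, password, rows=50) would return the parsed response, with the matching documents under response["response"]["docs"] in Solr’s standard JSON format.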

Whether you want to implement your own advanced search, or create a custom reporting solution, or perhaps even build a complex template, it is good to know that you have the power of Solr at your fingertips. Go ahead, release that tiger!