I’ve toyed with Sphinx before. It blew my socks off… Until it broke.
Sphinx is an open source indexing engine which is supported out of the box by the forum software Invision Power Board (IPB). Essentially, Sphinx runs a bunch of queries against the IPB database (usually hosted on MySQL), pulls the entire post content (amongst other content types) and indexes it efficiently.
The result is blisteringly fast searches, arguably 10 times faster than the highly optimised MySQL queries employed by the IPB software. On a forum with over 1 million posts, containing over 1GB of text, SQL just can’t keep up.
In my environment, I run two servers – one for database, and one for web serving. Sphinx sits on my database server, along with the IPB database.
Setting this all up is pretty easy, and Invision have a great tutorial here – http://community.invisionpower.com/resources/documentation/index.html/_/tutorials/large-communities/setting-up-sphinx-r181
The only issue I had setting up Sphinx was that it is installed on a different server from the webserver. This required me to modify the configuration generated by IPB, changing the “listen” value from 127.0.0.1 to the private IP which connects the database server to my web server. The listen value can also be set to 0.0.0.0, which makes the daemon listen on all interfaces, but I’ve left it as the private IP.
My firewall rules allow all traffic between the database server and the webserver, so nothing had to change there. It is worth noting, though, that port 3312 needs to be open for the webserver to connect to the Sphinx search daemon.
Lastly, configuring IPB also required me to whack the private IP in the settings, instead of using localhost/127.0.0.1.
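If searches fail after moving Sphinx to a separate box, a quick way to rule out the listen address or the firewall is a plain TCP connect from the webserver. Below is a minimal Python sketch of that check. The function name is my own and has nothing to do with IPB or Sphinx, and it only proves the port accepts connections, not that searchd is answering queries correctly.

```python
import socket


def can_reach_searchd(host: str, port: int = 3312, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    3312 is the port mentioned in this post; pass a different one if your
    "listen" value uses another port.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from the webserver, something like can_reach_searchd("10.0.0.2") (where 10.0.0.2 is a placeholder for the database server’s private IP) should return True once the listen value and firewall are right.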
All good. And that’s how you get fast searches!
Something which took me a little longer to come to grips with was the indexing schedule. IPB configures delta indexes and gives you the cron setup to run the delta indexes every 5 minutes. In addition to that, a cron job runs a full re-index every night at 4AM.
This is where I’m having issues. The delta index updates, and I can search it separately. It shows the posts made since the last full index. This is good. However, this index isn’t being included when you search the forum. I have to run a full index for new threads to show up in Active Topics / View New Content…
This is obviously not meant to happen, and I’ve logged it on Invision’s community forum, so hopefully I’ll see some resolution shortly. Until then, I’ll continue looking through the IPB source code trying to find an error, because it can’t be my fault, can it? http://community.invisionpower.com/topic/329171-active-topics-sphinx-not-showing-new-threads/page__p__2063533__fromsearch__1#entry2063533
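To make the two schedules concrete, here is a rough Python sketch that builds the kind of indexer command each cron job would run. This is not the cron setup IPB generates. The config path is a placeholder, the index names are simply the two that appear in my query log, and the helper is my own. The only things taken from Sphinx itself are the indexer binary and its --config and --rotate options.

```python
from typing import List

# Assumption: placeholder path, use wherever your generated sphinx.conf lives.
SPHINX_CONF = "/etc/sphinx/sphinx.conf"

# Index names as they appear in the query log for the forum posts.
MAIN_INDEX = "forums_search_posts_main"
DELTA_INDEX = "forums_search_posts_delta"


def indexer_command(indexes: List[str], config_path: str = SPHINX_CONF) -> List[str]:
    """Build an argv list that rebuilds the given indexes.

    --rotate tells indexer to build new index files and signal the running
    searchd to swap them in, so searches keep working during the rebuild.
    """
    return ["indexer", "--config", config_path, "--rotate", *indexes]


# Every 5 minutes: rebuild only the small delta index.
delta_job = indexer_command([DELTA_INDEX])

# Nightly at 4AM: rebuild the main index, then the delta so it starts empty.
nightly_job = indexer_command([MAIN_INDEX, DELTA_INDEX])
```

Either list can be handed to subprocess.run(), or joined with spaces and dropped into a crontab line.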
Query log shows:
[Wed Jan 5 17:59:19.404 2011] 0.099 sec [scan/4/attr- 63 (0,25) @last_post_group] [forums_search_posts_main,forums_search_posts_delta]
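To read that line without squinting, here is a small Python sketch that splits it into fields. The field names follow my understanding of Sphinx’s plain query log layout (date, query time, match mode / filter count / sort mode, total matches, offset and limit, optional group-by attribute, index list, query text), so treat them as an assumption and not an official parser. The regex and function name are my own.

```python
import re

# One entry from searchd's plain-text query log, copied from above.
SAMPLE = (
    "[Wed Jan 5 17:59:19.404 2011] 0.099 sec "
    "[scan/4/attr- 63 (0,25) @last_post_group] "
    "[forums_search_posts_main,forums_search_posts_delta]"
)

LOG_LINE = re.compile(
    r"^\[(?P<date>[^\]]+)\]\s+"
    r"(?P<seconds>\d+\.\d+) sec\s+"
    r"\[(?P<match_mode>[^/\]]+)/(?P<filters>\d+)/(?P<sort_mode>\S+)\s+"
    r"(?P<total>\d+)\s+\((?P<offset>\d+),(?P<limit>\d+)\)"
    r"(?:\s+@(?P<groupby>[^\]\s]+))?\]\s+"
    r"\[(?P<indexes>[^\]]*)\]"
    r"(?:\s+(?P<query>.*))?$"
)


def parse_query_log_line(line: str) -> dict:
    """Split one query log line into named fields; raise ValueError if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised query log line: {line!r}")
    fields = match.groupdict()
    return {
        "date": fields["date"],
        "seconds": float(fields["seconds"]),
        "match_mode": fields["match_mode"],
        "filters": int(fields["filters"]),
        "sort_mode": fields["sort_mode"],
        "total": int(fields["total"]),
        "offset": int(fields["offset"]),
        "limit": int(fields["limit"]),
        "groupby": fields["groupby"],
        # The index list is comma separated in this log line.
        "indexes": [name.strip() for name in fields["indexes"].split(",") if name.strip()],
        "query": fields["query"] or "",
    }
```

Calling parse_query_log_line(SAMPLE) gives an "indexes" list containing both forums_search_posts_main and forums_search_posts_delta, so the search request itself does name the delta index. The missing threads are not explained by the query simply skipping it.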
Sphinx code for the forum is at /admin/applications/forums/extensions/search/engines/sphinx.php