LEFT OUTER JOIN will kill you! or “It’s the database, stupid”
I have been doing web programming for about 10 years now… wow it really has been 10 years… and I have worked on a LOT of applications. Large ones, small ones, medium ones. Ones that are screaming fast and ones that are slooooooooooooooooow.
Well lately I have been assigned to upkeep a 4 year old application that is an enterprise level application. It is for a very large big name retail client that I am sure all of you have shopped at once in your life. This application manages a small part of their operations but for that small part it is a huge freak’n deal. The application is extremely slow and there is really no reason for it to be. There are a couple of quarky things about it that slow it down a little bit but they really only add 1-2 seconds on to a page load time so that is bareable. The one thing that really kills it is its database interaction layer. The queries are enormous. I mean I have some pretty sweet SQL skills but these make even MY head spin. I installed FusionReactor on to one of the servers in the cluster just to get an idea of some of the trouble spots in the application. I noticed one that was running on a couple of occasions in the MILLIONS of milliseconds… 1.9 million to be precise. Yes the previous developer would up the timeout on the page and add a “LOADING…” image instead of fixing the REAL issues. The query was a stored procedure so I decided to turn it back in a CFQUERY call so I could dissect it better.
The things I noticed first off was some of the things they did so that it could be a stored procedure. Like doing 2 queries and joining them together… and the second query pulled about 221,000 rows then JOINED the ones it needed which usually was only 1-10 records. Talk about inefficient. Fixing that so that it only pulled what it needed in the subselect trimmed off about 5 seconds from the 45 second page load.
The next thing I started to notice was the really issue. There pretty much was no INNER JOIN’s in the query. The person writing it had done all LEFT OUTER JOIN’s. I started going down the list and noticed that in 8 of the 10 “LOJ’s” it was data that had to be there. Take for instance they LOJ a store table that well you HAVE to have a storeID on every order. I did a quick DB call to find all orders that had a NULL storeID on them and sure enough NONE of them had a NULL storeID. 3 seconds shaved.
I went down the line looking at each LOJ and came to the one that was the killer. It was an authorization table that once again HAD to have data in it for it to return. They LOJ it and then in the code would not display the row if it was NULL… What huh?! Changed to that INNER JOIN… BAM it was now a 4 second query. Exactly 41 seconds shaved off.
So kiddies the moral of the story is DO NOT LEFT OUTER JOIN unless you really really have to. Second moral is that while we all will argue over whether StructKeyExists vs IsDefined is faster or IIF being the demise of the world the issues 90% of the time are not in your CF code but in your database. Be it your SQL queries or your indexes most of your bottleneck is the data getting to your application. Think about that from the beginning and you will have a much better programming life.





Steve Bryant on February 15th, 2008
I think this makes a couple of very good points: Worry about order-of-magnitude performance improvements over incremental ones – seconds over milliseconds and the biggest performance issues are usually coming from SQL/images/HTML than from ColdFusion itself.