jeremiahx|com J.J. Merrick on Facebook jeremiahx on Twitter

LEFT OUTER JOIN will kill you! or “It’s the database, stupid”

I have been doing web programming for about 10 years now… wow it really has been 10 years… and I have worked on a LOT of applications. Large ones, small ones, medium ones. Ones that are screaming fast and ones that are slooooooooooooooooow.

Well lately I have been assigned to upkeep a 4 year old application that is an enterprise level application. It is for a very large big name retail client that I am sure all of you have shopped at once in your life. This application manages a small part of their operations but for that small part it is a huge freak’n deal. The application is extremely slow and there is really no reason for it to be. There are a couple of quarky things about it that slow it down a little bit but they really only add 1-2 seconds on to a page load time so that is bareable. The one thing that really kills it is its database interaction layer. The queries are enormous. I mean I have some pretty sweet SQL skills but these make even MY head spin. I installed FusionReactor on to one of the servers in the cluster just to get an idea of some of the trouble spots in the application. I noticed one that was running on a couple of occasions in the MILLIONS of milliseconds… 1.9 million to be precise. Yes the previous developer would up the timeout on the page and add a “LOADING…” image instead of fixing the REAL issues. The query was a stored procedure so I decided to turn it back in a CFQUERY call so I could dissect it better.

The things I noticed first off was some of the things they did so that it could be a stored procedure. Like doing 2 queries and joining them together… and the second query pulled about 221,000 rows then JOINED the ones it needed which usually was only 1-10 records. Talk about inefficient. Fixing that so that it only pulled what it needed in the subselect trimmed off about 5 seconds from the 45 second page load.

The next thing I started to notice was the really issue. There pretty much was no INNER JOIN’s in the query. The person writing it had done all LEFT OUTER JOIN’s. I started going down the list and noticed that in 8 of the 10 “LOJ’s” it was data that had to be there. Take for instance they LOJ a store table that well you HAVE to have a storeID on every order. I did a quick DB call to find all orders that had a NULL storeID on them and sure enough NONE of them had a NULL storeID. 3 seconds shaved.

I went down the line looking at each LOJ and came to the one that was the killer. It was an authorization table that once again HAD to have data in it for it to return. They LOJ it and then in the code would not display the row if it was NULL… What huh?! Changed to that INNER JOIN… BAM it was now a 4 second query. Exactly 41 seconds shaved off.

So kiddies the moral of the story is DO NOT LEFT OUTER JOIN unless you really really have to. Second moral is that while we all will argue over whether StructKeyExists vs IsDefined is faster or IIF being the demise of the world the issues 90% of the time are not in your CF code but in your database. Be it your SQL queries or your indexes most of your bottleneck is the data getting to your application. Think about that from the beginning and you will have a much better programming life.

18 Comments

Steve Bryant  on February 15th, 2008

I think this makes a couple of very good points: Worry about order-of-magnitude performance improvements over incremental ones – seconds over milliseconds and the biggest performance issues are usually coming from SQL/images/HTML than from ColdFusion itself.

tony petruzzi  on February 15th, 2008

are you sure that your problem was the left outer join? I would be curious to know if the person had an index on the column. Most DB that I have taken over or been contracted to speed up have left an index off of the foreign key column on the joining table. This is a very common occurrence as almost any db will put an index on a primary key column, but none I know put an index on a foreign key one.

I would also argue the stored procedure point. If the db you’re using support indexed views, you will certainly see an increase in performance by breaking the query into indexed views and performing a query using those view as apposed to just sticking the queries inside a stored procedure.

just my two cents… i expected change.

Sana  on February 15th, 2008

@tony: I agree with index views, mssql 2005 indexed views are really in performance.

@Merrick: If you have proper indexes then left-outer-join are not that slow.

@Steve: sometime we have serious issue with CF… just try to build 100MB xml file, hopefully your cf-engine will die.

J.J. Merrick  on February 15th, 2008

@tony The indexes are next. We have a DBA that is going to take a look at all that and tell us where we are going wrong. That is not my forte so he will help on that one.

@Sana You should always start with an inner join and then left outer join only when the data could be NULL. At least that is what I was always told. I could be very wrong

@Tony… he is your half pence change :-)

Noam Chomsky  on February 15th, 2008

>>The things I noticed first off was some of the things

>>a store table that well you HAVE to have

>>Thing about that from the beginning

Etc.

Might want to think about your spelling and grammar from the beginning too :-)

J.J. Merrick  on February 17th, 2008

Thank you Noam Chomsky for reading my blog… it is such an honor to have someone of your literary stature critique my piece of the web.

a-hole.

rudy  on February 19th, 2008

@tony: you said “almost any db will put an index on a primary key column, but none I know put an index on a foreign key one”

mysql will ;o)

“InnoDB requires indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. The index on the foreign key is created automatically. This is in contrast to some older versions, in which indexes had to be created explicitly or the creation of foreign key constraints would fail.”
http://dev.mysql.com/doc/refman/5.0/en/innodb-foreign-key-constraints.html

J.J. Merrick  on February 19th, 2008

@rudy MySql FORTHEWIN! :-)

Cutter  on February 20th, 2008

Nice piece JJ

Daily Tech Impressions  on February 21st, 2008

Good read, however I wanted your readers to know I made an Entrecard Toolbar, it can be downloaded at:
http;//www.dailytechnologyimpressions.com/?p=16

Homer  on March 5th, 2008

Hmmm … this environment sounds eerily familiar! You must not have a formal education with 10 years experience and being 28? Just an observation about your experience claim.

I agree with Sana … outer joins are not that inefficient with relevant indexes. It goes without saying that proper database design can save a few headaches.

J.J. Merrick  on March 6th, 2008

No I don’t have a “formal” education. I started working early and have gotten a degree in real-life experience.

sandeep  on March 30th, 2008

select a.mail_id,a.forword_agency from mail a
left outer join forwordmail b on a.mail_id=b.mail_id;

find error:SQL command not properly ended

Nila  on May 5th, 2008

Ha! What a bunch of “katty” programmers!

p.s. thanks for the tips though :-)

Chris  on May 8th, 2008

I’ve seen worse coding before…

In my first ever job, I saw an end of day process that took 30-60 minutes, I was asked to try and shave a few minutes off of this..

I managed to reduce it to 30 seconds…

All this was in a language called Uniface, something I hope to avoid ever using again…

djoko s  on August 5th, 2008

wow …
great topic
since i’m not a good trained programmer
just a learning by doing programming
so all i do just write a run programming

and rarely think about performance,
notably with a deadline

thanks

andy matthews  on September 24th, 2008

JJ…

You should feel quite privileged to have Rudy comment on your blog. He’s a SQL master, and a prolific poster over at the Sitepoint forums. He’s also a Canuck, and a great frisbee golf player.

He’s helped me increase my skills in SQL over the last 7 or 8 years.

J.J. Merrick  on September 26th, 2008

@Andy That’s cool. This stupid post written 8 months ago is STILL my top trafficked post. It was written on a while driven by a rant :-)

Leave a Comment