April 2005
M T W T F S S
« Mar   May »
 123
45678910
11121314151617
18192021222324
252627282930  

Recent Posts

Recent Comments

Ads


« Nokia back on top? | Main | Sort-Merge Joins »

Hash Joins

By dave | April 29, 2005

Lets use the same query we looked at when talking about nested-loop joins and see how that would be processed using a hash join.

SELECT * FROM emp, dept WHERE emp.deptno = dept.deptno

A hash join is different to a nested-loop join in that the tables are accessed independently and then rows from each table are matched using the join condition.

The steps taken to execute this query are as follows:-

  1. Firstly a full table scan is performed on dept (being the smaller table)
  2. The results of this table scan are placed in an in memory hash
  3. Then a full table scan is performed on dept
  4. As each row is processed the corresponding row is looked up in the hash of dept

So in pseudo-code this looks like:-

For Each deptrow in dept
  Add deptrow to hash (keyed on dept.deptno)
End For

For Each emprow in emp
  Look up deptrow from hash using emp.deptrow as key
  Combine columns from emprow and deptrow
  Return combined row
End For

Clearly this approach is not as robust as the the nested-loop join since it requires scratch memory (or disk, if the rowset is large enough) space. This could cause poor performance or possibly even error due to lack of space. Hash joins should only be considered if you are sure the smaller rowset will fit in memory

Ok, so if hash joins are less robust than nested-loop joins when should you consider using one?

Obviously you should test performance using a hash join and a nested-loop join before deciding which is best for your query.

Tags: ,
Topics: /Databases |

Comments