Clustered Index
Most tables should have a clustered index* since there is little point storing the data in a random order. Some DBMSs have problems with hotspots if you use a clustered index on a column containing a monotonically increasing id or a date column in OLTP databases. SQL Server 7+ handle this quite nicely though. You probably want to pick the column the majority of large queries will be performed on (often a date column) This article on Clustered Indexes in SQL Server goes into more detail.
Index Selectivity
There is little point putting an index on a column that is not very selective (since the optimiser will probably ignore it). The canonical example of a non-selective index is a column indicating sex (since it only has two possible values).** Broadly speaking if a query returns more than 5-10% of the table the index is unlikely to be used since a table scan will be more efficient.
Foreign Key Indexes
Most of the time an index should be created on any foreign keys within a table (indexes on Primary Keys are created for you)
Query Patterns
Obviously there is little point having an index if it is never going to form part of a query. It’s ok not to have every where clause covered by a query but your core queries should be covered
Key Length on Indexes
Indexes take up space! It’s not uncommon for the total size of all the indexes in a database to be greater than the total size of all the data. Keeping this in mind you should favour indexes on columns containing small datatypes (numbers and dates not large text columns)
Composite Indexes
Related to keeping index sizes down be careful when using composite indexes (as an aside you should avoid composite primary keys if you can) be careful how you order the columns that make up the index.
E.g.
If you have an index on columns A, B and C (in that order) the index will only be used if you’re querying on A, B, C or A, B or A. If you query on A, C the index won’t be used.
Covering Indexes
A covering index is one where all the columns required from the table in question are in the index. This saves an I/O operation to fetch the row from the table. It can be advantageous to include an extra column (or two) that aren’t strictly needed if you expect this index to be used frequently.
Remember that the optimiser can only use the data it has available to it so you must ensure all statistics are kept updated for optimal performance!
* A clustered index means that the data in the table is stored on disk in the same order as the index. For obvious reasons this mens you can only have one per table
** You can calculate selectivity for a coulumn using the following queries
– This will get the total number of rows in the table
SELECT count(*) from table
– This will get the counts for each value
SELECT colname, count(*) from table
GROUP BY colname
You can then divide the count for each value by the total number of rows. Then multiply by 100 to get the percentage. Low numbers = high selectivity!






0 Responses
Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.