Saturday, October 6, 2012

Indexes and the LIKE clause

INDEXES AND THE LIKE CLAUSEOften times, people will make the generalization that when you have LIKE in your WHERE clause it won't use indexes.  The real answer is a little more complicated than that, and I think it's important to understand SQL's behavior so you can make good decisions with your queries. 
(BTW, I'm talking from the SQL 2005 SP2 and above perspective, only because it's what we're using right now)
First of all, there are multiple levels of index usage as seen when you look at an execution plan.  These main operators below can come right out of the execution plan when you hit CTRL+L on a given query from within SQL Management Studio: 
  1. CLUSTERED INDEX SEEK - Very fast and efficient - the table is physically ordered according to specified column(s) and SQL can go find it AND since all of the data is phsically ordered right, can pull the data sequentially
  2. INDEX SEEK - which is very fast and efficient.  The SQL server knows pretty much where the data is and can go directly to it and seek out the rows it needs.  The data isn't ordered in the DB by the fields in the index so it's likely not pulling the data sequentially like it is in the CLUSTERED INDEX SEEK, but it can still translate to a massive improvement in query execution cost/speed.
  3. INDEX SCAN - which as you can see still implies using the index, but it doesn't know exactly where the data is, so it may scan the whole index or a partial range of the index to find it's data. Can be very slow and costly, but still scans through the index as opposed to scanning the physical table. 
  4. CLUSTERED INDEX SCAN - a clustered index is really not much more than a table with the data physically ordered by specified columns.  So if you see a CLUSTERED INDEX SCAN, it's pretty much the same as a TABLE SCAN for performance because SQL has to physically search through every single row in the clustered index (which really IS the table).  
  5. TABLE SCAN - slow, inefficient.  SQL has to physically look at every single row in the table. Shouldn't see this often, if you do, you may need to rewrite something or add indexes.
So, knowing that little background,  the rules for index usage with LIKE are loosely like this:
  1. If your filter criteria uses equals = and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK  
  2. If your filter criteria uses LIKE, with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index.  The increased cost is almost nothing.
  3. If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER') it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.
  4. HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that 
    (as in 
    Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match. 
(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to.  The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient and that may include an INDEX SCAN instead of an INDEX SEEK) 
Consider the following queries: 
select *
from
 dbo.v_R_System
where
 NETBIOS_NAME0 = 'COMPUTER'
select *
from
 dbo.v_R_System
where
 NETBIOS_NAME0 LIKE 'COMPUTER'
select *
from
 dbo.v_R_System
where
 NETBIOS_NAME0 LIKE 'COMPUT%'
select *
from
 dbo.v_R_System
where
 NETBIOS_NAME0 LIKE '%COMPUT%'
When you hit CTRL+L to get the execution plan, for each of these, and mouse-over the operator on the far left, you'll see the following:
 
Notice, the first query uses = instead of LIKE.  It's got the lowest cost and uses an INDEX SEEK.  Very fast and efficient..What we would expect.
The second query uses LIKE, and no wildcards.  The cost is still pretty low and it uses an INDEX SEEK. 
The third query uses LIKE, and a wildcard, but doesn't START with a wildcard.  The cost is the same as the second and still uses INDEX SEEK.
The fourth query however, uses LIKE and starts with a wildcard.  It still uses an index, but the cost is higher and it's an INDEX SCAN so it's going to be less efficient and slower.

So you can see, it's not as simple as LIKE means no indexes.  You can still use LIKE and get pretty darn good use out of your indexes if you just know the limitations and do what you can to query it accordingly.





 

No comments: