Aggregated functions and CASE

by ASH February 03, 2010 13:11

I was stuck with a problem at work: I had a very complex query that pulled data out of several tables, some XML and whatnot, and manipulated it heavily to make it easier to populate a data warehouse.
I then had to expand this query to pull out some extra Boolean information, based on a comparison of two strings, which I would normally do with a CASE expression such as CASE WHEN string1 = string2 THEN 1 ELSE 0 END.

However, because I had a GROUP BY clause, I could not simply do this: the columns the strings came from were not in the GROUP BY clause, and I didn’t want them there.
So I found out that you can actually wrap an aggregate function around the CASE expression, using the following syntax:

SELECT <GroupByFields>,
    MAX(
        CASE
            WHEN <NonGroupField> = 'some value' THEN 1
            ELSE 0
        END
    )
FROM <Table>
GROUP BY <GroupByFields>

 

This saved me from using a common table expression or two more, and kept things simple. In essence, the technique lets you build a conditional SUM, MAX and similar aggregates.
It is just more evidence that a lot is possible in SQL, and that simply trying something often reveals interesting results.
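
A minimal sketch of the technique, assuming a made-up table Items with a grouping column Grp and two string columns. It uses Python's sqlite3 module purely so it can be run anywhere; the SQL inside is the same MAX/SUM-around-CASE pattern, not the original query.

```python
import sqlite3


def conditional_aggregates(rows):
    """Return (Grp, HasMatch, MatchCount) per group.

    HasMatch uses MAX around the CASE (1 if any row in the group matches),
    MatchCount uses SUM around the same CASE (a conditional count).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the names are only for illustration.
    conn.execute("CREATE TABLE Items (Grp TEXT, String1 TEXT, String2 TEXT)")
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Grp,
               MAX(CASE WHEN String1 = String2 THEN 1 ELSE 0 END) AS HasMatch,
               SUM(CASE WHEN String1 = String2 THEN 1 ELSE 0 END) AS MatchCount
        FROM Items
        GROUP BY Grp
        ORDER BY Grp
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    ("A", "x", "x"),
    ("A", "x", "y"),
    ("A", "z", "z"),
    ("B", "x", "y"),
]
print(conditional_aggregates(sample))
```

Group A has two matching rows, so MAX gives 1 and SUM gives 2; group B has none, so both are 0. The string columns never appear in the GROUP BY.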

Selecting latest order

by ASH October 31, 2009 12:15

A common problem is to list, for example, the current price of a product or the latest order for a customer.
It is a problem that can quickly seem complex, but once you understand the situation it is, like everything else, relatively simple. I’ll show how to solve it both with window functions (in this instance RANK) in SQL Server 2005+ and with a sub-query, in case you are working on an earlier version or on another database.

I’ll use the Northwind database as the starting point, which I’ve installed on a SQL Server 2008.
It has an Orders table with a foreign key to a Customers table; for this example, however, the Orders table is the only one of interest.
The Orders table contains the following columns:
SELECT [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]
      ,[RequiredDate]
      ,[ShippedDate]
      ,[ShipVia]
      ,[Freight]
      ,[ShipName]
      ,[ShipAddress]
      ,[ShipCity]
      ,[ShipRegion]
      ,[ShipPostalCode]
      ,[ShipCountry]
  FROM [Northwind].[dbo].[Orders]
 


In this example we’ll get the latest order per customer, but the problem is the same if you need the current price; only the tables, WHERE clauses and so on will differ.
Solved with the RANK window function and a common table expression, it looks like this:

;WITH CTE AS (
    SELECT
        RANK() OVER (PARTITION BY T1.CustomerID ORDER BY OrderDate DESC) AS OrderRank,
        T1.*
    FROM Orders T1
)
SELECT * FROM CTE WHERE OrderRank = 1
ORDER BY CustomerID
   

What happens here is that we use RANK to number the rows, partitioned (grouped) by CustomerID and sorted by OrderDate descending. Each row gets the rank of its order in the customer’s history, from newest to oldest, so every row with rank 1 is that customer’s latest order. Note that RANK gives tied rows the same rank, so a customer with two orders sharing the latest OrderDate will return both.
You can then expand on the joins inside or outside the common table expression to get information about customers, order details and so on.

It is also solvable without the RANK window function, for example like this:

SELECT T1.*
FROM Orders T1
INNER JOIN (
    SELECT T2.CustomerID, MAX(T2.OrderDate) AS OrderDate
    FROM Orders T2
    GROUP BY T2.CustomerID
) T3 ON T3.OrderDate = T1.OrderDate AND T1.CustomerID = T3.CustomerID
ORDER BY T1.CustomerID
    

Here we use a sub-query that selects the CustomerID and the highest order date from Orders, and then join it back to Orders with a self-join on customer id and order date.

Note that these are just examples. Many similar solutions exist, but they all follow the same methodology.
I have also not taken optimization or anything like that into account; the point was just to illustrate a solution to a common problem.
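
Both approaches can be tried side by side in a small sketch. It assumes a cut-down Orders table with invented rows and runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), so it is a stand-in for the SQL Server queries rather than a copy of them.

```python
import sqlite3

# Invented sample rows: (OrderID, CustomerID, OrderDate as ISO text).
ORDERS = [
    (1, "ALFKI", "1997-08-25"),
    (2, "ALFKI", "1998-04-09"),
    (3, "BONAP", "1998-05-06"),
    (4, "BONAP", "1998-05-06"),  # tie on the latest date
    (5, "BONAP", "1996-10-16"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, OrderDate TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ORDERS)
    return conn


def latest_with_rank(conn):
    # RANK per customer, newest order first; keep rank 1.
    return conn.execute(
        """
        WITH CTE AS (
            SELECT RANK() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderRank,
                   OrderID, CustomerID, OrderDate
            FROM Orders
        )
        SELECT OrderID, CustomerID, OrderDate
        FROM CTE
        WHERE OrderRank = 1
        ORDER BY CustomerID, OrderID
        """
    ).fetchall()


def latest_with_subquery(conn):
    # MAX(OrderDate) per customer, joined back to Orders.
    return conn.execute(
        """
        SELECT T1.OrderID, T1.CustomerID, T1.OrderDate
        FROM Orders T1
        INNER JOIN (
            SELECT CustomerID, MAX(OrderDate) AS OrderDate
            FROM Orders
            GROUP BY CustomerID
        ) T3 ON T3.OrderDate = T1.OrderDate AND T3.CustomerID = T1.CustomerID
        ORDER BY T1.CustomerID, T1.OrderID
        """
    ).fetchall()
```

On this data both functions return the same rows, including both of BONAP's tied orders. If exactly one row per customer is needed, ROW_NUMBER with an extra tie-breaker in the ORDER BY is a common variation.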

 

The IsNumeric Trap

by ASH September 04, 2009 08:13

As mentioned, rather casually, in the MSDN documentation, the IsNumeric function returns 1 for some values which aren’t actually numbers. The currency sign $ is mentioned, as are plus and minus.
A period or a comma will also return 1.

This essentially means you can’t be sure that a value which passes the IsNumeric check is actually a number.
Least of all can you be sure that the value means what you expect.
If you are not paying attention, this can be a problem when dealing with numbers from countries which do not follow the same decimal and thousands separator convention as the US.

Also, “funnily” enough, the caveats in IsNumeric mean you can’t even be sure the value can be converted to a numeric data type. (It can normally always be converted to money, but then the name is misleading :) )
To illustrate, consider the following snippet:

DECLARE @VAR VARCHAR(20)
SET @VAR = '€,.,,'
  

This will pass an IsNumeric check with the result 1, but it can’t be cast to numeric/decimal. It can be cast to money, but the result is 0.00.
And that is despite the fact that it is not in any form an actual numeric value.
There are some semantic checks built into IsNumeric, for example that you can’t have values in front of the currency sign, so

SET @VAR = '123$'

would fail an IsNumeric check. There are also some checks on the number of plus/minus signs and so on.
So when using IsNumeric, be careful and don’t just accept a 1 as proof that the value is an actual number.


Local variables and batch scope

by ASH July 14, 2009 10:36

I was recently debugging a problem in a stored procedure where the wrong values were inserted into a table in the middle of a long-running cursor operation.
The culprit turned out to be the scope of a variable in SQL versus the scope most (object-oriented) developers are used to from the language they usually code in.

The problem can be illustrated with this code:

DECLARE @Counter INT = 1

-- The loop bound is illustrative; anything from 3 up shows the effect.
WHILE @Counter <= 3
BEGIN
   DECLARE @Var VARCHAR(MAX)

   IF @Counter = 2
       SET @Var = 'SomeValue'

   SELECT @Var
   SET @Counter += 1
END

Variables in T-SQL have "batch scope", whereas most OO developers are used to the scope being limited to the innermost block.

In the first iteration of the WHILE loop, the variable "@Var" gets declared but is not assigned a value.
Thus, when selecting it, the result will be NULL.

In the second iteration, however, the variable has already been declared, so SQL Server will not "recreate" it, because the scope is "batch scope".
Here it is simply assigned the value "SomeValue".

The third iteration is like the second. The variable is already declared, and because it was set to "SomeValue" in the previous iteration, that value persists in this iteration and in all subsequent iterations until another value is set.

This is a source of errors if you are not careful. An object-oriented developer who is not aware of this will most likely read the code above as if only iteration 2 yields "SomeValue" and all other iterations yield NULL.
Because of the batch scope, that is not so. If the variable should start out empty in every iteration, reset it explicitly, for example with SET @Var = NULL right after the DECLARE.

The result of the query shows this: NULL for the first iteration, and "SomeValue" for the second and every following iteration.


Sorting integers in a string using XQuery

by ASH June 29, 2009 18:13

I stumbled across the task of sorting the numbers in a comma-separated string by their numerical value in SQL.
Curious as always, I thought it would be possible to solve such a problem using XML and XQuery in SQL, and indeed it was.

Given a list of numbers, in SQL Server 2005 and up you can do something like this:
DECLARE @X XML = ''

DECLARE @STR VARCHAR(255) = CAST(@X.query(
                 'for $i in (11, 9, 10)
                  order by $i
                  return fn:concat(xs:string($i), ",")'
) AS VARCHAR(255))

SELECT LEFT(@STR, LEN(@STR) - 1)
  
(Remember that in SQL Server 2005 you need to put the assignment on a separate line; this is 2008 syntax.)

The output from this query will be the value: '9, 10, 11'

Now, it might not be special or terribly useful as such, but the point of the post is much more that untraditional methods for solving problems exist, and that one can solve many problems by thinking outside the box.

One problem with this approach is that if you want the input created dynamically, you either need to build the entire statement in a string and use EXEC, or perhaps use sql:column and sql:variable to help you along.


Using INNER JOIN/SELF JOIN to allow for smaller indexes.

by ASH May 15, 2009 16:26

In databases, I often need foreign keys in my tables, and I’ll want to use them to select content from those tables. However, this often results either in bad index utilization in the selection, or in additional indexes built on the foreign key(s) plus the content I need to select.
This in turn can leave me with “many” indexes, and sometimes many big indexes.

An alternative is to make smaller indexes and use an INNER JOIN to join to the table an extra time.

I’m going to show the pattern with a relatively simple example, because it is about the pattern more than the specific layout, content and size of the table.
It is just a pattern/technique to keep in mind and have in the SQL toolbox.
Suppose you have two tables of a similar pattern to this:

[Figure: Tables for self join example]

I’ve let SQL Server create my clustered indexes based on the primary key, which means I have a clustered index over my primary key column(s).
A lookup of rows in JoinOne based on the foreign key will often look like this:

SELECT *
FROM JoinOne T1
WHERE FKOne = <VALUE>
  

Because I’ve added no other indexes to the tables, I’ll get an Index Scan or Table Scan when running the above query, which, as we know, is not a good thing most of the time.
This usually leads to creating a second index over my foreign key.
However, if the table is large, and if I need to extract many or most of the columns (like in my example with SELECT *), I’ll have to make either a complete index or one with many included columns, just ordered by my foreign key first.

If I instead make a small index with my foreign key first and my primary key second, I can use it to join to my table again, and then use my primary key from there.
This is an index to illustrate:

[Figure: Index for self join example]

Then I can make the following query:

SELECT T2.*
FROM JoinOne T1
INNER JOIN JoinOne T2
    ON  T1.PKOne = T2.PKOne
    AND T1.PKTwo = T2.PKTwo
    AND T1.PKThree = T2.PKThree
WHERE T1.FKOne = <VALUE>
  

When looking at its execution plan, it’ll show two index seeks instead of my previous index scan. This way I have as small an index as possible, but keep seeks in my execution plan.
Of course there are some considerations to take with this pattern. Firstly, if the tables are “small”, the scan in itself might be alright, or the overhead of keeping a complete index for the foreign key is negligible.

However, the advantage of the pattern is that I can have smaller indexes on (very) large tables, which means less overhead when inserting/updating, while still having only index seeks in the execution plans of my selects.
It is a useful technique in my opinion when used at the right times, which as always has to be evaluated case by case.
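
The shape of the pattern can be sketched as follows. The table layout is an assumption, since only the column names PKOne, PKTwo, PKThree and FKOne appear in the queries; Payload and the index name are invented. The sketch runs on SQLite via Python's sqlite3 module, whose planner differs from SQL Server's, so it only shows that the self-join returns the same rows as the direct lookup, not what the execution plan looks like.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE JoinOne (
            PKOne INTEGER, PKTwo INTEGER, PKThree INTEGER,
            FKOne INTEGER,
            Payload TEXT,  -- hypothetical "wide" column we want to fetch
            PRIMARY KEY (PKOne, PKTwo, PKThree)
        )
        """
    )
    # The small index: foreign key first, then only the primary key columns.
    conn.execute(
        "CREATE INDEX IX_JoinOne_FK ON JoinOne (FKOne, PKOne, PKTwo, PKThree)"
    )
    rows = [(i, i % 3, i % 5, i % 4, f"row {i}") for i in range(20)]
    conn.executemany("INSERT INTO JoinOne VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def direct_lookup(conn, fk):
    return conn.execute(
        "SELECT * FROM JoinOne T1 WHERE FKOne = ? ORDER BY PKOne", (fk,)
    ).fetchall()


def via_self_join(conn, fk):
    # T1 only needs the small index; T2 is then reached through the primary key.
    return conn.execute(
        """
        SELECT T2.*
        FROM JoinOne T1
        INNER JOIN JoinOne T2
            ON  T1.PKOne = T2.PKOne
            AND T1.PKTwo = T2.PKTwo
            AND T1.PKThree = T2.PKThree
        WHERE T1.FKOne = ?
        ORDER BY T2.PKOne
        """,
        (fk,),
    ).fetchall()
```

Whether the self-join actually beats a wider index is something to check against the real execution plan on the real data.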

Rollback the transaction

by ASH April 21, 2009 11:22

Last night I wrote a bit about an experience I had where a transaction blocked access.
So I decided to quickly write about how to roll back a transaction.
Suppose we have two tables:

[Figure: Tables for Transaction example]

I run the following SQL to illustrate wrapping some INSERTs in a transaction.

BEGIN TRAN

BEGIN TRY
    INSERT INTO TableOne VALUES ('aa')
    INSERT INTO TableTwo VALUES (SCOPE_IDENTITY(), 'bb')
    INSERT INTO TableTwo VALUES (-1, 'cc')
    INSERT INTO TableOne VALUES ('aa')
END TRY
BEGIN CATCH
    ROLLBACK TRAN
END CATCH

IF @@TRANCOUNT > 0
    COMMIT TRAN

What happens here is that the first insert puts 'aa' into TableOne.
The next insert then uses the identity value from TableOne to insert into TableTwo. This will succeed. The third insert tries to insert the value -1, which throws a foreign key constraint error.

Because I have wrapped everything in a transaction and use the TRY/CATCH functionality (like in most other programming languages), I can roll back nicely within the CATCH block and make sure nothing gets committed to the database.

The end of the query tests the transaction counter (@@TRANCOUNT) to decide whether the transaction should be committed. This way I avoid leaving transactions open, and I avoid trying to close a transaction which has already been handled. Each BEGIN TRAN(SACTION) increments @@TRANCOUNT, COMMIT decrements it by one, and ROLLBACK resets it to 0.
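
The same flow can be tried outside SQL Server. The sketch below is a stand-in using Python's sqlite3 module, with an assumed table layout (the columns Id, Val and TableOneId are invented, since the real tables are only shown in the figure). Here `cursor.lastrowid` plays the role of SCOPE_IDENTITY() and `conn.in_transaction` is a rough counterpart of checking @@TRANCOUNT.

```python
import sqlite3


def run_inserts():
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE TableOne (Id INTEGER PRIMARY KEY AUTOINCREMENT, Val TEXT)"
    )
    conn.execute(
        """
        CREATE TABLE TableTwo (
            TableOneId INTEGER NOT NULL REFERENCES TableOne (Id),
            Val TEXT
        )
        """
    )

    conn.execute("BEGIN")
    try:
        cur = conn.execute("INSERT INTO TableOne (Val) VALUES ('aa')")
        conn.execute("INSERT INTO TableTwo VALUES (?, 'bb')", (cur.lastrowid,))
        # -1 does not exist in TableOne, so this violates the foreign key.
        conn.execute("INSERT INTO TableTwo VALUES (-1, 'cc')")
        conn.execute("INSERT INTO TableOne (Val) VALUES ('aa')")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")

    # Only commit if a transaction is still open.
    if conn.in_transaction:
        conn.execute("COMMIT")

    counts = (
        conn.execute("SELECT COUNT(*) FROM TableOne").fetchone()[0],
        conn.execute("SELECT COUNT(*) FROM TableTwo").fetchone()[0],
    )
    conn.close()
    return counts


print(run_inserts())
```

Both counts come back as 0: the first two inserts succeeded, but the rollback in the exception handler undid them along with the failed one.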

Transaction and connection

by ASH April 20, 2009 20:12

I had an experience at work today when I started playing around with transactions.

Basically, unless you play around with the isolation levels (a large topic on its own, for another time), opening a transaction will lock access to the table(s) you are modifying within that transaction.
Nothing strange about that.

However, I experienced a transaction timing out when called from my data-access code layer. The transaction was still “live” afterwards and blocked access to the table.
The lock could only be resolved in one of three ways.
Either I had to manually kill the connection in SQL Server (for example via Activity Monitor).
Or I had to wait until the application pool (the ASP.NET worker process) got recycled, or kill it manually.
Or I had to wait until SQL Server killed the transaction (which in my tests took around 5 minutes).

The lesson learned here is that you need to keep an eye on your transactions and how long they run, because you can end up with a long period of locking.

The way I tested this was to make an ASP.NET web form which called a stored procedure.
In this procedure I started a transaction and then performed some actions which I knew would result in a timeout when called from my web form. Even after the call from my web form had failed with a timeout, access to the affected table was still locked until I did one of the things mentioned above.

Transactions are a good thing, but you need to be aware of them. And remember to use rollback, because otherwise transactions are pretty useless :)
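
One way to make sure the rollback is never forgotten is to let the data-access layer own it. The following is only a sketch of that idea in Python with sqlite3, not the ASP.NET code from my test: the helper `transaction` and the table Work are invented, and SQLite does not lock the way SQL Server does, so it shows just the clean-up pattern.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit on success; roll back on any exception, timeouts included."""
    conn.execute("BEGIN")
    try:
        yield conn
    except BaseException:
        # Whatever went wrong, do not leave the transaction open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def demo():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Work (Val TEXT)")
    try:
        with transaction(conn):
            conn.execute("INSERT INTO Work VALUES ('half done')")
            # Stand-in for a command timeout raised by the calling layer.
            raise TimeoutError("simulated command timeout")
    except TimeoutError:
        pass
    still_open = conn.in_transaction
    row_count = conn.execute("SELECT COUNT(*) FROM Work").fetchone()[0]
    conn.close()
    return still_open, row_count


print(demo())
```

After the simulated timeout no transaction is left open and the half-finished insert is gone, which is the state I had to get back to by hand in the scenario above.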

Dynamic inequality checks

by ASH March 05, 2009 21:03

One of the problems I often encounter in SQL, when reading forums, solving problems for myself and so on, is equality and inequality checks and the need to make them dynamic.
A rather nifty (in my view) mathematical trick can combine the checks > with < and >= with <= in one query without the use of IF or CASE.

Note that when A is greater than B, then -A is less than -B; that is the core of the logic behind this.

Suppose I have a table (MyTable) containing numerical values in a Value1 field, which I need to check against an arbitrary number; I’ll use 25 in my example. Then I can write the following code:

DECLARE @modifier INT = 1

SELECT *
FROM MyTable
WHERE Value1 * @modifier > 25 * @modifier
ORDER BY Value1 DESC

(Note this is 2008 syntax; if using 2005 or earlier, just move the assignment of the modifier variable to another line.)

In this case, when the modifier is 1 the check will be Value1 > 25, but if I change the modifier to -1, the check will effectively be Value1 < 25. So just to recap: with a modifier of 1 I get all rows where Value1 is greater than 25, and with a modifier of -1 I get the rows where Value1 is less than 25.

And if I need to check for <= or >=, all I have to do is apply a second check in an OR clause with a second modifier.

DECLARE @modifier INT = -1
DECLARE @modifier2 INT = 1

SELECT *
FROM MyTable
WHERE Value1 * @modifier > 25 * @modifier
   OR Value1 * @modifier2 = 25
ORDER BY Value1 DESC


This time around, if @modifier2 is 1 the check includes the equality check (so 25 will also be selected), but if I change it to 0 the equality check is no longer in effect, and the query behaves like the earlier one without the second check.
This means I can check for >, <, >= and <= simply by changing @modifier between 1 and -1 and @modifier2 between 0 and 1. And by applying another modifier to Value1 in the ORDER BY, I could switch between ASC and DESC on the fly, as mentioned in my blog post about Dynamic ASC and DESC and Order by DateTime.

The main problem here is of course that it is difficult for the query to use any indexes.
So if you have a large table it might hurt performance too much, and multiple stored procedures are the better approach.
But as always it is a case-by-case judgment call. In my small test for this piece the ratio was about 62/38, meaning the mathematical dynamic approach took 62% and a specialized, optimized query took 38%.
However, I would need to maintain 4 different variants, either via different queries, IF constructs or CASE constructs, to get the same flexibility.
And of course the specialized and optimized query will perform better, but at times the performance hit is negligible and a more dynamic approach is more useful.
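
All four combinations can be checked with a small sketch. It assumes a MyTable holding just the values 10, 25 and 40 and runs the same WHERE clause on SQLite through Python's sqlite3 module, with the two modifiers passed as named parameters.

```python
import sqlite3


def select_values(modifier, modifier2, threshold=25):
    """modifier: 1 for >, -1 for <. modifier2: 1 adds the equality check, 0 drops it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (Value1 INTEGER)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(10,), (25,), (40,)])
    rows = conn.execute(
        """
        SELECT Value1
        FROM MyTable
        WHERE Value1 * :m > :t * :m
           OR Value1 * :m2 = :t
        ORDER BY Value1 DESC
        """,
        {"m": modifier, "m2": modifier2, "t": threshold},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(select_values(1, 0))   # Value1 >  25
print(select_values(-1, 0))  # Value1 <  25
print(select_values(1, 1))   # Value1 >= 25
print(select_values(-1, 1))  # Value1 <= 25
```

One caveat worth keeping in mind: switching the equality check off with 0 relies on the comparison value not being 0 itself, because 0 = 0 would then match every row.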

Filter parameters in a Stored Procedure

by ASH March 02, 2009 20:19

Something I often see is difficulty in using “filters” in a stored procedure that selects, for example, products based on various restrictions. Often I see people write dynamic SQL in the code layer, send it to SQL Server and run it with EXEC.

A different way of doing this is to use OR logic in the stored procedure.

Suppose you have a table containing products. These products have, for example, a weight column, height, length and other such physical attributes. Now suppose you want to select all products from this table, but also give the option of filtering the select further, say below a specific weight and/or below a specific length and so on.
This is doable the following way:

DECLARE @Weight INT = NULL
DECLARE @Length INT = NULL
DECLARE @Width INT = NULL

SELECT *
FROM Product
WHERE (@Weight IS NULL OR (ProductWeight <= @Weight))
  AND (@Length IS NULL OR (ProductLength <= @Length))
  AND (@Width IS NULL OR (ProductWidth <= @Width))
  

Note this is 2008 syntax; if using 2000 or 2005, remember you can’t assign a value to a variable in the same line as its declaration.

Also, when used in a stored procedure the declarations are usually input parameters; just default them to (for example) NULL.

When running this with the NULL assignments you’ll get all rows out, because all rows fulfill the first part of the OR logic.

However, if you change one or more of the variables to an actual value, the SQL will filter on those values: as the variables are no longer NULL, the second part of the OR expression comes into play.
So suppose you have the following values instead:

DECLARE @Weight INT = 10
DECLARE @Length INT = 15
DECLARE @Width INT = 20
  

The select will only return those product rows which have a weight less than or equal to 10, a length less than or equal to 15 and a width less than or equal to 20.
Leave one value as NULL and you’ll not filter on that one, so

DECLARE @Weight INT = NULL
DECLARE @Length INT = 10
DECLARE @Width INT = NULL
  

will select all products which have a length less than or equal to 10, regardless of weight and width.

Of course, my use of <= is basically irrelevant, as any logical operation can be performed in the second part of the OR expression.

An inherent disadvantage of this type of procedure compared to more specialized ones is the difficulty in utilizing indexes. Due to the OR nature of the procedure, the indexes can’t be used optimally, so that is naturally a concern you must consider when using this approach.
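
Here is a runnable sketch of the optional-filter pattern, assuming a Product table with three invented rows. It uses Python's sqlite3 module as a stand-in, where passing None for a parameter corresponds to leaving the T-SQL variable as NULL.

```python
import sqlite3

# Invented sample rows: (Name, ProductWeight, ProductLength, ProductWidth).
PRODUCTS = [
    ("small", 5, 8, 10),
    ("medium", 10, 15, 20),
    ("large", 30, 40, 50),
]


def find_products(weight=None, length=None, width=None):
    """Each argument left as None switches its filter off."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Product (Name TEXT, ProductWeight INTEGER, "
        "ProductLength INTEGER, ProductWidth INTEGER)"
    )
    conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", PRODUCTS)
    rows = conn.execute(
        """
        SELECT Name
        FROM Product
        WHERE (:weight IS NULL OR ProductWeight <= :weight)
          AND (:length IS NULL OR ProductLength <= :length)
          AND (:width  IS NULL OR ProductWidth  <= :width)
        ORDER BY ProductWeight
        """,
        {"weight": weight, "length": length, "width": width},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(find_products())                                # no filters
print(find_products(weight=10, length=15, width=20))  # all three filters
print(find_products(length=10))                       # length only
```

With no arguments every product comes back; with all three limits set only the small and medium products remain; with only a length of 10 the result is just the small product.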


About me

My real name is Allan Svelmøe Hansen.

I live in Denmark, where I have worked as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004. My primary fields of expertise are back-end data integration, database design and optimization. But I also work with website development, as well as applications/services for servers and SEO of websites.

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and do not represent my employer's view in any way, nor are my results guarantees for all situations.

Content is presented “as is”, with no warranty.