In my last post http://jahaines.blogspot.com/2009/10/io-stats-what-are-you-missing.html, I talked about the performance problems associated with scalar user defined functions and how SSMS may report invalid IO statistics, for scalar UDFs. In this post, I will focusing on how to transform those pesky scalar UDFs into more scalable function. When developing user defined functions, you have to keep a few things in mind. Firstly, scalar UDFs are evaluated for each row returned by the query. Additionally, SQL Server is not able to maintain statistics and optimize any function, except an inline table valued function. Lastly, most code logic does not necessarily need to be encapsulated, in a function. You may get better performance if you choose to use a derived table instead of a function; however, the biggest problem with a derived table is it cant be encapsulated and reused across an application. An inline TVF is really useful if you need to encapsulate business logic and reuse it throughout an application. Let’s start by creating the sample table DDL.
IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
DROP TABLE dbo.[TestData];
--Create Sample Table
CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
INSERT INTO dbo.TestData
SELECT TOP 1000
ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
ABS(CHECKSUM(NEWID()))%250 AS SomeId,
DATEADD(DAY,ABS(CHECKSUM(NEWID()))%1000,'2008-01-01') AS SomeDate
Next I am going to create two functions. One function will be a scalar UDF and the other will be an inline table valued function. If you do not know what an inline TVF is, an inline TVF is like a parameterized view and is subject to the same restrictions as a view. For more information you can read the following post in BOL, http://msdn.microsoft.com/en-us/library/ms189294.aspx.
--Create Scalar Function
CREATE FUNCTION dbo.fn_SomeFunction(@SomeId INT)
DECLARE @dt DATETIME
SELECT @dt = MAX(SomeDate)
WHERE SomeId = @SomeId
--Create Inline Table Valued Function
CREATE FUNCTION dbo.fn_SomeInlineFunction()
SELECT SomeId, MAX(SomeDate) AS SomeDate
GROUP BY SomeId
All of our DDL is in place. All that is left is to test the performance. If you read my last post, you should be expecting the inline TVF to out perform the scalar UDF. Let’s see what actually transpires. Discard the query results to the grid by clicking Tools –> Options –> Query Results –> SQL Server –> Results To Grid –> Discard results after execution. Next open SQL Server profiler and use the standard template. Run the code below to capture the performance counters in profiler.
INNER JOIN dbo.fn_SomeInlineFunction() AS max_dt
ON max_dt.SomeId = TestData.SomeId
The results of the queries should look similar to my results below. The things of note are the reads and the CPU required to satisfy the each query. The number of reads and CPU required to satisfy the scalar UDF is astronomically greater than the inline TVF.
If the above screenshot is not enough to discourage you from using scalar UDFs, I do not know what can. The point being that there are all kinds of great alternatives to encapsulating code logic, without the use of scalar functions. Inline TVFs offer a SET based approach for encapsulating business logic; plus the optimizer is able to use existing statistics and indexes to optimize inline TVFs. It is my recommendation that you should try to convert all scalar UDFs to inline TVFs. I know this is not always possible, but it is a good start. I typically try to stay away from scalar and multi-line UDFs, unless absolutely necessary. I hope that you have learned something new and that you can use this example to get the needed signoff to change those problematic scalar UDFs, into inline TVFs.
Until next time, happy coding.