Thursday, January 19, 2017

SQL-Server Set-Based Moving Averages with Break-Level Accumulator-Resets. No Cursors, Self-Joins, or Subselects used.

Microsoft SQL-Server Query / Procedure: Set-Based Moving Averages (without Cursors, self-joins, or sub-selects) — Example 2, with break-level average-resetting abilities

This blog builds on my prior entry where I provided the source code for a Microsoft SQL-Server Transact-SQL (T-SQL) query example (easily adapted to a stored procedure) that demonstrates an innovative way of efficiently calculating moving-averages, with variable window-widths / queue-depths, without using cursors or self-joins or any of the usual techniques.

NOTE: for a more in-depth discussion, read my prior blog which compares this approach to APPLY and OVER approaches as well. This code introduces the new feature of being able to reset the moving-average accumulator(s) value at your chosen break level(s).

This is purely a set-based method, and is very efficient — the only solution that I have found to be more efficient is the native implementation of windowing (OVER clause) functionality in Microsoft SQL-Server 2012, but that has limitations since the native approach does not support using variables to define the number of data-points (i.e., the "window width") to include in the moving-average value like my custom solution is capable of.

This example used the Microsoft AdventureWorks sample database tables and values from SQL-Server 2012 release.

SQL-Server Procedure / Query Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2017
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: Mike's SET-BASED Moving-Average Technique #2 (includes break-level resets of averages)
-- Demonstrate how, without any self-JOIN operation, without any CURSOR, without subselects, 
-- we can still calculate a moving average, including the ability to reset that moving average
-- at particular break-values, and easily alter the number of values included in average.
--
-- Queue-Depth (i.e., "moving average values-count") size changes have nearly no impact on
-- execution speed, meaning there is essentially no performance penalty for larger moving-avg
-- "windows" than small ones -- so, a 30-day moving average calculates at nearly same speed 
-- as a 120-day moving average due to efficient algorithm.
--
-- Excellent for financial-type applications, like moving-average stock-price calculations.
-- This example even "resets" the moving-average at a break-value, and can easily be extended
-- to do so at multi-column breaks, etc (see SET-BASED RUNNING TOTALS EXAMPLE 2 for technique).
--**********************************************************************************************
DBCC DROPCLEANBUFFERS  WITH NO_INFOMSGS  -- Clears the data cache
DBCC FREEPROCCACHE     WITH NO_INFOMSGS  -- Clears the procedure cache
GO

SET NOCOUNT ON

DECLARE @starttime DATETIME
SELECT  @starttime = GetDate()

--**********************************************************************************************
-- BEGIN: SET-BASED Moving-Average Technique
-- Demonstrate how, without any self-JOIN operation, without any CURSOR, without subselects, 
-- we can still calculate a moving average, including the ability to reset that moving average
-- at particular break-values.
-- Excellent for financial-type applications, like moving-average stock-price calculations.
-- This example even "resets" the moving-average at a break-value, and can easily be extended
-- to do so at multi-column breaks, etc (see SET-BASED RUNNING TOTALS EXAMPLE 2 for technique).
--**********************************************************************************************

--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @Results TABLE
(
  UniqueID                  INT IDENTITY NOT NULL PRIMARY KEY,
  AverageResetBreakColumn1  INT,
  OrderID                   INT,
  ValueBeingAveraged        MONEY,
  TotalSaleMovingAvg        MONEY
)


--Insert all values to be totaled, into our work table in the REQUIRED ORDER by which
--we will be calculating moving-average for.  In this example, we will look at moving
--average of Order SubTotals by OrderDate within each Teritory.
INSERT INTO @Results(
  AverageResetBreakColumn1,
  OrderID,
  ValueBeingAveraged)
SELECT
  Detail.ProductID,
  Detail.SalesOrderID,
  ISNULL(Detail.LineTotal, 0) --Handle NULL values
FROM 
  Sales.SalesOrderDetail    AS Detail
ORDER BY
  Detail.ProductID,
  Detail.SalesOrderID


--Whether we call it "moving window width", or "Queue Depth", or whatever, this indicates how
--many elements are to be included in our moving average.  
--E.g., a common moving average in finance situations could be a 30 day moving average, you
--would set "depth" to 30.  For this example, keep queue small for easy validation of calcs.
DECLARE @QueueDepth  INT
SET     @QueueDepth  = 2

--Space we'll use to store each value in our queue. 
--In this example, allow for up to 9 leading digits, the decimal, and 4 trailing digits each.
DECLARE  @SingleValueCharLength  INT
SET     @SingleValueCharLength   = 14

--Variable to accumulate our delimited string of values-per-break-level used to calc moving avg.
--This is, essentially, our values-queue.  Initialize it, so always predictable fixed length.
--New values (i.e., current-row value) are prepended to this String, with oldest (in queue)
--value appearing in the rightmost position of string... i.e., add new value to front of string
--and old values fall off the "end" (right side).
--
--NOTE: SET SIZE of this var to @QueueDepth * @SingleValueCharLength, OR leave as 8000 or MAX,
--keeping in mind that MAX will add slight performance penalty.
DECLARE @MovingSubtotalValuesString VARCHAR(8000)
SET     @MovingSubtotalValuesString = REPLICATE('0', @SingleValueCharLength * @QueueDepth)


--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @AverageResetBreakVal  INT
SET     @AverageResetBreakVal  = -999 



--We will need to track the moving (or "windowed") subtotal during the update process
DECLARE  @MovingSubtotal      MONEY
SET      @MovingSubtotal      = 0

DECLARE @RowNumberThisGroup   INT
SET     @RowNumberThisGroup   = 1

SELECT DateDiff(ms, @starttime, GetDate()) --Display elapsed Milliseconds 

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
--    See SET-BASED RUNNING SUBTOTALS and CONCATENTATION examples for background info, and 
--    the remainder of specific moving-average logic is described (intracode comments) below.
--**********************************************************************************************
UPDATE
    @Results
SET
  --Keep track of what row# within a break-grouping we are on
  @RowNumberThisGroup =
    CASE 
      WHEN @AverageResetBreakVal = AverageResetBreakColumn1 
        THEN @RowNumberThisGroup + 1
      ELSE 1 
    END,
  --If at break, reset moving-subtotal (first value in group is current value); otherwise we
  --add the most recent value (current row value) to be included in the subtotal, and then
  --subtract the last value falling outside of queue-depth -- this is the secret to getting 
  --the "moving subtotal window" to work, and the entire reason we need the values-queue!
  @MovingSubtotal =
    CASE 
      WHEN @AverageResetBreakVal = AverageResetBreakColumn1 
        THEN @MovingSubtotal + ValueBeingAveraged  --ADD NEW VALUE
          - CONVERT(MONEY, RIGHT(@MovingSubtotalValuesString, @SingleValueCharLength)) --POP A VALUE OFF THE QUEUE TO ADJUST SUBTOTAL DOWN BY (AS IT MOVES OUTSIDE OUR "WINDOW" OR QUEUE-DEPTH)
      ELSE ValueBeingAveraged
    END,
  --If at break, reset moving-values-list-string to contain current row value, with the rest
  --of our "queue" (this string) holding just zero/empty values.
  --Otherwise, we will be adding new value to front of the queue-string and dropping the 
  --last value from the end of the queue.
    @MovingSubtotalValuesString  = 
    CASE 
      WHEN @AverageResetBreakVal = AverageResetBreakColumn1 
        THEN
          LEFT(CONVERT(CHAR(14), ValueBeingAveraged, 2), @SingleValueCharLength) +    --ADD NEW VAL TO LEFT OF QUEUE-STRING
          LEFT(@MovingSubtotalValuesString,  @SingleValueCharLength * (@QueueDepth -1))  --DROP THE LAST RIGHTMOST VALUE
      ELSE
          CONVERT(CHAR(14), ValueBeingAveraged, 2) + REPLICATE('0', @SingleValueCharLength * (@QueueDepth -1))  --RESET AT BREAK! 
      END ,
  --If at break, reset moving-avg (first value in group is current value); otherwise if
  --less than our queue-depth into a group, moving average is our moving-subtotal divided by
  --how many rows into group we are; otherwise, simply divide moving subtotal by queue-depth
  TotalSaleMovingAvg  = 
    CASE 
      WHEN (@AverageResetBreakVal = AverageResetBreakColumn1) 
      AND  (@RowNumberThisGroup <= @QueueDepth)
        THEN @MovingSubtotal / @RowNumberThisGroup
      WHEN (@AverageResetBreakVal = AverageResetBreakColumn1) 
        THEN @MovingSubtotal / @QueueDepth
      ELSE ValueBeingAveraged  
    END,
  --And finally, keep track of whether we hit a new break-value.
    @AverageResetBreakVal= AverageResetBreakColumn1


--**********************************************************************************************
--Output the results, showing all rows to demonstrate the accumulation...
--**********************************************************************************************
SELECT * FROM @results

--**********************************************************************************************
-- END: SET-BASED Moving-Average Technique
--**********************************************************************************************
SELECT DateDiff(ms, @starttime, GetDate()) --Display elapsed Milliseconds


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: