Search This Blog

Thursday, February 2, 2017

Grouping Sets in T-SQL

whenever any aggregate function is required, the GROUP BY clause is the only solution
There has always been a requirement get these aggregate functions based on different sets of columns in the same result set. It is also safe to use this feature as this is an ISO standard.

Though the same result could be achieved earlier, we would have to write different queries and would have to combine them using a UNION operator. The result set returned by a GROUPING SET is the union of the aggregates based on the columns specified in each set in the Grouping Set.


As you can see from the code itself, you just specify the needed grouping sets inside the GROUP BY GROUPING SETS clause – everything else is performed transparently by SQL Server. The empty parentheses specify the so-called Empty Grouping Set, the aggregation across the whole table. When you also look at the output of STATISTICS IO, you can see that the table Sales.SalesOrderHeader was accessed only once! That’s a huge difference from the previous manual implementation that we have performed.

Within the execution plan SQL Server uses a Table Spool operator that stores the retrieved data temporarily in TempDb. The data from the work table created in TempDb is afterwards used in the second branch of the execution plan. Therefore the data isn’t rescanned for every group from the base table, which leads to a better performance of the whole execution plan.

When you look at the execution plan, you can also see that the query plan contains 3 Stream Aggregate operators (highlighted in red, blue, and green). These 3 operators are calculating the individual grouping sets:
  • The blue highlighted operator calculates the grouping set for CustomerIDSalesPersonID, and YEAR(OrderDate).
  • The red highlighted operator calculates the grouping set for SalesPersonID and YEAR(OrderDate). In addition it also calculates the grouping set across “everything”.
  • The green highlighted operator calculates the grouping set for CustomerID and YEAR(OrderDate).
The idea behind the 2 subsequent Stream Aggregate operators is to calculate so-called Super Aggregates – aggregations of aggregations.

No comments:

Post a Comment