NOTES | Grouping and Windowing

SQL Server can perform some data analysis operations. A data analysis function is a function applied to a set of rows, and it returns a single value. An example of such a function is the SUM aggregate function. A data analysis function can be either a group function or a window function. The two types differ in how you define the set of rows for the function to operate on. You can use grouped queries to define grouped tables, and then a group function is applied to each group. Or, you can use windowed queries that define windowed tables, and then a window function is applied to each window.

Writing Grouped Queries

You can use grouped queries to define groups in your data, and then you can perform data analysis computations per group. You group the data by a set of attributes known as a grouping set. Traditional T-SQL queries define a single grouping set; namely, they group the data in only one way. More recently, T-SQL introduced support for features that enable you to define multiple grouping sets in one query.

Working with a Single Grouping Set

With grouped queries, you can arrange the rows you’re querying in groups and apply data analysis computations like aggregate functions against those groups. A query becomes a grouped query when you use a group function, a GROUP BY clause, or both. A query that invokes a group function but doesn’t have an explicit GROUP BY clause arranges all rows in one group.

Working with Multiple Grouping Sets

Using T-SQL, you can define multiple grouping sets in the same query which just means you can group the data in more than one way. T-SQL supports three clauses that allow defined multiple sets: GROUPING SETS, CUBE, and ROLLUP. You use these in the GROUP BY clause.

You can use the GROUPING SETS clause to list all grouping sets that you want to define in the query.

The GROUPING SETS option gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The results are the equivalent of UNION ALL of the specified groups.

Pivoting and Unpivoting Data

You can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And  PIVOT runs aggregations where they’re required on any remaining column values that are wanted in the final output. UNPIVOT carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

Pivoting Data

Pivoting is a technique that groups and aggregates data, transitioning it from a state of rows to a state of columns. In all pivot queries, you need to identify three elements:

  • What do you want to see on rows? This element is known as the on rows, or grouping element.
  • What do you want to see on columns? This element is known as the on cols, or spreading element.
  • What do you want to see in the intersection of each distinct row and column value? This element is known as the data, or aggregation element.

Unpivoting Data

Unpivoting data can be considered the inverse of pivoting. The starting point is some pivoted data. When unpivoting data, you rotate the input data from a state of columns to a state of rows. Just like T-SQL supports the native PIVOT table operator to perform pivoting, it supports a native UNPIVOT operator to perform unpivoting. Like PIVOT, UNPIVOT is implemented as a table operator that you use in the FROM clause. The operator operates on the input table that is provided to its left, which could be the result of other table operators, like joins. The outcome of the UNPIVOT operator is a table result that can be used as the input to other table operators that appear to its right.

Notice that UNPIVOT isn’t the exact reverse of PIVOT. PIVOT carries out an aggregation and merges possible multiple rows into a single row in the output. UNPIVOT doesn’t reproduce the original table-valued expression result because rows have been merged. Also, null values in the input of  UNPIVOT disappear in the output. When the values disappear, it shows that there may have been original null values in the input before the PIVOT operation.

Using Window Functions

Window functions enable you to perform data analysis computations. With window functions, you define the set of rows per function—and then return one result value per each underlying row and function. You define the set of rows for the function to work with using a clause called OVER.

Types of Window functions

  • Aggregate Window Functions
    SUM(), MAX(), MIN(), AVG(). COUNT()
  • Ranking Window Functions
    RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
  • Value Window Functions
    LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

Syntax

window_function ( [ ALL ] expression ) OVER ( [ PARTITION BY partition_list ] [ ORDER BY order_list] )  

Window Aggregate Functions

Window aggregate functions are the same as the group aggregate functions (for example, SUM, COUNT, AVG, MIN, and MAX), except window aggregate functions are applied to a window of rows defined by the OVER clause.

One of the benefits of using window functions is that unlike grouped queries, windowed queries do not hide the detail—they return a row for every underlying query’s row. This means that you can mix detail and aggregated elements in the same query, and even in the same expression. Using the OVER clause, you define a set of rows for the function to work with per each underlying row. In other words, a windowed query defines a window of rows per each function and row in the underlying query.

Window Ranking Functions

With window ranking functions, you can rank rows within a partition based on specified ordering. As with the other window functions, if you don’t indicate a window partition clause, the entire underlying query result is considered one partition. The window order clause is mandatory. Window ranking functions do not support a window frame clause.

T-SQL supports four window ranking functions: ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().

The most common use of RANKING functions is to find the top (N) records based on a certain value. For example, Top 10 highest paid employees, Top 10 ranked students, Top 50 largest orders etc.

The ROW_NUMBER function computes a unique sequential integer starting with 1 within the window partition based on the window ordering.

RANK and DENSE_RANK differ from ROW_NUMBER in the sense that they assign the same ranking value to all rows that share the same ordering value.

The RANK function returns the number of rows in the partition that have a lower ordering value than the current, plus 1.

The DENSE_RANK function returns the number of distinct ordering values that are lower than the current, plus 1.

The NTILE function, you can arrange the rows within the partition in a requested number of equally sized tiles, based on the specified ordering.


Exam Tip
Window functions are allowed only in the SELECT and ORDER BY clauses of the query. If you need to refer to those in other clauses—for example, in the WHERE clause—you need to use a table expression such as a CTE. You invoke the window function in the inner query’s SELECT clause, assigning the expression with a column alias. Then you refer to that column alias in the outer query’s WHERE clause. You have a chance to practice this technique in this lesson’s exercises.


Window Offset Functions

Window offset functions return an element from a single row that is in a given offset from the current row in the window partition, or from the first or last row in the window frame.

T-SQL supports the following window offset functions: LAG, LEAD, FIRST_VALUE, and LAST_VALUE.

LAG() and LEAD()

The LAG function allows to access data from the previous row in the same result set without use of any SQL joins. The LAG and LEAD functions rely on an offset with respect to the current row, and the FIRST_ VALUE and LAST_VALUE functions operate on the first or last row in the frame, respectively.

The LAG and LEAD functions support window partition and ordering clauses. They don’t support a window frame clause. The LAG function returns an element from the row in the current partition that is a requested number of rows before the current row (based on the window ordering), with 1 assumed as the default offset. The LEAD function returns an element from the row that is in the requested offset after the current row.

FIRST_VALUE() and LAST_VALUE()

These functions help you to identify first and last record within a partition or entire table if PARTITION BY is not specified. Naturally, the functions support window partition, order, and frame clauses.