NOTES | Getting Started with the SELECT Statement

NOTES | Getting Started with the SELECT Statement

Using the FROM and SELECT Clauses

The FROM and SELECT clauses are two principal clauses that appear in almost every query that retrieves data. When you execute a SELECT statement, it usually retrieves one or more rows of information – the exact number depends on how you construct the statement. These rows are collectively known as a result set.

Brief summary of the clauses in a SELECT statement:

The FROM Clause

FROM – This is the second most important clause in the SELECT statement and is also required. You use the FROM clause to specify the tables or views from which to draw the columns you’re listed in the DSELECT clause. According to logical query processing, the FROM clause is the first clause to be evaluated logically in a SELECT query.

The FROM clause has two main roles:

  • It’s the clause where you indicate the tables that you want to query.
  • It’s the clause where you can apply table operators like joins to input tables.

The SELECT Clause

SELECT – This is the primary clause of the SELECT statement and is absolutely required. You use it to specify the columns you want in the result set of your query. The columns themselves are drawn from the table or view you specify in the FROM clause.

The SELECT clause of a query has two main roles:

  • It evaluates expressions that define the attributes in the query’s result, assigning them with aliases if needed.
  • Using a DISTINCT clause, you can eliminate duplicate rows in the result if needed.

WHERE – This is an optional clause that you use to filter the rows returned by the FROM clause. The WHERE keyword is followed by an expression, technically known as a predicate, that evaluates to true, false, or unknown. You can test the expression by using operators.

GROUP BY – When you use aggregate functions in the SELECT clause to produce summary information, you use the GROUP BY clause to divide the information into distinct groups. Your database system uses any column or list of columns following the GROUP BY keywords as grouping columns.

HAVING – The HAVING clause filters the result of aggregate functions in grouped information. It is like the WHERE clause in that the HAVING keyword is followed by an expression that evaluates to true, false, or unknown. You can test the expression by using standard comparison operators, Boolean operators, or special operators. HAVING is also an optional clause.

Delimiting Identifiers

When referring to identifiers of attributes, schemas, tables, and other objects, there are cases in which you are required to use delimiters vs. cases in which the use of delimiters is optional.

T-SQL supports both a standard form to delimit identifiers using double quotation marks, as in “Sales”.”Orders”, as well as a proprietary form using square brackets, as in [Sales].[Orders].

When the identifier is “regular,” delimiting it is optional. In a regular identifier, the identifier complies with the rules for formatting identifiers. The rules say that the first character must be a letter in the range A through Z (lower or uppercase), underscore (_), at sign (@), or number sign (#). Subsequent characters can include letters, decimal numbers, at sign, dollar sign ($), number sign, or underscore. The identifier cannot be a reserved keyword in T-SQL, cannot have embedded spaces, and must not include supplementary characters.

An identifier that doesn’t comply with these rules must be delimited.

Working with Data Types and Built-in Functions

When defining columns in tables, parameters in procedures and functions, and variables in T-SQL batches, you need to choose a data type. The data type constrains the data that is supported, in addition to encapsulating behavior that operates on the data, exposing it through operators and other means.

Choosing the Appropriate Data Type

Choosing the most appropriate data type for attributes is one of the most important data decisions and with so many Data Type options, it may seem like a difficult task. However, as long as you follow certain principles, you can be smart about it.

SQL Server supports many data types from different categories: exact numeric (INT, NUMERIC), character strings (CHAR, VARCHAR), Unicode character strings (NCHAR, NVARCHAR), approximate numeric (FLOAT, REAL), binary strings (BINARY, VARBINARY), date and time (DATE, TIME, DATETIME2, SMALLDATETIME, DATETIME, DATETIMEOFFSET), and others.

Data Types

Also, DO NOT confuse the formatting of a value with its type. Sometimes, people use character strings to store dates because they think of storing a date in a certain format. The formatting of a value is supposed to be the responsibility of the application when data is presented. The type is a property of the value stored in the database, and the internal storage format shouldn’t be your concern. This aspect has to do with an important principle in the relational model called physical data independence.

If you chose an inappropriate type for your value, you sometimes must convert the type (explicitly or implicitly), and sometimes massage the value in order to treat the value as it was supposed to be. This is because a data type encapsulates behavior for the value. Using the wrong data type creates missed behavior(s) from operators and functions that support the correct data type. For example, for data types representing character strings, the plus (+) operator represents concatenation. However, for data types representing numbers, it represents addition.

Data size should also affect the appropriate data type. Query performance is affected by the I/O involved and a query that reads less tends to run faster. *SEE CHART ABOVE for data sizes.

Another important aspect in choosing a type has to do with choosing fixed types (CHAR, NCHAR, BINARY) vs. dynamic ones (VARCHAR, NVARCHAR, VARBINARY). Fixed types use the storage for the indicated size; for example, CHAR(30) uses storage for 30 characters, whether you actually specify 30 characters or less. This means that updates will not require the row to physically expand, and therefore no data shifting is required. So for attributes that get updated frequently, where the update performance is a priority, you should consider fixed types. Note that when compression is used—specifically row compression—SQL Server stores fixed types like variable ones, but with less overhead.

Your choices of data types for your attributes will have a dramatic effect on the functionality and performance of the T-SQL code that interacts with the data—even more so for attributes used as keys.

Date and Time Functions

T-SQL supports a number of date and time functions that allow you to manipulate your date and time data.

Current Date and Time

One important category of functions is the category that returns the current date and time. The functions in this category are GETDATE, CURRENT_TIMESTAMP, GETUTCDATE, SYSDATETIME, SYSUTCDATETIME, and SYSDATETIMEOFFSET.

  • GETDATE is T-SQL–specific, returning the current date and time in the SQL Server instance you’re connected to as a DATETIME data type.
  • CURRENT_TIMESTAMP is the same, only it’s standard, and hence the recommended one to use.
  • SYSDATETIME and SYSDATETIMEOFFSET are similar, only returning the values as the more precise DATETIME2 and DATETIMEOFFSET types (including offset), respectively.

Note: there are no built-in functions to return the current date or the current time; to get such information, simply cast the SYSDATETIME function to DATE or TIME, respectively

Date and Time Parts

Using the DATEPART function, you can extract from an input date and time value a desired part, such as a year, minute, or nanosecond, and return the extracted part as an integer. For example, the expression DATEPART(month, ‘20120212’) returns 2. T-SQL provides the functions YEAR, MONTH, and DAY as abbreviations to DATEPART, not requiring you to specify the part. The DATENAME function is similar to DATEPART, only it returns the name of the part as a character string, as opposed to the integer value.

Add and Diff

DATEADD is a very commonly used function. With it, you can add a requested number of units of a specified part to a specified date and time value.

DATEDIFF is another commonly used function; it returns the difference in terms of a requested
part between two date and time values.

Offset

T-SQL supports two functions related to date and time values with an offset: SWITCHOFFSET and TODATETIMEOFFSET.

With the SWITCHOFFSET function, you can return an input DATETIMEOFFSET value in a requested offset term.

The TODATETIMEOFFSET function is used for a different purpose. You use it to construct a DATETIMEOFFSET value from two inputs: the first is a date and time value that is not offsetaware, and the second is the offset. You can use this function when migrating from data that is not offset-aware, where you keep the local date and time value in one attribute, and the offset in another, to offset-aware data.

Character Functions

T-SQL was not really designed to support very sophisticated character string manipulation functions, so you won’t find a very large set of such functions.

Concatenation

Character string concatenation is a very common need. T-SQL supports two ways to concatenate strings—one with the plus (+) operator, and another with the CONCAT function.

Substring Extraction and Position

Using the SUBSTRING function, you can extract a substring from a string given as the first argument, starting with the position given as the second argument, and a length given as the third argument.

The LEFT and RIGHT functions extract a requested number of characters from the left and right ends of the input string, respectively.

The CHARINDEX function returns the position of the first occurrence of the string provided as the first argument within the string provided as the second argument.

You can combine, or nest, functions in the same expression. For example, suppose you query a table with an attribute called fullname formatted as ‘ ‘, and you need to write an expression that extracts the first name part.

T-SQL also supports a function called PATINDEX that, like CHARINDEX, you can use to locate the first position of a string within another string. But whereas with CHARINDEX you’re looking for a constant string, with PATINDEX you’re looking for a pattern. The pattern is formed very similar to the LIKE patterns that you’re probably familiar with, where you use
wildcards like % for any string, _ for a single character, and square brackets ([]) representing a single character from a certain list or range.

String Length

T-SQL provides two functions that you can use to measure the length of an input value— LEN and DATALENGTH.
The LEN function returns the length of an input string in terms of the number of characters.
Note that it returns the number of characters, not bytes, whether the input is a regular character or Unicode character string.

The DATALENGTH function returns the length of the input in terms of number of bytes. This means, for example, that if the input is a Unicode character string, it will count 2 bytes per character.

String Alteration

T-SQL supports a number of functions that you can use to apply alterations to an input string. Those are REPLACE, REPLICATE, and STUFF.
With the REPLACE function, you can replace in an input string provided as the first argument all occurrences of the string provided as the second argument, with the string provided as the third argument.

The REPLICATE function allows you to replicate an input string a requested number of times.

The STUFF function operates on an input string provided as the first argument; then, from the character position indicated as the second argument, deletes the number of characters indicated by the third argument. Then it inserts in that position the string specified as the fourth argument.

String Formatting

SQL Server uses functions to apply formatting options to an input string. These are the UPPER, LOWER, LTRIM, RTRIM, and FORMAT functions.
The first four functions are self-explanatory (uppercase form of the input, lowercase form of the input, input after removal of leading spaces, and input after removal of trailing spaces).
Note that there’s no TRIM function that removes both leading and trailing spaces; to achieve this, you need to nest one function call within another, as in RTRIM(LTRIM()).
With the FORMAT function, you can format an input value based on a format string, and optionally specify the culture as a third input where relevant.

CASE Expression and Related Functions

T-SQL supports an expression called CASE and a number of related functions that you can use to apply conditional logic to determine the returned value. Many people incorrectly refer to CASE as a statement. A statement performs some kind of an action or controls the flow of the code, and that’s not what CASE does; CASE returns a value, and hence is an expression.

The CASE expression has two forms—the simple form and the searched form.

The simple form compares an input expression (in this case the attribute discontinued) to multiple possible scalar when expressions (in this case, 0 and 1), and returns the result expression (in this case, ‘No’ and ‘Yes’, respectively) associated with the first match. If there’s no match and an ELSE clause is specified, the else expression (in this case, ‘Unknown’) is returned.

If there’s no ELSE clause, the default is ELSE NULL.

The searched form of the CASE expression is more flexible. Instead of comparing an input expression to multiple possible expressions, it uses predicates in the WHEN clauses, and the first predicate that evaluates to true determines which when expression is returned. If none is true, the CASE expression returns the else expression.

T-SQL supports a number of functions that can be considered as abbreviates of the CASE expression. Those are the standard COALESCE and NULLIF functions, and the nonstandard ISNULL, IIF, and CHOOSE.

The COALESCE function accepts a list of expressions as input and returns the first that is not NULL, or NULL if all are NULLs.

A typical use of COALESCE is to substitute a NULL with something else. For example, the expression COALESCE(region, ”) returns region if it’s not NULL and returns an empty string if it is NULL.


T-SQL supports a nonstandard function called ISNULL that is similar to the standard COALESCE, but it’s a bit more limited in the sense that it supports only two inputs. Like COALESCE, it returns the first input that is not NULL. So, instead of COALESCE(region, ”), you could use ISNULL(region, ”). Generally, it is recommended to stick to standard features unless there’s some flexibility or performance advantage in the nonstandard feature that is a higher priority. ISNULL is actually more limited than COALESCE, so generally, it is recommended to stick to COALESCE.

There are a couple of subtle differences between COALESCE and ISNULL that you might be interested in. One difference is in which input determines the type of the output.


Exam Tip
COALESCE and ISNULL can impact performance when you are combining sets; for example, with joins or when you are filtering data. Consider an example where you have two tables T1 and T2 and you need to join them based on a match between T1.col1 and T2.col1. The attributes do allow NULLs. Normally, a comparison between two NULLs yields unknown,
and this causes the row to be discarded. You want to treat two NULLs as equal. What some do in such a case is use COALESCE or ISNULL to substitute a NULL with a value that they know cannot appear in the data. For example, if the attributes are integers, and you know that you have only positive integers in your data (you can even have constraints that ensure this), you might try to use the predicate COALESCE(T1.col1, -1) = COALESCE(T2. col1, -1), or ISNULL(T1.col1, -1) = ISNULL(T2.col1, -1). The problem with this form is that, because you apply manipulation to the attributes you’re comparing, SQL Server will not rely on index ordering. This can result in not using available indexes efficiently. Instead, it is recommended to use the longer form: T1.col1 = T2.col1 OR (T1.col1 IS NULL AND T2.col1 IS NULL), which SQL Server understands as just a comparison that considers NULLs as equal. With this form, SQL Server can efficiently use indexing.


T-SQL also supports the standard NULLIF function. This function accepts two input expressions, returns NULL if they are equal, and returns the first input if they are not. For example, consider the expression NULLIF(col1, col2). If col1 is equal to col2, the function returns a NULL; otherwise, it returns the col1 value.
As for IIF and CHOOSE, these are nonstandard T-SQL functions that were added to simplify migrations from Microsoft Access platforms. Because these functions aren’t standard and there are simple standard alternatives with CASE expressions, it is not usually recommended that you use them. However, when you are migrating from Access to SQL Server, these functions can help with smoother migration, and then gradually you can refactor your code to use the available standard functions. With the IIF function, you can return one value if an input predicate is true and another value otherwise.

Leave a Reply