Every Data Warehouse consists of lots of SQL code. While previously all business transformations have been handled in SSIS, in recent times I have often seen more and more logic handled in Stored Procedures. Stored procedures have the benefit that they run onPrem and on Azure and are cost efficient for medium complex ETL work loads on Azure.
However, writing SQL and extending SQL code over years with several persons, it is hard to keep the code consistent.
Several tools exists to remediate this problem.
I want something that I can use in all projects and thus it is of huge benefit, if it is not another software to be procured. Moreover, I would like something that is under active development. T-SQL is evolving and thus the parser needs to be kept up to date to handle new functionality such as temporal tables, dynamic data masking, etc. As a result, I have decided for SQLFluff.
SQLFluff is written in Python it is a small command line application that can do several things:
One gets a great introduction and understanding of SQLFluff by listening to the interview of Alan Cruickshank on the Python Podcast.
SQLFluff has the benefit that the community is super active. I have been writing several bug reports in the last weeks and all of them have been fixed within just a few days (GitHub Details). SQLFluff is not limited to T-SQL but speaks 18 different dialects (e.g. DB2, Oracle, Redshift, Snowflake and many more)
I have used SQLFluff to format about 100.000 lines of code.
Let me share some of the results here.
Long lines are nicely wrapped:
Lines are nicely indented:
Keywords are nicely capitalized:
In a CASE statement SQLFluff even realizes that code can be written in a more concise way:
I suggest the following config file for SQLFluff (.sqlfluff
) which works well for tsql.
[sqlfluff]
# L014 do not strip optional quotes arround column names
# L034 do not reorder columns
# L030 keep casing of e.g. TVF
# L031 we want to use alias in Joins for sure
# L042 keep subqueries hopefully the writer had a good reason
exclude_rules = L014,L034,L030,L031,L042
dialect = tsql
templater = raw
[SQLFluff:rules]
tab_space_size = 2
max_line_length = 200
comma_style = leading
[SQLFluff:rules:L059]
prefer_quoted_identifiers = True
[SQLFluff:rules:L010]
capitalisation_policy = upper
[SQLFluff:rules:L011]
aliasing = implicit
For installing SQLFluff please follow the official instructions. To get started with SQLFluff consider the following example:
WITH Contracts_CTE (CtrCustomerID, NumberOfContracts)
AS
(
SELECT CUSTOMER_ID, COUNT(*)
FROM contracts
GROUP BY CUSTOMER_ID
)
SelecT AVG(NumberOfContracts) AS [Average]
from Contracts_CTE;
Then run the following command:
PS C:\Users\fmoeller\ > sqlfluff fix --config .\.sqlfluff .\query.sql
The result will be:
WITH Contracts_CTE (CtrCustomerID, NumberOfContracts)
AS (
SELECT
CUSTOMER_ID
, COUNT(*)
FROM contracts
GROUP BY CUSTOMER_ID
)
SELECT AVG(NumberOfContracts) AS [Average]
FROM Contracts_CTE;
I am planning to continue contributing to SQLFluff and use that to keep SQL queries nicely formatted. Reading formatted source code and having capitalized keywords is just a more pleasant experience.
There are still some issues with SQLFluff:
Looking forward to see SQLFluff mature and more and more people using it with T-SQL. Enjoy formatting!