Formatting T-SQL Code with SQLFluff

Every Data Warehouse consists of lots of SQL code. While previously all business transformations have been handled in SSIS, in recent times I have often seen more and more logic handled in Stored Procedures. Stored procedures have the benefit that they run onPrem and on Azure and are cost efficient for medium complex ETL work loads on Azure.

However, writing SQL and extending SQL code over years with several persons, it is hard to keep the code consistent.

Several tools exists to remediate this problem.

I want something that I can use in all projects and thus it is of huge benefit, if it is not another software to be procured. Moreover, I would like something that is under active development. T-SQL is evolving and thus the parser needs to be kept up to date to handle new functionality such as temporal tables, dynamic data masking, etc. As a result, I have decided for SQLFluff.

SQLFluff intro

SQLFluff is written in Python it is a small command line application that can do several things:

  • lint -- give suggestions how to improve SQL code
  • fix -- fix formatting issues in the code. This is a lot more useful especially, if one wants to format hundreds of files
  • parse -- show how SQLFluff interprets the SQL statement, very useful for debugging behavior

One gets a great introduction and understanding of SQLFluff by listening to the interview of Alan Cruickshank on the Python Podcast.

SQLFluff has the benefit that the community is super active. I have been writing several bug reports in the last weeks and all of them have been fixed within just a few days (GitHub Details). SQLFluff is not limited to T-SQL but speaks 18 different dialects (e.g. DB2, Oracle, Redshift, Snowflake and many more)

SQLFluff fixes

I have used SQLFluff to format about 100.000 lines of code.

Let me share some of the results here.

Long lines are nicely wrapped: Long lines are nicely wrapped

Lines are nicely indented: Lines are nicely indented

Keywords are nicely capitalized: Keywords are nicely capitalized

In a CASE statement SQLFluff even realizes that code can be written in a more concise way: CASE statement rewritten

SQLFluff configuration

I suggest the following config file for SQLFluff (.sqlfluff) which works well for tsql.

[sqlfluff]
# L014 do not strip optional quotes arround column names
# L034 do not reorder columns
# L030 keep casing of e.g. TVF
# L031 we want to use alias in Joins for sure
# L042 keep subqueries hopefully the writer had a good reason
exclude_rules = L014,L034,L030,L031,L042
dialect = tsql
templater = raw

[SQLFluff:rules]
tab_space_size = 2
max_line_length = 200
comma_style = leading

[SQLFluff:rules:L059]
prefer_quoted_identifiers = True

[SQLFluff:rules:L010]
capitalisation_policy = upper

[SQLFluff:rules:L011]
aliasing = implicit

Get started

For installing SQLFluff please follow the official instructions. To get started with SQLFluff consider the following example:

WITH Contracts_CTE (CtrCustomerID, NumberOfContracts)
AS
(
SELECT CUSTOMER_ID, COUNT(*)
FROM contracts
GROUP BY CUSTOMER_ID
)
SelecT AVG(NumberOfContracts) AS [Average]
from Contracts_CTE;

Then run the following command:

PS C:\Users\fmoeller\ > sqlfluff fix --config .\.sqlfluff .\query.sql

The result will be:

WITH Contracts_CTE (CtrCustomerID, NumberOfContracts)
AS (
    SELECT
        CUSTOMER_ID
        , COUNT(*)
    FROM contracts
    GROUP BY CUSTOMER_ID
)

SELECT AVG(NumberOfContracts) AS [Average]
FROM Contracts_CTE;

Conclusion

I am planning to continue contributing to SQLFluff and use that to keep SQL queries nicely formatted. Reading formatted source code and having capitalized keywords is just a more pleasant experience.

There are still some issues with SQLFluff:

  • one needs multiple iteration to fix everything
  • it is very slow - it takes 60 minutes for my example project

Looking forward to see SQLFluff mature and more and more people using it with T-SQL. Enjoy formatting!

Previous Post