Visualising stored procedure call trees with SQLSpelunker

Whether debugging a problem in an existing system, or planning changes to an existing (complex) system, knowing how the call graph for a bit of SQL looks has been invaluable in quickly understanding the domain.

If you've worked in any system where a lot of the logic ends up in the database, you'll probably be nodding your head and remembering 'that' procedure which looked pretty simple to start with, and after following a few dependencies you were suddenly deep down the rabbit hole... (the longest I've looked at so far branches out into ~100 procedures).

Taking inspiration from What's in the box? Validating SQL Scripts with Powershell and Get Started with the ScriptDom I built SQLSpelunker to allow you to quickly go from a stored procedure name to a visual of the call tree for that procedure:

c:\SQLSpelunker>dotnet run "server=localhost;initial catalog=tempdb;integrated security=SSPI" "exec dbo.ProcOne;"

tempdb.dbo.ProcOne
-tempdb.dbo.ProcTwo
--tempdb.dbo.ProcThree

You can get started right away by grabbing the source from Github or download the latest build from the releases page. To build or run the code you'll need the .Net Core 2.0 SDK.

Read on for more details about what is currently supported, as well as details of how it works under the hood.

Features

Visualising simple call trees

Given the following schema:

use tempdb
go
create procedure dbo.ProcOne
as
exec dbo.ProcTwo;
go
create procedure dbo.ProcTwo
as
exec dbo.ProcThree;
go
create procedure dbo.ProcThree
as
select @@servername as ServerName;
go

Running SQLSpelunker against that database and the script exec dbo.ProcOne will generate the following output:

tempdb.dbo.ProcOne
-tempdb.dbo.ProcTwo
--tempdb.dbo.ProcThree

ProcOne calls ProcTwo calls ProcThree. Fairly straightforward.

Infinite loop detection

While walking the procedure tree if a procedure is encountered that exists anywhere as a parent, that branch stops at the repeat call. This will catch both simple cases (ProcA calls ProcA) as well as multi-step infinite loops (ProcA calls ProcB calls ProcA).

If we now add one more procedure to our schema:

create procedure dbo.CallsItself
as
exec dbo.CallsItself;
exec dbo.ProcOne;
go

Running SQLSpelunker against the same database with the script exec dbo.NeverCallsItself will return the following:

tempdb.dbo.CallsItself
-tempdb.dbo.CallsItself [*]
-tempdb.dbo.ProcOne
--tempdb.dbo.ProcTwo
---tempdb.dbo.ProcThree

The asterisk ([*]) signifies the procedure has previously been called by a parent procedure, and SQLSpelunker will stop trying to walk the call tree.

Default schema

If a procedure is executed without a schema name then it will be assumed that the default schema of the calling user is dbo. So the following statement:

exec proc;

When parsed result in:

tempdb.dbo.proc

Note that the database name is defaulted based on the connection string, or in the case of cross-database calls the 'current' database the procedure is executing in. If you do not specify a database name on the connection string SQLSpelunker will throw an error.

Cross-database calls

The procedure calls can span multiple databases. Given the following new procedures:

use tempdb
go
create procedure dbo.CrossDBCall
as
exec master.dbo.ILiveInMaster;
go
use master
go
create procedure dbo.ILiveInMaster;
as
exec tempdb.dbo.CrossDBCall
go

We'd see the following call chain:

tempdb.dbo.CrossDBCall
-master.dbo.ILiveInMaster
--tempdb.dbo.CrossDBCall [*]

Limitations

Exec @procedure

Only stored procedures executed by identifier are supported. If your statement contains any SQL in the following form it won't be reported:

declare @procName = 'dbo.WontBeSeen';
exec @procName;

Every procedure is output

The current implementation will report every procedure called in the code, even if that procedure isn't reachable (e.g. hidden behind an always-false statement).

create procedure dbo.ThisMaybeDoesNothing
as
if 1 = 0
begin
    /* Never reachable */
    exec dbo.ProcOne;
end

if rand() < 0.1
begin
    /* Sometimes reachable */
    exec dbo.ProcTwo;
end
go

This produces:

tempdb.dbo.ThisMaybeDoesNothing
-dbo.ProcOne
-dbo.ProcTwo

Missing procedures are still reported

When looking up the definition of a procedure if the definition can't be retrieved the call will still be reported. This is probably a feature more than a limitation as some procs which do exist won't have their definition reported (a good example being msdb.dbo.sp_send_dbmail). In the future 'missing' procs may get an annotation.

Usage

The command line arguments for the SQLSpelunker console app are:

A connection string to the initial database (which must contain a database)
The SQL fragment that should be parsed and walked

An example connection string that connects to tempdb on the localhost instance with windows authentication is:

server=localhost;initial catalog=tempdb;integrated security=SSPI

How it works

The flow of the program looks something like this:

Parse the SQL fragment and extract all stored procedure calls
For each stored procedure call get the definition if it isn't already in cache
For each stored procedure parse the definition and extract all stored procedure calls
Repeat until there are no more unique calls (e.g. also break on infinite loops)

SQL fragments are parsed using TransactSql ScriptDom - the official docs are a bit bare-bones, so I'd suggest starting with Ed Elliot's excellent blog category if you want to learn more.

Stored procedure definitions are extracted using the object_definition function.

The future of SQLSpelunker

I was pleasantly surprised at how little code was needed to solve the core problem of getting the definition and parsing all procedure calls.

The console app is helpful but not particularly easy to use (a quick-start with colleagues often took the form of 'download .net core, clone the repo, and then run this command - no there isn't just an exe you can download...'). A web version or powershell cmdlet are ideas I'm playing with to make it more accessible.

The other feature which would be great is 'what calls this procedure?'. Obviously that is a very different challenge to walking a tree (you're forced to go an get every procedure - potentially from every database! When you've got thousands of procedures that starts to take time, and keeping it snappy/interactive was a goal, and one of the reasons I didn't just recommend some of the existing tools out there for exploring schema).

Moving away from procedures only I'd love to extend this idea to report what tables/columns are being modified, though the way that gets visualised/interacted with is what I'm struggling with right now (is the question 'show me everything' or 'show me all procs in this chain which touch table foo/column bar?').

The longest/deepest proc I've seen so far

Each line below represents a nested call to another proc all branching out from a single call. Curious to know what else is out there :)

Namespace configuration