Visualising stored procedure call trees with SQLSpelunker
Whether debugging a problem in an existing system, or planning changes to an existing (complex) system, knowing how the call graph for a bit of SQL looks has been invaluable in quickly understanding the domain.
If you've worked in any system where a lot of the logic ends up in the database, you'll probably be nodding your head and remembering 'that' procedure which looked pretty simple to start with, and after following a few dependencies you were suddenly deep down the rabbit hole... (the longest I've looked at so far branches out into ~100 procedures).
Taking inspiration from What's in the box? Validating SQL Scripts with Powershell and Get Started with the ScriptDom I built SQLSpelunker to allow you to quickly go from a stored procedure name to a visual of the call tree for that procedure:
c:\SQLSpelunker>dotnet run "server=localhost;initial catalog=tempdb;integrated security=SSPI" "exec dbo.ProcOne;"
tempdb.dbo.ProcOne
-tempdb.dbo.ProcTwo
--tempdb.dbo.ProcThree
You can get started right away by grabbing the source from Github or download the latest build from the releases page. To build or run the code you'll need the .Net Core 2.0 SDK.
Read on for more details about what is currently supported, as well as details of how it works under the hood.
Features
Visualising simple call trees
Given the following schema:
use tempdb
go
create procedure dbo.ProcOne
as
exec dbo.ProcTwo;
go
create procedure dbo.ProcTwo
as
exec dbo.ProcThree;
go
create procedure dbo.ProcThree
as
select @@servername as ServerName;
go
Running SQLSpelunker against that database and the script exec dbo.ProcOne
will generate the following output:
tempdb.dbo.ProcOne
-tempdb.dbo.ProcTwo
--tempdb.dbo.ProcThree
ProcOne calls ProcTwo calls ProcThree. Fairly straightforward.
Infinite loop detection
While walking the procedure tree if a procedure is encountered that exists anywhere as a parent, that branch stops at the repeat call. This will catch both simple cases (ProcA calls ProcA) as well as multi-step infinite loops (ProcA calls ProcB calls ProcA).
If we now add one more procedure to our schema:
create procedure dbo.CallsItself
as
exec dbo.CallsItself;
exec dbo.ProcOne;
go
Running SQLSpelunker against the same database with the script exec dbo.NeverCallsItself
will return the following:
tempdb.dbo.CallsItself
-tempdb.dbo.CallsItself [*]
-tempdb.dbo.ProcOne
--tempdb.dbo.ProcTwo
---tempdb.dbo.ProcThree
The asterisk ([*]
) signifies the procedure has previously been called by a parent procedure, and SQLSpelunker will stop trying to walk the call tree.
Default schema
If a procedure is executed without a schema name then it will be assumed that the default schema of the calling user is dbo. So the following statement:
exec proc;
When parsed result in:
tempdb.dbo.proc
Note that the database name is defaulted based on the connection string, or in the case of cross-database calls the 'current' database the procedure is executing in. If you do not specify a database name on the connection string SQLSpelunker will throw an error.
Cross-database calls
The procedure calls can span multiple databases. Given the following new procedures:
use tempdb
go
create procedure dbo.CrossDBCall
as
exec master.dbo.ILiveInMaster;
go
use master
go
create procedure dbo.ILiveInMaster;
as
exec tempdb.dbo.CrossDBCall
go
We'd see the following call chain:
tempdb.dbo.CrossDBCall
-master.dbo.ILiveInMaster
--tempdb.dbo.CrossDBCall [*]
Limitations
Exec @procedure
Only stored procedures executed by identifier are supported. If your statement contains any SQL in the following form it won't be reported:
declare @procName = 'dbo.WontBeSeen';
exec @procName;
Every procedure is output
The current implementation will report every procedure called in the code, even if that procedure isn't reachable (e.g. hidden behind an always-false statement).
create procedure dbo.ThisMaybeDoesNothing
as
if 1 = 0
begin
/* Never reachable */
exec dbo.ProcOne;
end
if rand() < 0.1
begin
/* Sometimes reachable */
exec dbo.ProcTwo;
end
go
This produces:
tempdb.dbo.ThisMaybeDoesNothing
-dbo.ProcOne
-dbo.ProcTwo
Missing procedures are still reported
When looking up the definition of a procedure if the definition can't be retrieved the call will still be reported. This is probably a feature more than a limitation as some procs which do exist won't have their definition reported (a good example being msdb.dbo.sp_send_dbmail). In the future 'missing' procs may get an annotation.
Usage
The command line arguments for the SQLSpelunker console app are:
- A connection string to the initial database (which must contain a database)
- The SQL fragment that should be parsed and walked
An example connection string that connects to tempdb on the localhost instance with windows authentication is:
server=localhost;initial catalog=tempdb;integrated security=SSPI
How it works
The flow of the program looks something like this:
- Parse the SQL fragment and extract all stored procedure calls
- For each stored procedure call get the definition if it isn't already in cache
- For each stored procedure parse the definition and extract all stored procedure calls
- Repeat until there are no more unique calls (e.g. also break on infinite loops)
SQL fragments are parsed using TransactSql ScriptDom - the official docs are a bit bare-bones, so I'd suggest starting with Ed Elliot's excellent blog category if you want to learn more.
Stored procedure definitions are extracted using the object_definition function.
The future of SQLSpelunker
I was pleasantly surprised at how little code was needed to solve the core problem of getting the definition and parsing all procedure calls.
The console app is helpful but not particularly easy to use (a quick-start with colleagues often took the form of 'download .net core, clone the repo, and then run this command - no there isn't just an exe you can download...'). A web version or powershell cmdlet are ideas I'm playing with to make it more accessible.
The other feature which would be great is 'what calls this procedure?'. Obviously that is a very different challenge to walking a tree (you're forced to go an get every procedure - potentially from every database! When you've got thousands of procedures that starts to take time, and keeping it snappy/interactive was a goal, and one of the reasons I didn't just recommend some of the existing tools out there for exploring schema).
Moving away from procedures only I'd love to extend this idea to report what tables/columns are being modified, though the way that gets visualised/interacted with is what I'm struggling with right now (is the question 'show me everything' or 'show me all procs in this chain which touch table foo/column bar?').
The longest/deepest proc I've seen so far
Each line below represents a nested call to another proc all branching out from a single call. Curious to know what else is out there :)