Using a Numbers Table in SQL Server to insert test data (T-SQL Tuesday #65)
T-SQL Tuesday #65 is hosted by Mike Donnelly (@SQLMD). There is no specific topic to write about this month, Mike simply wants us to learn something new and then write a blog post to teach it to others. I want to share something that I only recently learned, something I wish I had known about years ago, something that became part of my toolbox as soon as I discovered it: the Numbers Table. It is so simple and solves so many problems that everyone should know about it 🤓
A Numbers Table (perhaps most known as a Tally Table, sometimes called an Auxiliary Table of Numbers and even referred to as the Swiss Army Knife of SQL Server) is a one-column helper table that contains the numbers 1, 2, 3, 4, 5 and so on all the way up to the-highest-number-you-could-possibly-need.
It can be used to replace slower loops and row-by-row operations with faster set-based operations, generate dates, split strings, find gaps in data sets, expand data sets, insert test data and probably hundreds of other things. There are so many great and detailed articles already published about this topic, so I will stick to the T-SQL Tuesday topic of “Teach Something New” and share the two most recent things I learned: different ways to create a numbers table, and different ways to quickly insert test data by using a numbers table.
Create a Numbers Table
My preferred way of creating a numbers table is to use Itzik Ben-Gan’s solution described in the Virtual Auxiliary Table of Numbers article. The solution uses CROSS JOINs to generate rows and then a ROW_NUMBER() to create the actual sequence of numbers.
You can use Itzik Ben-Gan’s solution to generate a virtual numbers table when needed, you can create a function that returns a numbers table with a certain number of rows, or you can create an actual numbers table that live in your database.
By creating the function or table in your model database, they will automatically be created in new databases for you. Handy, right? 😃
/* --------------------------------------------------
-- Virtual Numbers Table
-------------------------------------------------- */
WITH
L0 AS (SELECT 1 AS n UNION ALL SELECT 1), -- 2 rows
L1 AS (SELECT 1 AS n FROM L0 AS a CROSS JOIN L0 AS b), -- 4 rows (2 x 2)
L2 AS (SELECT 1 AS n FROM L1 AS a CROSS JOIN L1 AS b), -- 16 rows (4 x 4)
L3 AS (SELECT 1 AS n FROM L2 AS a CROSS JOIN L2 AS b), -- 256 rows (16 x 16)
L4 AS (SELECT 1 AS n FROM L3 AS a CROSS JOIN L3 AS b), -- 65 536 rows (256 x 256)
L5 AS (SELECT 1 AS n FROM L4 AS a CROSS JOIN L4 AS b), -- 4 294 967 296 rows (65 536 x 65 536)
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (1000) n FROM Nums ORDER BY n;
/* --------------------------------------------------
-- Numbers Table
-------------------------------------------------- */
USE model;
GO
IF OBJECT_ID('dbo.Numbers') IS NOT NULL
DROP TABLE dbo.Numbers;
GO
CREATE TABLE Numbers (
n BIGINT NOT NULL,
CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (n) WITH FILLFACTOR = 100
);
GO
WITH
L0 AS (SELECT 1 AS n UNION ALL SELECT 1),
L1 AS (SELECT 1 AS n FROM L0 AS a CROSS JOIN L0 AS b),
L2 AS (SELECT 1 AS n FROM L1 AS a CROSS JOIN L1 AS b),
L3 AS (SELECT 1 AS n FROM L2 AS a CROSS JOIN L2 AS b),
L4 AS (SELECT 1 AS n FROM L3 AS a CROSS JOIN L3 AS b),
L5 AS (SELECT 1 AS n FROM L4 AS a CROSS JOIN L4 AS b),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.Numbers (n)
SELECT TOP (100000) n FROM Nums ORDER BY n; /* Insert as many numbers as you need */
GO
SELECT n FROM dbo.Numbers WHERE n <= 1000;
GO
/* --------------------------------------------------
-- Numbers Table Function
-------------------------------------------------- */
USE model;
GO
IF OBJECT_ID('dbo.GetNums') IS NOT NULL
DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE AS RETURN
WITH
L0 AS (SELECT 1 AS n UNION ALL SELECT 1),
L1 AS (SELECT 1 AS n FROM L0 AS a CROSS JOIN L0 AS b),
L2 AS (SELECT 1 AS n FROM L1 AS a CROSS JOIN L1 AS b),
L3 AS (SELECT 1 AS n FROM L2 AS a CROSS JOIN L2 AS b),
L4 AS (SELECT 1 AS n FROM L3 AS a CROSS JOIN L3 AS b),
L5 AS (SELECT 1 AS n FROM L4 AS a CROSS JOIN L4 AS b),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (@n) n FROM Nums ORDER BY n;
GO
SELECT n FROM dbo.GetNums(1000);
GO
Insert Test Data
I learned about the numbers table while researching table partitioning. All I needed was to insert a specified number of rows in each partition so I could test things like partition switching, splitting and merging. I didn’t want my tests to depend on existing data.
Writing every single insert statement was not an option. I first tried GO [count] to insert a row a specified number of times. The syntax is easy to remember and it works ok on a small amount of data:
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2012-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2013-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2014-01-01', 10)
GO 1000
INSERT INTO Sales (SalesDate, Quantity) VALUES ('2015-01-01', 10)
GO 1000
However, GO [count] works like a WHILE loop, it executes a batch a specified number of times. The more times you execute a batch, the longer it takes. On average, inserting 1000 rows took 0-1 seconds. Inserting 10000 rows took 3-4 seconds. Inserting 100000 rows took 30-35 seconds. And inserting 1 million rows took 5.5 minutes!
After some research I discovered the numbers table, created the Numbers Table Function, and rewrote my queries:
INSERT INTO Sales (SalesDate, Quantity) SELECT '2012-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2013-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2014-01-01', 10 FROM dbo.GetNums(1000);
INSERT INTO Sales (SalesDate, Quantity) SELECT '2015-01-01', 10 FROM dbo.GetNums(1000);
What a difference! It was blazing fast compared to using GO [count]. On average, inserting 1000 rows took 2 milliseconds. Inserting 10000 rows took 25 milliseconds. Inserting 100000 rows took 140 milliseconds. And inserting 1 million rows took 1.5 seconds instead of 5.5 minutes!
This worked great until I wanted to insert 1000 rows per day instead of year. I didn’t have to write every single insert statement, but I still had to write an insert statement for every single day. That was not an option, so I rewrote my queries yet again. This time I used the same CROSS JOIN logic as Itzik Ben-Gan used in his numbers table solution. I first created rows for all the days and then cross joined to get 1000 rows for each day:
/* --------------------------------------------------
-- Insert from Virtual Numbers Table
-------------------------------------------------- */
WITH
L0 AS (SELECT 1 AS n UNION ALL SELECT 1),
L1 AS (SELECT 1 AS n FROM L0 AS a CROSS JOIN L0 AS b),
L2 AS (SELECT 1 AS n FROM L1 AS a CROSS JOIN L1 AS b),
L3 AS (SELECT 1 AS n FROM L2 AS a CROSS JOIN L2 AS b),
L4 AS (SELECT 1 AS n FROM L3 AS a CROSS JOIN L3 AS b),
L5 AS (SELECT 1 AS n FROM L4 AS a CROSS JOIN L4 AS b),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5)
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM Nums AS dates
CROSS JOIN Nums AS qty
WHERE dates.n <= DATEDIFF(DD,'2012-01-01','2016-01-01')
AND qty.n <= 1000;
GO
/* --------------------------------------------------
-- Insert from Numbers Table
-------------------------------------------------- */
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM dbo.Numbers dates
CROSS JOIN dbo.Numbers AS qty
WHERE dates.n <= DATEDIFF(DD,'2012-01-01','2016-01-01')
AND qty.n <= 1000;
/* --------------------------------------------------
-- Insert from Numbers Table Function
-------------------------------------------------- */
INSERT INTO dbo.Sales (SalesDate, Quantity)
SELECT DATEADD(DAY,dates.n-1,'2012-01-01') AS SalesDate, qty.n AS Quantity
FROM dbo.GetNums(DATEDIFF(DD,'2012-01-01','2016-01-01')) dates
CROSS JOIN dbo.GetNums(1000) AS qty;
Each of these queries only take a couple of seconds. My preferred way of inserting test data is the last query that uses the Numbers Table Function. It is short and the syntax is easy to remember.
Summary
Create a Numbers Table and / or a Numbers Table Function in the model database so they are instantly created in all new databases. Use CROSS JOIN logic to write short and powerful queries to insert test data. Finally, inserting data is just one use case, have fun and play around with these things to see if you can come up with completely new solutions to solve old problems!
And as a bonus, if you thought this article was interesting, you should absolutely watch Itzik Ben-Gan’s Boost your T-SQL with the APPLY Operator session from SQLBits XIV. He explains how to use APPLY instead of JOIN for more advanced use cases, and I promise some really cool demos. Thank you so much, Itzik! 👏🏻
About the Author
Cathrine Wilhelmsen is a Microsoft Data Platform MVP, international speaker, author, blogger, organizer, and chronic volunteer. She loves data and coding, as well as teaching and sharing knowledge - oh, and sci-fi, gaming, coffee and chocolate 🤓