How to extract daily stats as a weekly calendar using T-SQL
tsql sql server

How to extract daily stats as a weekly calendar using T-SQL

When you are a bit into SQL, logging and statistics, you may want to extract some statistics from your logs. While a group by easily outputs a list of stats per date, when looking at a single indicator, you may be more confortable to see stats grouped per week to isolate seasonality of the week. This articles shows how to output such a table from a log with SQL Server.

The other day, I had to rebuild a new procedure to extract views of a certain type of page on my website, and understand its evolution. It is actually a process that can be done quite quickly once we understand how it works.

In this article we are going to see how to extract using T-SQL a rolling-day-week stat table from a long list of log, which presents a structure similar to:

D W0 W1 W2 W3 W4 W5 W6 W7 W8
0 0 11552 16187 16669 16650 16479 16557 16620 16552
1 0 17993 16849 16837 16694 16546 16537 16569 16555
2 0 15599 16348 16776 16582 16573 16527 16629 16597
3 0 16554 16163 16686 16510 16709 16396 16604 16804
4 0 16492 16045 16699 16643 16673 16488 16577 16520
5 0 16183 16447 16113 16558 16948 16000 16657 16519
6 4303 16124 16910 16892 16505 16544 16652 12051 16498

Why this structure? You may have noticed that websites often have a weekly seasonality, meaning that it may for example have more visitors on Saturday than on the other days, and have a low number of customer on specific days of the week. Having a table that structures the data to show a week-day per line allows to better understand the evolution of a specific day of the week. You may compare this to quater-analysis for annual reports (Q1 to Q1, S1 to S1, etc). Here we will be able to understand or plot series for Monday or any day of the week.

We have two options to present the output:

  • present full weeks based on a static day list (Sunday to Saturday, or Monday to Sunday), we would base this on DATEPART(weekday, date) and DATEPART(week, date), or
  • present full weeks based on the last complete day (if we are Wednesday, the last full day will be Tuesday), we would base this on DATEDIFF(day, date, dbo.date_of(getutcdate())), which will give us the number of days from today, on which we apply v.Diff % 7 to get the day of our rolling week and v.Diff / 7 to get the week.

This article will go for the second option, implementing the first one just requires to change the first calculation.

Let's start with a log table, on which we would like to extract the number of lines per day (you may add filters to match your use-cases). We will start from a table [Log_Views] having a structure similar to this one:

Id uniqueidentifier NOT NULL DEFAULT newid(),
CreationTime datetime2 NOT NULL DEFAULT getutcdate(),
Page nvarchar(max) NULL,
Ip varchar(max) NOT NULL
Id CreationTime Page Ip
AB3E096D-[...]-1F3B16B5971A 2019-07-15 12:26:04 / 192.168.0.2
05123687-[...]-4349EB442145 2019-07-15 12:25:36 /Page1 192.168.0.3
3B5EC9EF-[...]-9AAB26964FBC 2019-07-15 12:21:20 / 192.168.0.4
41921A30-[...]-82B9B9314D93 2019-07-15 12:14:40 /Page2 192.168.0.5

First we will need to extract the date from time data, to be able to group lines per date. Let's use the following simple function:

ALTER FUNCTION [dbo].[DATE_OF](@date datetime)  
RETURNS datetime
AS
BEGIN
	return DATEFROMPARTS(DATEPART(year,@date),DATEPART(month,@date),DATEPART(day,@date))
END

Now we can group pages per date, with a simple request SELECT dbo.DATE_OF(CreationTime) as [Date], Page, COUNT(*) AS Nb FROM [Log_Views] GROUP BY [Date] ORDER BY [Date] DESC, that would output the number of views per page per date

Next step is to extract the day number and the week number. The following request is the first shot that groups the total number of views by date.

SELECT dbo.DATE_OF(CreationTime) as [Date]
		, COUNT(*) AS Nb 
	FROM [Log_Views] 
	GROUP BY [Date]

which would output a structure like this:

Date Nb
2019-07-15 4303
2019-07-14 11552
2019-07-13 17993
2019-07-12 15599
2019-07-11 16554
2019-07-10 16492

We would like something similar, but instead of getting a date, we would like something that will allow us to get the day of week and the week. As we are going to build a rolling table, the easiest would be to just base on the number of days from today, from which we will extract the day and week.

SELECT  DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) as DaysFromToday
		, COUNT(*) AS Nb 
	FROM [Log_Views] 
	GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))

Output:

DaysFromToday Nb
0 4303
1 11552
2 17993
3 15599
4 16554
5 16492

And then we can extract the day and week. In order to treat the data, and keep a readable code, we will use CTE to chain the requests:

; WITH ViewsPerDay AS (
	SELECT 	DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.date_of(getutcdate())) as DaysFromToday
			, COUNT(*) AS Nb 
		FROM [Log_Views] 
		GROUP BY dbo.DATE_OF(CreationTime)
)
SELECT DaysFromToday
		, DaysFromToday / 7 as Week
		, DaysFromToday % 7 as Day
		, Nb
	FROM ViewsPerDay

Which outputs:

DaysFromToday Week Day Nb
0 0 0 4303
1 0 1 11552
2 0 2 17993
3 0 3 15599
4 0 4 16554
5 0 5 16492
6 0 6 16183
7 1 0 16124
8 1 1 15187
9 1 2 17849

To make the display easier, and avoid to have a missing row if one is empty, we will use a table of days:

WDays AS (
  SELECT 0 as W 
	UNION ALL SELECT 1
	UNION ALL SELECT 2
	UNION ALL SELECT 3
	UNION ALL SELECT 4
	UNION ALL SELECT 5
	UNION ALL SELECT 6
)

Now we would like to transpose the data for past weeks, to do so, we will use a JOIN. For simplicity we will just use a static number of 8 past weeks. Adding weeks just requires one more line, when transposing data requires much more complex tricks.

SELECT 
		-- Day of the week
		wd.w,
		-- Aggregation to select the relevant figure
		SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END),
		SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END)
	FROM WDays wd
	INNER JOIN ViewsPerDayWithWeek v ON v.Day = ww.w 
	GROUP BY ww.w
	ORDER BY ww.w

If you run this request, you will notice that the first day is the current day, and is probably not full, so the figure will not be consistent with the rest of the data. We can slightly trick to isolate this partial day by adding 6 to the DATEDIFF, so the current day will have a dedicated column. Which gives at the end:

;WITH ViewsPerDay AS (
	-- Extract the views per relative days from today
	SELECT  DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) + 6 as DaysFromToday
			, COUNT(*) AS Nb 
		FROM [Log_Views] 
		WHERE CreationTime > DATEADD(day, -8*7-3, GetUtcDate())
		GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))
), ViewsPerDayWithWeek AS (
	-- Extract the day of the relative week, and the week number
	SELECT DaysFromToday, Nb
			, DaysFromToday % 7 as Day
			, DaysFromToday / 7 as Week
		FROM ViewsPerDay
), WDays AS (
	-- Static table for week days to prefix the data
  SELECT 0 as W 
	UNION ALL SELECT 1
	UNION ALL SELECT 2
	UNION ALL SELECT 3
	UNION ALL SELECT 4
	UNION ALL SELECT 5
	UNION ALL SELECT 6
)
-- Final output, 8 weeks wide
SELECT 
		wd.w AS D,
		SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
		SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
		SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
		SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
		SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
		SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
		SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
		SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
		SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
	FROM WDays wd
	INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w 
	GROUP BY wd.w
	ORDER BY wd.w

And we can add a little bonus to output the total views per week:

;WITH ViewsPerDay AS (
	-- Extract the views per relative days from today
	SELECT  DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) + 6 as DaysFromToday
			, COUNT(*) AS Nb 
		FROM [Log_Views] 
		WHERE CreationTime > DATEADD(day, -8*7-3, GetUtcDate())
		GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))
), ViewsPerDayWithWeek AS (
	-- Extract the day of the relative week, and the week number
	SELECT DaysFromToday, Nb
			, DaysFromToday % 7 as Day
			, DaysFromToday / 7 as Week
		FROM ViewsPerDay
), WDays AS (
	-- Static table for week days to prefix the data
  SELECT 0 as W 
	UNION ALL SELECT 1
	UNION ALL SELECT 2
	UNION ALL SELECT 3
	UNION ALL SELECT 4
	UNION ALL SELECT 5
	UNION ALL SELECT 6
)
-- Final output, 8 weeks wide
SELECT 
		'All' AS D,
		SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
		SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
		SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
		SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
		SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
		SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
		SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
		SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
		SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
	FROM WDays wd
	INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w 
UNION ALL
SELECT 
		'D '+CAST(wd.w as varchar(5)) as D,
		SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
		SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
		SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
		SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
		SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
		SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
		SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
		SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
		SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
	FROM WDays wd
	INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w 
	GROUP BY wd.w
	ORDER BY D

The 'D' prefix for days is to allow the order-by to work properly.

And we get an output structured like:

D W0 W1 W2 W3 W4 W5 W6 W7 W8
All 4303 110497 114949 116672 116142 116472 115157 111707 116045
D 0 0 11552 16187 16669 16650 16479 16557 16620 16552
D 1 0 17993 16849 16837 16694 16546 16537 16569 16555
D 2 0 15599 16348 16776 16582 16573 16527 16629 16597
D 3 0 16554 16163 16686 16510 16709 16396 16604 16804
D 4 0 16492 16045 16699 16643 16673 16488 16577 16520
D 5 0 16183 16447 16113 16558 16948 16000 16657 16519
D 6 4303 16124 16910 16892 16505 16544 16652 12051 16498

Of course this request may require some pre-treatments filters to exclude duplicates, such as a distinct on the couple date/ip to count only one view per user per day, filter on a single-page, or anything that is relevant to your use-case.


Send
Please sign-in to comment
.X0001-01-01_00-00