How to extract daily stats as a weekly calendar using T-SQL
The other day, I had to rebuild a new procedure to extract views of a certain type of page on my website, and understand its evolution. It is actually a process that can be done quite quickly once we understand how it works.
In this article we are going to see how to extract using T-SQL a rolling-day-week stat table from a long list of log, which presents a structure similar to:
D | W0 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 11552 | 16187 | 16669 | 16650 | 16479 | 16557 | 16620 | 16552 |
1 | 0 | 17993 | 16849 | 16837 | 16694 | 16546 | 16537 | 16569 | 16555 |
2 | 0 | 15599 | 16348 | 16776 | 16582 | 16573 | 16527 | 16629 | 16597 |
3 | 0 | 16554 | 16163 | 16686 | 16510 | 16709 | 16396 | 16604 | 16804 |
4 | 0 | 16492 | 16045 | 16699 | 16643 | 16673 | 16488 | 16577 | 16520 |
5 | 0 | 16183 | 16447 | 16113 | 16558 | 16948 | 16000 | 16657 | 16519 |
6 | 4303 | 16124 | 16910 | 16892 | 16505 | 16544 | 16652 | 12051 | 16498 |
Why this structure? You may have noticed that websites often have a weekly seasonality, meaning that it may for example have more visitors on Saturday than on the other days, and have a low number of customer on specific days of the week. Having a table that structures the data to show a week-day per line allows to better understand the evolution of a specific day of the week. You may compare this to quater-analysis for annual reports (Q1 to Q1, S1 to S1, etc). Here we will be able to understand or plot series for Monday or any day of the week.
We have two options to present the output:
- present full weeks based on a static day list (Sunday to Saturday, or Monday to Sunday),
we would base this on
DATEPART(weekday, date)
andDATEPART(week, date)
, or - present full weeks based on the last complete day (if we are Wednesday, the last full day will be Tuesday),
we would base this on
DATEDIFF(day, date, dbo.date_of(getutcdate()))
, which will give us the number of days from today, on which we applyv.Diff % 7
to get the day of our rolling week andv.Diff / 7
to get the week.
This article will go for the second option, implementing the first one just requires to change the first calculation.
Let's start with a log table, on which we would like to extract the number of lines per day (you may add filters to match your use-cases). We will start from a table [Log_Views] having a structure similar to this one:
Id uniqueidentifier NOT NULL DEFAULT newid(),
CreationTime datetime2 NOT NULL DEFAULT getutcdate(),
Page nvarchar(max) NULL,
Ip varchar(max) NOT NULL
Id | CreationTime | Page | Ip |
---|---|---|---|
AB3E096D-[...]-1F3B16B5971A | 2019-07-15 12:26:04 | / | 192.168.0.2 |
05123687-[...]-4349EB442145 | 2019-07-15 12:25:36 | /Page1 | 192.168.0.3 |
3B5EC9EF-[...]-9AAB26964FBC | 2019-07-15 12:21:20 | / | 192.168.0.4 |
41921A30-[...]-82B9B9314D93 | 2019-07-15 12:14:40 | /Page2 | 192.168.0.5 |
First we will need to extract the date from time data, to be able to group lines per date. Let's use the following simple function:
ALTER FUNCTION [dbo].[DATE_OF](@date datetime)
RETURNS datetime
AS
BEGIN
return DATEFROMPARTS(DATEPART(year,@date),DATEPART(month,@date),DATEPART(day,@date))
END
Now we can group pages per date, with a simple request SELECT dbo.DATE_OF(CreationTime) as [Date], Page, COUNT(*) AS Nb FROM [Log_Views] GROUP BY [Date] ORDER BY [Date] DESC
, that would output the number of views per page per date
Next step is to extract the day number and the week number. The following request is the first shot that groups the total number of views by date.
SELECT dbo.DATE_OF(CreationTime) as [Date]
, COUNT(*) AS Nb
FROM [Log_Views]
GROUP BY [Date]
which would output a structure like this:
Date | Nb |
---|---|
2019-07-15 | 4303 |
2019-07-14 | 11552 |
2019-07-13 | 17993 |
2019-07-12 | 15599 |
2019-07-11 | 16554 |
2019-07-10 | 16492 |
We would like something similar, but instead of getting a date, we would like something that will allow us to get the day of week and the week. As we are going to build a rolling table, the easiest would be to just base on the number of days from today, from which we will extract the day and week.
SELECT DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) as DaysFromToday
, COUNT(*) AS Nb
FROM [Log_Views]
GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))
Output:
DaysFromToday | Nb |
---|---|
0 | 4303 |
1 | 11552 |
2 | 17993 |
3 | 15599 |
4 | 16554 |
5 | 16492 |
And then we can extract the day and week. In order to treat the data, and keep a readable code, we will use CTE to chain the requests:
; WITH ViewsPerDay AS (
SELECT DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.date_of(getutcdate())) as DaysFromToday
, COUNT(*) AS Nb
FROM [Log_Views]
GROUP BY dbo.DATE_OF(CreationTime)
)
SELECT DaysFromToday
, DaysFromToday / 7 as Week
, DaysFromToday % 7 as Day
, Nb
FROM ViewsPerDay
Which outputs:
DaysFromToday | Week | Day | Nb |
---|---|---|---|
0 | 0 | 0 | 4303 |
1 | 0 | 1 | 11552 |
2 | 0 | 2 | 17993 |
3 | 0 | 3 | 15599 |
4 | 0 | 4 | 16554 |
5 | 0 | 5 | 16492 |
6 | 0 | 6 | 16183 |
7 | 1 | 0 | 16124 |
8 | 1 | 1 | 15187 |
9 | 1 | 2 | 17849 |
To make the display easier, and avoid to have a missing row if one is empty, we will use a table of days:
WDays AS (
SELECT 0 as W
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
Now we would like to transpose the data for past weeks, to do so, we will use a JOIN
. For simplicity we will just use a static number of 8 past weeks. Adding weeks just requires one more line, when transposing data requires much more complex tricks.
SELECT
-- Day of the week
wd.w,
-- Aggregation to select the relevant figure
SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END),
SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END)
FROM WDays wd
INNER JOIN ViewsPerDayWithWeek v ON v.Day = ww.w
GROUP BY ww.w
ORDER BY ww.w
If you run this request, you will notice that the first day is the current day, and is probably not full, so the figure will not be consistent with the rest of the data. We can slightly trick to isolate this partial day by adding 6 to the DATEDIFF, so the current day will have a dedicated column. Which gives at the end:
;WITH ViewsPerDay AS (
-- Extract the views per relative days from today
SELECT DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) + 6 as DaysFromToday
, COUNT(*) AS Nb
FROM [Log_Views]
WHERE CreationTime > DATEADD(day, -8*7-3, GetUtcDate())
GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))
), ViewsPerDayWithWeek AS (
-- Extract the day of the relative week, and the week number
SELECT DaysFromToday, Nb
, DaysFromToday % 7 as Day
, DaysFromToday / 7 as Week
FROM ViewsPerDay
), WDays AS (
-- Static table for week days to prefix the data
SELECT 0 as W
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
-- Final output, 8 weeks wide
SELECT
wd.w AS D,
SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
FROM WDays wd
INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w
GROUP BY wd.w
ORDER BY wd.w
And we can add a little bonus to output the total views per week:
;WITH ViewsPerDay AS (
-- Extract the views per relative days from today
SELECT DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate())) + 6 as DaysFromToday
, COUNT(*) AS Nb
FROM [Log_Views]
WHERE CreationTime > DATEADD(day, -8*7-3, GetUtcDate())
GROUP BY DATEDIFF(day, dbo.DATE_OF(CreationTime), dbo.DATE_OF(GetUtcDate()))
), ViewsPerDayWithWeek AS (
-- Extract the day of the relative week, and the week number
SELECT DaysFromToday, Nb
, DaysFromToday % 7 as Day
, DaysFromToday / 7 as Week
FROM ViewsPerDay
), WDays AS (
-- Static table for week days to prefix the data
SELECT 0 as W
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
-- Final output, 8 weeks wide
SELECT
'All' AS D,
SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
FROM WDays wd
INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w
UNION ALL
SELECT
'D '+CAST(wd.w as varchar(5)) as D,
SUM(CASE WHEN Week = 0 THEN v.Nb ELSE 0 END) AS W0,
SUM(CASE WHEN Week = 1 THEN v.Nb ELSE 0 END) AS W1,
SUM(CASE WHEN Week = 2 THEN v.Nb ELSE 0 END) AS W2,
SUM(CASE WHEN Week = 3 THEN v.Nb ELSE 0 END) AS W3,
SUM(CASE WHEN Week = 4 THEN v.Nb ELSE 0 END) AS W4,
SUM(CASE WHEN Week = 5 THEN v.Nb ELSE 0 END) AS W5,
SUM(CASE WHEN Week = 6 THEN v.Nb ELSE 0 END) AS W6,
SUM(CASE WHEN Week = 7 THEN v.Nb ELSE 0 END) AS W7,
SUM(CASE WHEN Week = 8 THEN v.Nb ELSE 0 END) AS W8
FROM WDays wd
INNER JOIN ViewsPerDayWithWeek v ON v.Day = wd.w
GROUP BY wd.w
ORDER BY D
The 'D' prefix for days is to allow the order-by to work properly.
And we get an output structured like:
D | W0 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 |
---|---|---|---|---|---|---|---|---|---|
All | 4303 | 110497 | 114949 | 116672 | 116142 | 116472 | 115157 | 111707 | 116045 |
D 0 | 0 | 11552 | 16187 | 16669 | 16650 | 16479 | 16557 | 16620 | 16552 |
D 1 | 0 | 17993 | 16849 | 16837 | 16694 | 16546 | 16537 | 16569 | 16555 |
D 2 | 0 | 15599 | 16348 | 16776 | 16582 | 16573 | 16527 | 16629 | 16597 |
D 3 | 0 | 16554 | 16163 | 16686 | 16510 | 16709 | 16396 | 16604 | 16804 |
D 4 | 0 | 16492 | 16045 | 16699 | 16643 | 16673 | 16488 | 16577 | 16520 |
D 5 | 0 | 16183 | 16447 | 16113 | 16558 | 16948 | 16000 | 16657 | 16519 |
D 6 | 4303 | 16124 | 16910 | 16892 | 16505 | 16544 | 16652 | 12051 | 16498 |
Of course this request may require some pre-treatments filters to exclude duplicates, such as a distinct on the couple date/ip to count only one view per user per day, filter on a single-page, or anything that is relevant to your use-case.