🚫 Ad Blocker Detected

Please disable your AD blocker to continue using this site. Ads help us keep the content free! please press keyboard F5 to refresh page after disabled AD blocker

請關閉廣告攔截器以繼續使用本網站。廣告有助於我們保證內容免費。謝謝! 關閉後請按 F5 刷新頁面

0%

JOIN範圍條件Index優化

前文

JOIN條件範圍時,執行計畫預估值容易不準確,這也間接導致查詢效能不好.

就算有建立Index也會遇到上述問題

假如我們想要提升JOIN條件範圍效能並讓Index可以發揮最大最用可以怎麼做?

就讓我利用一個範例來跟大家分享.

案例

此範例有使用到三張表

  • Product表:擁有1-10編號產品
  • ReportPeriod表:存放產每期報表的資訊(時間,和是否產報表)
  • T99表:線上產品訂單資訊
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
CREATE TABLE [dbo].[Product](
[ProductId] INT NOT NULL
)

CREATE TABLE [dbo].[T99](
[TransactionId] [int] IDENTITY(1,1) NOT NULL,
[Amount] DECIMAL(18,6),
[CreateDate] [datetime2](3) NULL
)
GO

CREATE TABLE [dbo].[ReportPeriod](
[PerioidID] [int] IDENTITY(1,1) NOT NULL,
[ProductId] INT NOT NULL,
[IsGenerate] [bit] NULL,
[StartDate] [datetime2](3) NULL,
[EndDate] [datetime2](3) NULL
) ON [PRIMARY]
GO

我們利用T99.CreateDate來跟ReportPeriod判斷是屬於哪期報表.

資料初始化

我們利用亂數產生Sample資料來模擬線上大資料狀況.

ReportPeriod期別由'2019-08-01''2020-07-31'

因為產生報表以5分鐘為區間,所以可以利用CTE遞迴來幫我們產生資料.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
INSERT INTO [dbo].[Product] VALUES (1);
INSERT INTO [dbo].[Product] VALUES (2);
INSERT INTO [dbo].[Product] VALUES (3);
INSERT INTO [dbo].[Product] VALUES (4);
INSERT INTO [dbo].[Product] VALUES (5);
INSERT INTO [dbo].[Product] VALUES (6);
INSERT INTO [dbo].[Product] VALUES (7);
INSERT INTO [dbo].[Product] VALUES (8);
INSERT INTO [dbo].[Product] VALUES (9);
INSERT INTO [dbo].[Product] VALUES (10);

declare @FromDate DATETIME2(3) = '2019-08-01'
declare @ToDate DATETIME2(3) = '2020-07-31'

;WITH CTE AS (
SELECT @FromDate fromDt,@ToDate endDt
UNION ALL
SELECT DATEADD(MINUTE,5,fromDt),endDt
FROM CTE
WHERE DATEADD(MINUTE,5,fromDt) < endDt
)
INSERT INTO [dbo].[ReportPeriod] ([ProductId],[IsGenerate],[StartDate],[EndDate])
SELECT ProductId,0,fromDt,DATEADD(MINUTE,5,fromDt)
FROM CTE CROSS JOIN dbo.Product
OPTION (MAXRECURSION 0);

INSERT INTO T99 ([CreateDate],Amount)
SELECT top 1000000 dateadd(SECOND,
rand(checksum(newid()))*(1+datediff(SECOND, @FromDate, @ToDate)),
@FromDate),
CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1
FROM sys.all_columns c1 CROSS JOIN sys.all_columns c2

建立Index

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
CREATE CLUSTERED INDEX [CIX_ReportPeriod_StartDate] ON [dbo].[ReportPeriod]
(
[StartDate] ASC,
[EndDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

ALTER TABLE [dbo].[ReportPeriod] ADD CONSTRAINT [PK_ReportPeriod] PRIMARY KEY NONCLUSTERED
(
[PerioidID] ASC,
[ProductId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE CLUSTERED INDEX [CIX_T99_CreateDate] ON [dbo].[T99]
(
[CreateDate] ASC
)

CREATE UNIQUE NONCLUSTERED INDEX [IX_T99_TransactionId] ON [dbo].[T99]
(
[TransactionId] ASC
)

查詢語法

在線上我們會使用UDT當作參數來取得某些期別資訊.

這裡為了方便模擬我使用Table Variable來取代.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
set nocount on
DECLARE @Transaction AS TABLE(
TransactionId INT,
ProductId INT
);

INSERT INTO @Transaction VALUES (1,1)
INSERT INTO @Transaction VALUES (101,2)
INSERT INTO @Transaction VALUES (1001,3)

SELECT p.*
FROM dbo.T99 t
JOIN [dbo].[ReportPeriod] p ON t.CreateDate BETWEEN p.StartDate AND p.EndDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId

使用上面語法我們只需查詢三個期別資料,但看執行計畫時能發現,ReportPeriod使用的Clustered預估資訊有1百多萬筆

Q:我明明有對於條件建立Index,但為什麼預估值卻會跑真那麼嚴重?

1
2
3
4
5
CREATE CLUSTERED INDEX [CIX_ReportPeriod_StartDate] ON [dbo].[ReportPeriod]
(
[StartDate] ASC,
[EndDate] ASC
)

原因出在範圍條件會因為查找範圍過大導致預估值不準確

甚麼意思? 讓我們看看下圖(代表ReportPeriod內含日期資料)

而我們在JOIN條件只有t.CreateDate BETWEEN p.StartDate AND p.EndDate這就會導致,我們需要查找ReportPeriod日期資料在挑出符合的資料

1
2
JOIN [dbo].[ReportPeriod] p ON t.CreateDate BETWEEN p.StartDate AND p.EndDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId

最後就會看到走針的估計值

如何優化?

效能差問題,選擇對Index和撰寫合理的查詢可以改善40%左右問題

我們思考一下如果可以把範圍條件改成精準=查找條件不就可以更精準預估資訊了?

1
t.CreateDate BETWEEN p.StartDate AND p.EndDate

那我們怎麼把上面條件使用=取代BETWEEN範圍查詢呢?

這時我們可以利用空間來換取時間

建立一個新的COLUMN運用算法來計算每個期數StartTime

例如:CreateDate = 2020/01/03 10:08:55會歸類在2020/01/03 10:05:00

1
2
3
4
5
6
7
8
ALTER TABLE dbo.T99 ADD PeriodDate AS DATEADD(MINUTE,DATEPART(MINUTE,CreateDate) %5 * -1,
DATETIMEFROMPARTS(
DATEPART(YEAR,CreateDate),
DATEPART(MONTH,CreateDate),
DATEPART(DAY,CreateDate),
DATEPART(HOUR,CreateDate),
DATEPART(MINUTE,CreateDate),0,0)
)

建立完新COLUMN後別忘記加入一個Index給此COLUMN.

1
2
3
CREATE INDEX IX_PeriodDate_T99 ON dbo.T99(
PeriodDate
)

最後我們修改一下查詢語法

1
2
3
4
SELECT p.*
FROM dbo.T99 t
JOIN [dbo].[ReportPeriod] p ON p.StartDate = t.PeriodDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId

預估值和讀取值已經可以大幅降低了!!

小結:

JOIN範圍條件差效能問題,可以思考一下是否有辦法利用算法或是公式來優化查詢效能,如此次範例一樣.

__此文作者__:Daniel Shih(石頭)
__此文地址__: https://isdaniel.github.io/Join-Index-Improve/
__版權聲明__:本博客所有文章除特別聲明外,均採用 CC BY-NC-SA 3.0 TW 許可協議。轉載請註明出處!

如果本文對您幫助很大,可街口支付斗內鼓勵石頭^^