SQL Server Merge condition on declare value problem

前言

假如要判斷資料是否存在於資料表中,存在就更新,不存在就新增.

這時我們可以使用Merge來幫助我們完成.

當兩個資料表有複雜的比對的特性時,MERGE陳述式的條件式行為表現最佳。

有了Merge我們就不用使用IF EXISTS.

一切都是這麼完美…

直到到有一天Merge在Prod撞到一個問題..

問題描述

使用語法user defined table type & Table如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

CREATE TABLE [dbo].[PriceLimitation](
[CategoryID] [int] NOT NULL,
[ProdcutGroupID] [smallint] NOT NULL,
[UserID] [int] NOT NULL,
[StakeAmount] [numeric](18, 4) NOT NULL,
[ProductID] [smallint] NOT NULL,
CONSTRAINT [PK_PriceLimitation] PRIMARY KEY CLUSTERED
(
[UserID] ASC,
[CategoryID] ASC,
[ProductID] ASC,
[ProdcutGroupID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE(
[CategoryID] [int] NOT NULL,
[ProdcutGroupID] [smallint] NOT NULL,
[UserID] [int] NOT NULL,
[StakeAmount] [numeric](18, 4) NOT NULL,
[ProductID] [smallint] NOT NULL,
PRIMARY KEY CLUSTERED
(
[UserID] ASC,
[CategoryID] ASC,
[ProductID] ASC,
[ProdcutGroupID] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)

呼叫執行SP[dbo].[CalculateLimitation]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
CREATE OR ALTER PROC [dbo].[CalculateStake]
@CategoryID int,
@ProductID smallint ,
@ProdcutGroupID smallint,
@PriceLimit [uftt_PriceLimit] readonly
AS
BEGIN
SET NOCOUNT ON;

MERGE INTO [dbo].[PriceLimitation] t1
USING @PriceLimit t2
ON t1.UserID = t2.UserID
AND t1.ProdcutGroupID= @ProdcutGroupID
AND t1.CategoryID=@CategoryID
AND t1.ProductID = @ProductID
WHEN MATCHED THEN
UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
WHEN NOT MATCHED THEN
INSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID ,t2.StakeAmount, @ProductID);
END

主要傳入參數判斷更新或新增[dbo].[PriceLimitation]

我們在ELK發現在執行SP時很少概率會發生PRIMARY KEY重複問題.

Violation of PRIMARY KEY constraint ‘PK_PriceLimitation’. Cannot insert duplicate key in object ‘dbo.PriceLimitation’.

但這張表PK使用欄位都有正確在Merge條件上,所以當下我嘗試在DEV重現此問題,但一直無法成功

後來發現此問題在高併發時才會發生,所以我們使用Query Stress來幫助我們模擬高併發請求時的狀態.

Query Stress重現問題

撰寫了模擬SQL並利用Query Stress重現問題.

模擬SQL腳本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DECLARE @CategoryID int,
@ProductID smallint ,
@ProdcutGroupID smallint,
@PriceLimit [uftt_PriceLimit]

declare @from int
SELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500,0)
SELECT @CategoryID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 4 +1
SELECT @ProductID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 5 +1
SELECT @ProdcutGroupID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731,0) as int) % 20 +1

;with cte as(
select @from as num
union all
select num + 1 as num from cte
where num < @from+500
)
insert into @PriceLimit ([UserID],[StakeAmount],CategoryID,[ProductID],[ProdcutGroupID])
select num,100,@CategoryID,@ProductID,@ProdcutGroupID
from cte
option(MAXRECURSION 0);

exec [dbo].[CalculateStake] @CategoryID,@ProductID,@ProdcutGroupID,@PriceLimit

我使用模擬參數是

  • Iterator:30
  • Thread:100

來模擬高併發時資料庫請求狀況,就能發現這時已經會出現Prod的Exception.

找到問題尋求解法

對於目前Prod問題已經邁出一大步了,因為現在問題可以重現,在網路上找了許多文章還是沒找到解法….

後面在MSDN時看到關鍵一段話,關於merge-transact-sql.

請務必只從目標資料表指定用於比對用途的資料行。 也就是說,從目標資料表中指定要與來源資料表的對應資料行進行比較的資料行。 請勿嘗試在 ON 子句中篩選出目標資料表的資料列 (例如指定 AND NOT target_table.column_x = value) 來改善查詢效能。 這樣做可能會傳回非預期且不正確的結果。

後面有看到有篇文章在介紹use-where-clause-with-merge

我就嘗試把sp寫法改成只利用兩個Table可以JOIN欄位當作條件,發現Duplicate PK問題就可以解決了….但發現另一個更麻煩問題.

SP改寫後

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
CREATE OR ALTER PROC [dbo].[CalculateStake]
@CategoryID int,
@ProductID smallint ,
@ProdcutGroupID smallint,
@PriceLimit [uftt_PriceLimit] readonly
AS
BEGIN
SET NOCOUNT ON;

MERGE INTO [dbo].[PriceLimitation] t1
USING @PriceLimit t2
ON t1.UserID = t2.UserID
WHEN MATCHED
AND t1.ProdcutGroupID= @ProdcutGroupID
AND t1.CategoryID=@CategoryID
AND t1.ProductID = @ProductID
THEN
UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
WHEN NOT MATCHED THEN
INSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID ,t2.StakeAmount, @ProductID);
END

新寫法的執行計畫在對於大資料表時會很沒效率….

改寫後遇到的問題(不好的執行計畫)

一般SP在執行過後都會把使用的執行計畫快取起來,所以我們可以透過DMV來查看執行執行計畫.

1
2
3
4
5
SELECT Cacheobjtype, Objtype, TEXT, query_plan
FROM sys.dm_exec_cached_plans t1
CROSS APPLY sys.dm_exec_sql_text(plan_handle) t2
CROSS APPLY sys.dm_exec_query_plan(plan_handle) t3
where t2.objectid = object_id('dbo.CalculateStake', 'p')

新和舊SP寫法執行計畫如下圖.

舊寫法

新寫法

造成上面差異原因,因為新寫法透過統計資訊使用效能較差的執行計畫(能看到上面使用Merge Join明明傳入結果集資料並不多)且在WHEN MATCHED進行第二次判斷…

所以效能就變很差,現在已經找到此問題點了,我就在思考那有沒有辦法兼具效能又可解決此問題呢?

最終版SP寫法

最後我就思考何不如把傳入參數全部加入user defined table type

  1. 這樣就可以利用傳入參數當作ON條件也可以得到精準執行計畫.
  2. user defined table type所有欄位可以跟Table的Clustered Index Match.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
drop proc[dbo].[CalculateStake]
drop type [dbo].[uftt_PriceLimit]

CREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE(
[CategoryID] [int] NOT NULL,
[ProdcutGroupID] [smallint] NOT NULL,
[UserID] [int] NOT NULL,
[StakeAmount] [numeric](18, 4) NOT NULL,
[ProductID] [smallint] NOT NULL,
PRIMARY KEY CLUSTERED
(
[UserID] ASC,
[CategoryID] ASC,
[ProductID] ASC,
[ProdcutGroupID] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)

GO

CREATE OR ALTER PROC [dbo].[CalculateStake]
@PriceLimit [uftt_PriceLimit] readonly
AS
BEGIN
SET NOCOUNT ON;

MERGE INTO [dbo].[PriceLimitation] t1
USING @PriceLimit t2
ON t1.UserID = t2.UserID
AND t1.ProdcutGroupID= t2.ProdcutGroupID
AND t1.CategoryID=t2.CategoryID
AND t1.ProductID =t2.ProductID
WHEN MATCHED THEN
UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
WHEN NOT MATCHED THEN
INSERT VALUES(t2.CategoryID, t2.ProdcutGroupID, t2.UserID ,t2.StakeAmount, t2.ProductID);
END

測試腳本改成把參數透過uftt_PriceLimit傳入

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DECLARE @CategoryID int,
@ProductID smallint ,
@ProdcutGroupID smallint,
@PriceLimit [uftt_PriceLimit]

declare @from int
SELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500,0)
SELECT @CategoryID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 4 +1
SELECT @ProductID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 5 +1
SELECT @ProdcutGroupID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731,0) as int) % 20 +1

;with cte as(
select @from as num
union all
select num + 1 as num from cte
where num < @from+500
)
insert into @PriceLimit ([UserID],[StakeAmount],CategoryID,[ProductID],[ProdcutGroupID])
select num,100,@CategoryID,@ProductID,@ProdcutGroupID
from cte
option(MAXRECURSION 0);

exec [dbo].[CalculateStake] @PriceLimit

請在跑修改後的SP前記得把Table先Truncate掉,這樣可以更精準模擬

使用QueryStress模擬參數

  • Iterator:30
  • Thread:100

執行結果如下

利用Query Stress工具壓測發現問題解決且效能不會變差:)

小結:

沒想到MergeOn條件有些隱藏限制(對於On寫value condition官方只有說會有想不到的問題發生,並沒解釋原因為何…),但經過這次經驗我日後在使用Merge時不會直接在On使用value condition會在中間多墊一層Table這樣就可以使用ONJOIN.

另外QueryStress真是一個對於DB壓測找問題的好工具,推薦大家去了解使用

此文作者:Daniel Shih(石頭)
此文地址https://isdaniel.github.io/sqlserver-merge-condition-problem/
版權聲明:本博客所有文章除特別聲明外,均採用 CC BY-NC-SA 3.0 TW 許可協議。轉載請註明出處!


如果本文對您幫助很大,可街口支付斗內鼓勵石頭^^