Troubleshooting abnormal memory usage: when the memory ballast gets backed by physical memory

By Howard Cheung. Category: Development. Series: Golang 入門筆記 (Golang beginner notes)

2023-03-12

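I won't re-explain the concept of memory ballast here. If you are unfamiliar with it, read the article that introduced it, which describes it in detail. Over the past few years ballast has been widely adopted, and it is generally regarded as a simple, practical way to lower GC frequency. I had never seen a report of a downside, until this incident.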

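There is also a thread about this problem on the golang-nuts mailing list, but it is fairly brief. This post gives an accessible overview of the whole story. Corrections are welcome.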

Background

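Recently I kept seeing a small fraction of instances whose memory usage (RSS) was higher than that of the others. Repeated investigation turned up no obvious difference between their environments and those of the healthy instances.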

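Later monitoring showed that on these instances the entire ballast was backed by physical memory, and that it had been so from the moment the ballast was created at startup.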
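One way to observe this kind of difference from inside a process is to read its resident set size. The sketch below assumes Linux and its /proc/self/status file; the helper name parseVmRSS is hypothetical and is not taken from the original investigation.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in KiB) from the text of
// /proc/<pid>/status. It returns false if the field is missing or malformed.
// The helper name is an assumption made for this sketch.
func parseVmRSS(status string) (int64, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line shape: "VmRSS:    123456 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseInt(fields[1], 10, 64)
			return kb, err == nil
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	if kb, ok := parseVmRSS(string(data)); ok {
		fmt.Printf("RSS = %d MiB\n", kb>>10)
	}
}
```

Logging this value right before and right after the ballast is created should show whether the ballast itself became resident.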

Cause

OS (skip if you are familiar with it)

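As is well known, modern operating systems, Unix-like ones in particular, rely heavily on virtual memory. A user process requests and accesses memory within its virtual address space. A physical page is mapped in, through a page fault, only when the process actually touches that memory.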

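For example, when the Go runtime requests a contiguous 1 GB block, it receives a 1 GB range of addresses in the virtual address space. Until that range is accessed, the OS does not map physical pages to it, so it does not consume 1 GB of physical memory. This is the premise ballast rests on.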

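The usual implementation allocates a large slice, passes it to runtime.KeepAlive (so that Go does not "helpfully" treat it as dead and free it), and then keeps it around without ever touching it. As a result it occupies no physical memory but still counts toward the heap size, which lowers how often GC is triggered.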
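The description above can be sketched roughly as follows. The 64 MiB size and the helper name newBallast are assumptions chosen for illustration; real deployments pick a size to suit their heap.

```go
package main

import (
	"fmt"
	"runtime"
)

// newBallast allocates a large byte slice that the program never reads or
// writes. The helper name and the size used below are illustrative only.
func newBallast(size int) []byte {
	return make([]byte, size)
}

func main() {
	ballast := newBallast(64 << 20) // 64 MiB of heap, ideally never touched

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapAlloc = %d MiB\n", m.HeapAlloc>>20)

	// ... the real program would run here ...

	// Keep the slice reachable until the end of main so the GC never frees it.
	runtime.KeepAlive(ballast)
}
```

HeapAlloc includes the ballast, which is what pushes the next GC trigger further out; whether the pages also become resident depends on the runtime behavior discussed in the following sections.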

Yet in this case the ballast did end up occupying physical memory. The most obvious explanation is that the Go runtime touched the large slice while creating it.

Go runtime

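In Go's memory allocator, objects larger than 32 KB count as large objects and are allocated directly from mheap. The corresponding chapter of Go 語言原本 mentions a "zeroing" step. Zeroing writes to every page of the region, so if it happens while the ballast is being allocated, the result would seem to be exactly this symptom: the ballast consumes physical memory. The book does not explain how the runtime decides whether zeroing is needed.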

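On zeroing, Michael Knyszek, a Go team developer and the author of the go1.19 GC feature discussed below, replied in the mailing-list thread mentioned at the start. The original post quoted his reply in translation, so the wording here is not verbatim: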

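The runtime has a simple heuristic for avoiding zeroing, but it is far from perfect. Because of that, ballast is inherently always going to be a little risky. This is especially true on some platforms, such as Windows, where there is no way to avoid marking the memory as committed (Windows is free to use demand paging for memory in the range, so overall system memory pressure may increase, but you cannot avoid it being counted as committed for a particular process).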

The logic that makes this decision (from the runtime source on GitHub):

// allocNeedsZero checks if the region of address space [base, base+npage*pageSize),
// assumed to be allocated, needs to be zeroed, updating heap arena metadata for
// future allocations.
//
// This must be called each time pages are allocated from the heap, even if the page
// allocator can otherwise prove the memory it's allocating is already zero because
// they're fresh from the operating system. It updates heapArena metadata that is
// critical for future page allocations.
//
// There are no locking constraints on this method.
func (h *mheap) allocNeedsZero(base, npage uintptr) (needZero bool) {
	for npage > 0 {
		ai := arenaIndex(base)
		ha := h.arenas[ai.l1()][ai.l2()]

		zeroedBase := atomic.Loaduintptr(&ha.zeroedBase)
		arenaBase := base % heapArenaBytes
		if arenaBase < zeroedBase {
			// We extended into the non-zeroed part of the
			// arena, so this region needs to be zeroed before use.
			//
			// zeroedBase is monotonically increasing, so if we see this now then
			// we can be sure we need to zero this memory region.
			//
			// We still need to update zeroedBase for this arena, and
			// potentially more arenas.
			needZero = true
		}
		// We may observe arenaBase > zeroedBase if we're racing with one or more
		// allocations which are acquiring memory directly before us in the address
		// space. But, because we know no one else is acquiring *this* memory, it's
		// still safe to not zero.

		// Compute how far into the arena we extend into, capped
		// at heapArenaBytes.
		arenaLimit := arenaBase + npage*pageSize
		if arenaLimit > heapArenaBytes {
			arenaLimit = heapArenaBytes
		}
		// Increase ha.zeroedBase so it's >= arenaLimit.
		// We may be racing with other updates.
		for arenaLimit > zeroedBase {
			if atomic.Casuintptr(&ha.zeroedBase, zeroedBase, arenaLimit) {
				break
			}
			zeroedBase = atomic.Loaduintptr(&ha.zeroedBase)
			// Double check basic conditions of zeroedBase.
			if zeroedBase <= arenaLimit && zeroedBase > arenaBase {
				// The zeroedBase moved into the space we were trying to
				// claim. That's very bad, and indicates someone allocated
				// the same region we did.
				throw("potentially overlapping in-use allocations detected")
			}
		}

		// Move base forward and subtract from npage to move into
		// the next arena, or finish.
		base += arenaLimit - arenaBase
		npage -= (arenaLimit - arenaBase) / pageSize
	}
	return
}

Note: the atomic operation Casuintptr(ptr, old, new) is a compare-and-swap. If *ptr == old, it stores new into *ptr and returns true; otherwise it does nothing and returns false.
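The same semantics are available to user code through sync/atomic. The sketch below uses atomic.CompareAndSwapUintptr, the public counterpart of the runtime-internal Casuintptr, and a hypothetical helper raiseTo that mirrors the "only ever increase" retry loop in allocNeedsZero.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// raiseTo bumps *addr up to at least limit with a CAS retry loop and never
// lowers it. The helper name is an assumption made for this sketch.
func raiseTo(addr *uintptr, limit uintptr) {
	for {
		cur := atomic.LoadUintptr(addr)
		if cur >= limit {
			return // already high enough, nothing to do
		}
		if atomic.CompareAndSwapUintptr(addr, cur, limit) {
			return // we won the race and installed the new value
		}
		// Another goroutine changed *addr in between; reload and retry.
	}
}

func main() {
	var v uintptr = 10

	ok := atomic.CompareAndSwapUintptr(&v, 10, 20)
	fmt.Println(ok, v) // true 20: the old value matched, so it was swapped

	ok = atomic.CompareAndSwapUintptr(&v, 10, 30)
	fmt.Println(ok, v) // false 20: the old value did not match, nothing changed

	raiseTo(&v, 15)
	fmt.Println(v) // 20: raiseTo never lowers the value
}
```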

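The function walks every arena touched by this allocation (an arena is a large unit in Go's memory allocator; see Go 語言原本 for details). For each arena it checks zeroedBase to decide whether zeroing is needed, then raises zeroedBase if the allocation extends beyond it. zeroedBase can be read as the offset within the arena below which memory has already been handed out and may therefore need zeroing; memory at or above it is fresh from the OS and known to be zero. The larger the value, the less known-zero memory the arena has left. Note that if even one arena satisfies arenaBase < zeroedBase, the function returns true for the allocation as a whole.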

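It follows that when memory in an arena that was previously allocated and then freed is handed to the ballast, the allocation is judged to need zeroing, which leads to the problem described at the start. Because the ballast is normally created early in startup, when very little has been allocated before it, this is a low-probability event, but it does happen.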
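To make the bookkeeping concrete, here is a deliberately simplified, single-threaded model of one arena. The type toyArena and its method are hypothetical names for this sketch; it ignores multi-arena spans, atomics, and everything else the real runtime handles.

```go
package main

import "fmt"

// toyArena models only the zeroedBase bookkeeping of a single heap arena.
// It is an illustration, not runtime code.
type toyArena struct {
	size       uintptr // arena size in bytes
	zeroedBase uintptr // offsets >= zeroedBase have never been handed out
}

// allocNeedsZero reports whether the region [off, off+n) may contain stale
// data, and advances zeroedBase past the end of the region.
func (a *toyArena) allocNeedsZero(off, n uintptr) bool {
	needZero := off < a.zeroedBase
	end := off + n
	if end > a.size {
		end = a.size
	}
	if end > a.zeroedBase {
		a.zeroedBase = end
	}
	return needZero
}

func main() {
	a := &toyArena{size: 64 << 20}

	// A small early allocation on fresh memory needs no zeroing.
	fmt.Println(a.allocNeedsZero(0, 8<<10)) // false

	// Suppose that memory was freed and the ballast is placed over it:
	// the whole ballast allocation is now flagged for zeroing.
	fmt.Println(a.allocNeedsZero(0, 32<<20)) // true

	// A region that starts at or beyond zeroedBase is still known-zero.
	fmt.Println(a.allocNeedsZero(32<<20, 16<<20)) // false
}
```

The second call is the unlucky case: a tiny earlier allocation at the start of the region is enough to make the entire ballast count as needing zeroing.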

Recommendations

For readers who are still using ballast, consider replacing it with one of the following options to avoid this problem.

memory limit

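This is new in Go 1.19. It lets you set a fixed soft memory limit that the GC works to stay under. There are two ways to set it:

The environment variable GOMEMLIMIT. Set it to a number of bytes, or to a number with a unit suffix such as 1MiB or 1GiB.
debug.SetMemoryLimit(limit int64), also in bytes.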

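This feature can take over the role ballast plays. Once it is set, the runtime uses several mechanisms, including adjusting how often GC runs and how eagerly memory is returned to the operating system, to try to keep memory under the limit. What it measures against the limit is all memory managed by the Go runtime, equivalent to Sys - HeapReleased in runtime.MemStats. In theory its effect is similar to ballast's, and better.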
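The quantity being compared against the limit can be read from runtime.MemStats. A minimal sketch, with a hypothetical helper name runtimeManagedBytes:

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeManagedBytes returns Sys - HeapReleased, the memory the Go runtime
// currently holds from the OS. The helper name is an assumption.
func runtimeManagedBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Sys - m.HeapReleased
}

func main() {
	fmt.Printf("runtime-managed memory: %d MiB\n", runtimeManagedBytes()>>20)
}
```

Note that runtime.ReadMemStats briefly stops the world, so it is better suited to periodic sampling than to hot paths.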

When using it to limit memory, you can turn off proportional GC (GOGC=off) or raise the GOGC percentage.

Like ballast, however, it is not a hard limit, so do not set it to the maximum memory the environment allows.
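In code, the Go 1.19 runtime/debug API for this is debug.SetMemoryLimit, and debug.SetGCPercent(-1) is the programmatic equivalent of GOGC=off. The sketch below combines the two; the 1 GiB value and the helper names configureGC and currentMemoryLimit are assumptions for illustration, and the limit should leave headroom below what the environment actually allows.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// configureGC sets a soft memory limit and disables proportional GC, so
// collections are driven by the limit alone. It returns the previous
// settings. Requires Go 1.19 or newer. The helper name is an assumption.
func configureGC(limit int64) (prevLimit int64, prevPercent int) {
	prevLimit = debug.SetMemoryLimit(limit)
	prevPercent = debug.SetGCPercent(-1) // same effect as GOGC=off
	return prevLimit, prevPercent
}

// currentMemoryLimit reads the limit without changing it: a negative
// argument to SetMemoryLimit only queries the current value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	prevLimit, prevPercent := configureGC(1 << 30) // 1 GiB, illustrative
	fmt.Println("previous limit:", prevLimit, "previous GOGC:", prevPercent)
	fmt.Println("current limit:", currentMemoryLimit())
}
```

The same configuration can be applied without code changes by starting the process with GOMEMLIMIT=1GiB GOGC=off.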

gc tuner

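For older versions of Go, another option is the approach proposed by Uber: adjust the GC trigger percentage dynamically. There are two open-source implementations: cch123/gogctuner and bytedance/gopkg/util/gctuner.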
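The core idea can be sketched as choosing GOGC so that the next collection starts roughly when the heap reaches a chosen threshold. The function below is a simplified illustration under that assumption, not the code of either library; the name tunedGCPercent and the clamp bounds are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// tunedGCPercent picks a GOGC value so that the next GC triggers roughly
// when the heap grows from live bytes to threshold bytes, clamped to
// [minPct, maxPct]. Name and clamping policy are assumptions.
func tunedGCPercent(live, threshold uint64, minPct, maxPct int) int {
	if live == 0 {
		return maxPct // nothing live yet, collect as rarely as allowed
	}
	if threshold <= live {
		return minPct // already at or over the threshold, collect eagerly
	}
	pct := int((threshold - live) * 100 / live)
	if pct < minPct {
		return minPct
	}
	if pct > maxPct {
		return maxPct
	}
	return pct
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc is used here as a rough stand-in for the live heap; a real
	// tuner would sample right after each GC cycle and repeat this step.
	const threshold = 1 << 30 // 1 GiB, illustrative
	pct := tunedGCPercent(m.HeapAlloc, threshold, 50, 500)
	old := debug.SetGCPercent(pct)
	fmt.Printf("GOGC %d -> %d\n", old, pct)
}
```

With a small live heap the percentage is large and GC is rare, much as with ballast; as the live heap approaches the threshold the percentage shrinks and GC becomes more frequent.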

Keep using ballast

If you want to keep using ballast, I think the following two measures may help lower the probability of hitting this problem:

Create the ballast as early as possible
Disable GC before creating the ballast
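A rough sketch of both measures together, assuming the ballast lives in a package-level variable of the main package and is allocated from init. The names ballast and allocBallast and the 64 MiB size are illustrative, and this only reduces the chance of the problem; it does not eliminate it.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// ballast is package-level, so it stays reachable for the whole process
// lifetime without needing runtime.KeepAlive.
var ballast []byte

// allocBallast turns GC off while the ballast is allocated, then restores
// the previous GOGC setting. The helper name is an assumption.
func allocBallast(size int) {
	old := debug.SetGCPercent(-1) // disable GC during the allocation
	ballast = make([]byte, size)
	debug.SetGCPercent(old) // restore the previous setting
}

// init runs before main, so the ballast is created before most of the
// program's own allocations.
func init() {
	allocBallast(64 << 20) // 64 MiB, illustrative
}

func main() {
	fmt.Printf("ballast: %d MiB\n", len(ballast)>>20)
}
```

Packages imported by main run their own initialization first, so anything they allocate still precedes the ballast.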