定点数开根号算法（2）

前言

在上一篇文章中，我们介绍了定点数开根号的几种算法原理，包括牛顿-拉弗森法和逐比特确认法。本文继续写牛顿-拉弗森法的实现细节。

牛顿-拉弗森法的实现细节

初始值的选择

根据上一篇文章中我们推导的根的比特数公式：

$$ \begin{cases} B_r = \frac{B_N}{2} &\text{if } B_N=2k, k\in \mathbb{Z} \\ B_r = \frac{B_N+1}{2} &\text{if } B_N=2k+1, k \in \mathbb{Z} \end{cases} $$

定义根为 $B$ ，我们可以得出根的取值范围大约是

$$ 2^{B_r-1}\le{B}\lt{2^{B_r}} $$

我们可以选择牛顿法的初始值为 $x_0=2^{B_r}$ 也就是根取值范围的上边界，来保证足够接近真实根，让牛顿法可以较快收敛。

这种方法的优点是计算简单，只需要进行位宽计算和移位操作，能够保证初始值在合理范围内，确保算法收敛。

定点数推广

我们将定点数 $V$ 的整数部分设为 $M$，小数部分设为 $F$，小数有 $n$ 比特，定点数根为 $R$ ， $V=M+\frac{F}{2^n}$ 。

$V$ 的整数表示形式为 $V_{int}=V \cdot 2^n=M \cdot 2^n + F$ 。

$R$ 的整数表示形式为 $R_{int}=R \cdot 2^n$

我们有下面的方程：

$$ \begin{aligned} R &= \sqrt{V} \\ &= \sqrt{M+\frac{F}{2^n}} \\ &= \sqrt{\frac{M \cdot 2^n + F}{2^n}} \\ &= \frac{\sqrt{M \cdot 2^n + F}}{\sqrt{2^n}} \\ &= \frac{\sqrt{M \cdot 2^n + F}}{2^{\frac{n}{2}}} \\ &= \frac{\sqrt{V_{int}}}{2^{\frac{n}{2}}} \end{aligned} $$

我们将两边同时乘 $2^n$，把根转为整数表示（即定点数）

$$ \begin{aligned} R \cdot 2^n &= \frac{\sqrt{V_{int}}}{2^{\frac{n}{2}}} \cdot 2^n \\ R_{int} &= \sqrt{V_{int}} \cdot \frac{1}{2^{\frac{n}{2}}} \cdot 2^n \\ R_{int} &= \sqrt{V_{int}} \cdot \frac{2^n}{2^{\frac{n}{2}}} \\ R_{int} &= \sqrt{V_{int}} \cdot 2^{n-\frac{n}{2}} \\ R_{int} &= \sqrt{V_{int}} \cdot 2^{\frac{n}{2}} \\ \end{aligned} $$

提高精度

从最终的方程：$R_{int} = \sqrt{V_{int}} \cdot 2^{\frac{n}{2}}$ 不难看出，迭代收敛的 $\sqrt{V_{int}}$ 最终会进行一个左移操作，当 $\sqrt{V_{int}}$ 迭代只有 32 位精度，$n=16$ 时，我们的结果最多只能达到小数点后 $\frac{n}{2}$ 比特的精度。

最直接的思路就是我们提高中间迭代的精度，也就是从 Q15.16 转为更大的，比如 Q31.32 表示。

转换方法非常直接，目标精度 $p\gt{n}$，只要把 $n$ 位精度的表示形式左移 $p-n$ 比特即可缩放到 $p$ 位精度的表示形式。

当迭代使用的 $V_{int}$ 是Q31.32 时，得到的 $R_{int}$ 也是 Q31.32 表示。

我们把 32 位精度的 $R_{int}$ 记作 $R_{q32}$ ，32 位精度的 $V_{int}$ 记作 $V_{q32}$，类似的，我们把 Q15.16 表示的根记作 $R_{q16}$。

$$ \begin{aligned} R_{q32} &= \sqrt{V_{q32}} \cdot 2^{\frac{32}{2}} \\ R_{q32} &= \sqrt{V_{q32}} \cdot 2^{16} \\ \end{aligned} $$

求出 $R_{q32}$ 后，我们需要将 $R_{q32}$ 从 Q31.32 转回 Q15.16 ，方法是右移 $p-n=32-16=16$ 位。

$$ \begin{aligned} R_{q16} &= \frac{R_{q32}}{2^{16}} \\ R_{q16} &= \frac{\sqrt{V_{q32}} \cdot 2^{16}}{2^{16}} \\ R_{q16} &= \sqrt{V_{q32}} \\ \end{aligned} $$

我们还可以更简单地去证明：

$$ \begin{aligned} \sqrt{V_{q32}} &= \sqrt{V \cdot 2^{32}} \\ &= \sqrt{V} \cdot 2^{16} \end{aligned} $$

$\sqrt{V_{q32}}$ 的结果就已经是 $\sqrt{V}$ 的 Q15.16 精度表示形式。

迭代终止条件

我们确定迭代中，如果本轮计算的 $x_{n+1}$ 等于 $x_n$ 的话，下一轮也只会得到相同的 $x_n$，所以此时可以认为达到了最大精度。

为了避免出现接近最大精度时抖动无法收敛的情况，我们再增加一个防御性的最大迭代轮次检查。

最终实现

// Sqrt performs fixed-point arithmetic square root of a Fix32 number in Newton & Raphson method.
func (f Fix32) Sqrt() (Fix32, error) {
	if f < 0 {
		return 0, ErrNaN
	}

	// square root of zero is always zero
	if f == 0 {
		return 0, nil
	}

	var (
		// convert q15.16 to q31.32
		v = int64(f) << 16
		// f always be positive, bitwidth correctly reflect the minimum bits needed to represent the number f
		bitwidth = bits.Len64(uint64(v))
	)

	// choose initial guess.
	// We can proof that the square root bitwidth is ceil(bitwidth/2).
	// It means square root is between 2^(ceil(bitwidth/2)-1) and 2^ceil(bitwidth/2).
	// So we can always choose upper bound as initial guess which is 2^(bitwidth/2).
	var x int64
	if bitwidth&1 == 1 {
		x = 1 << ((bitwidth + 1) / 2)
	} else {
		x = 1 << (bitwidth / 2)
	}

	for range NewtonRaphsonDivMaximumRounds {
		// x = (x+(a/x))/2
		nx := (x + (v / x)) / 2

		// we can promise fixpoint arithmetic with same input always produce same output
		// so if x and nx are equal, we can expect next round still produce same nx as this round
		if nx == x {
			break
		}

		x = nx
	}

	return Fix32(x), nil
}