Dual Numbers#

The rateslib.dual module creates dual number datatypes that provide this library with the ability to perform automatic differentiation (AD). The implementation style here is known as forward mode, as opposed to reverse mode (otherwise called automatic adjoint differentiation or back propagation).

Summary#

Classes#

`rateslib.dual.Dual`(real, vars, dual)	Dual number data type to perform first derivative automatic differentiation.
`rateslib.dual.Dual2`(real, vars, dual, dual2)	Dual number data type to perform second derivative automatic differentiation.
`rateslib.dual.Variable`(real[, vars, dual])	A user defined, exogenous variable that automatically converts to a `Dual` or `Dual2` type dependent upon the overall AD calculation order.

Methods#

`rateslib.dual.gradient`(dual[, vars, order, ...])	Return derivatives of a dual number.
`rateslib.dual.dual_exp`(x)	Calculate the exponential value of a regular int or float or a dual number.
`rateslib.dual.dual_log`(x[, base])	Calculate the logarithm of a regular int or float or a dual number.
`rateslib.dual.dual_norm_pdf`(x)	Return the standard normal probability density function.
`rateslib.dual.dual_norm_cdf`(x)	Return the cumulative standard normal distribution for given value.
`rateslib.dual.dual_inv_norm_cdf`(x)	Return the inverse cumulative standard normal distribution for given value.
`rateslib.dual.dual_solve`(A, b[, allow_lsq, ...])	Solve a linear system of equations involving dual number data types.

Example#

First Derivatives#

\[\begin{split}f(x, y, z) = x^6 + e^{\frac{x}{y}} + \ln {z} \\ f(2, 1, 2) = 72.0822...\end{split}\]

In [1]: from rateslib.dual import *

In [2]: def func(x, y, z):
   ...:     return x**6 + dual_exp(x/y) + dual_log(z)
   ...: 

In [3]: func(2, 1, 2)
Out[3]: 72.0822032794906

For extracting only first derivatives it is more efficient to use Dual:

\[\begin{split}\frac{\partial f}{\partial x} &= \left . 6 x^5 + \frac{1}{y} e^{\frac{x}{y}} \right |_{(2,1,2)} = 199.3890... \\ \frac{\partial f}{\partial y} &= \left . -\frac{x}{y^2} e^{\frac{x}{y}} \right |_{(2,1,2)} = -14.7781... \\ \frac{\partial f}{\partial z} &= \left . \frac{1}{z} \right |_{(2,1,2)} = 0.50 \\\end{split}\]

In [4]: x, y, z = Dual(2, ["x"], []), Dual(1, ["y"], []), Dual(2, ["z"], [])

In [5]: func(x, y, z)
Out[5]: <Dual: 72.082203, (x, y, z), [199.4, -14.8, 0.5]>

In [6]: gradient(func(x, y, z), ["x", "y", "z"])
Out[6]: array([199.3890561, -14.7781122,   0.5      ])

Second Derivatives#

For extracting second derivatives we must use Dual2:

In [7]: x, y, z = Dual2(2, ["x"], [], []), Dual2(1, ["y"], [], []), Dual2(2, ["z"], [], [])

In [8]: func(x, y, z)
Out[8]: <Dual2: 72.082203, (x, y, z), [199.4, -14.8, 0.5], [[...]]>

In [9]: gradient(func(x, y, z), ["x", "y", "z"])
Out[9]: array([199.3890561, -14.7781122,   0.5      ])

In [10]: gradient(func(x, y, z), ["x", "y"], order=2)
Out[10]: 
array([[487.3890561 , -22.1671683 ],
       [-22.1671683 ,  59.11244879]])

The keep_manifold argument is also exclusively available for Dual2. When extracting a first order gradient from a Dual2 this is will use information about second order and transfer it to first order thus representing a linear manifold of the gradient. This is useful for allowing composited automatic calculation of second order gradients. For example consider the following functions, \(g(x)=x^2\) and \(h(y)=y^2\), evaluated at the points \(x=2\) and \(y=4\). This creates the quadratic manifolds centered at those points expressed in the following Dual2 numbers:

In [11]: g = Dual2(4, ["x"], [4], [1])  # g(x=2)

In [12]: h = Dual2(16, ["y"], [8], [1])  # h(y=4)

If we wish to multiply these two functions and evaluate the second order derivatives at (2, 4) we can simply do,

In [13]: gradient(g*h, order=2)
Out[13]: 
array([[32., 32.],
       [32.,  8.]])

And observe that, say, \(\frac{\partial (gh)}{\partial x \partial y} = 4xy|_{(2, 4)} = 32\), as shown in the above array.

But, we can also use the product rule of differentiation to assert that,

\[\begin{split}d_{x\zeta}^2(gh) = d_x \left ( d_\zeta(g)h + gd_\zeta(h) \right ) \\\\ d_{y\zeta}^2(gh) = d_y \left ( d_\zeta(g)h + gd_\zeta(h) \right ) \\\\\end{split}\]

which we express in our dual language as,

In [14]: gradient(g, ["x", "y"], keep_manifold=True) * h + g * gradient(h, ["x", "y"], keep_manifold=True)
Out[14]: 
array([<Dual2: 64.000000, (x, y), [36.0, 36.0], [[...]]>,
       <Dual2: 32.000000, (x, y), [48.0, 24.0], [[...]]>], dtype=object)

If the manifold is not maintained the product rule fails because information that is required to ultimately determine that desired second derivative is discarded.

In [15]: gradient(g, ["x", "y"]) * h + g * gradient(h, ["x", "y"])
Out[15]: 
array([<Dual2: 64.000000, (y, x), [32.0, 0.0], [[...]]>,
       <Dual2: 32.000000, (y, x), [0.0, 32.0], [[...]]>], dtype=object)

More specifically,

In [16]: gradient(g, ["x", "y"], keep_manifold=True)
Out[16]: 
array([<Dual2: 4.000000, (x, y), [2.0, 0.0], [[...]]>,
       <Dual2: 0.000000, (x, y), [1.0, 1.0], [[...]]>], dtype=object)

while,

In [17]: gradient(g, ["x", "y"])
Out[17]: array([4., 0.])

Implementation#

Forward mode AD is implemented using operating overloading and custom compatible functions. The operations implemented are;

addition (+),

subtraction and negation (-),

multiplication (*),

division and inversion (/) (**-1),

n’th power where n is an integer or a float (**n),

exponential and logarithms (which require the specific methods below),

equality of dual numbers with integers and floats and with each other.

Warning

Dual and Dual2 are not designed to operate with each other. The purpose for this is to avoid miscalculation of second derivatives. Dual should always be replaced by Dual2 in this instance. TypeErrors will be raised otherwise.

Compatability with NumPy#

To enable this library to perform its calculations in a vectorised way we need to leverage NumPy’s array calculations. NumPy arrays containing dual numbers are forced to have an object dtype configuration. This is imposed by NumPy and means that certain functions may not be compatible, for example np.einsum (although, support for object dtypes was added to np.einsum as of version 1.25.0). However, many functions are compatible.

Broadcasting#

Operations of Dual and Dual2 with int and float dtypes permit the NumPy versions; np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64, and np.float128. Broadcasting of arrays has been implemented so that the following operations work as expected.

In [18]: np_arr = np.array([1, 2])

In [19]: Dual(3, ["x"], []) * np_arr
Out[19]: 
array([<Dual: 3.000000, (x), [1.0]>, <Dual: 6.000000, (x), [2.0]>],
      dtype=object)

In [20]: np_arr / Dual(4, ["y"], [])
Out[20]: 
array([<Dual: 0.250000, (y), [-0.1]>, <Dual: 0.500000, (y), [-0.1]>],
      dtype=object)

In [21]: Dual(4, ["x"], []) ** np_arr
Out[21]: 
array([<Dual: 4.000000, (x), [1.0]>, <Dual: 16.000000, (x), [8.0]>],
      dtype=object)

Elementwise Operations#

Simple operations on tensors also work as expected.

In [22]: x = np.array([Dual(1, ["x"], []), Dual(2, ["y"], [])])

In [23]: y = np.array([Dual(3, ["x"], []), Dual(4, ["y"], [])])

In [24]: x + y
Out[24]: 
array([<Dual: 4.000000, (x), [2.0]>, <Dual: 6.000000, (y), [2.0]>],
      dtype=object)

In [25]: x * y
Out[25]: 
array([<Dual: 3.000000, (x), [4.0]>, <Dual: 8.000000, (y), [6.0]>],
      dtype=object)

In [26]: x / y
Out[26]: 
array([<Dual: 0.333333, (x), [0.2]>, <Dual: 0.500000, (y), [0.1]>],
      dtype=object)

Linear Algebra#

Common linear algebraic operations are also available, such as:

np.matmul

np.inner

np.dot

np.tensordot

In [27]: np.dot(x, y)
Out[27]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [28]: np.inner(x, y)
Out[28]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [29]: np.matmul(x[:, np.newaxis], y[np.newaxis, :])
Out[29]: 
array([[<Dual: 3.000000, (x), [4.0]>,
        <Dual: 4.000000, (x, y), [4.0, 1.0]>],
       [<Dual: 6.000000, (y, x), [3.0, 2.0]>,
        <Dual: 8.000000, (y), [6.0]>]], dtype=object)

In [30]: np.tensordot(x[np.newaxis, :, np.newaxis], y[np.newaxis, :], (1, 1))
Out[30]: array([[[<Dual: 11.000000, (x, y), [4.0, 6.0]>]]], dtype=object)

Solving the linear system, \(Ax=b\), is not not directly possible from NumPy, thus a custom solver, dual_solve(), has been implemented using the Doolittle algorithm with partial pivoting.

In [31]: A = np.array([
   ....:     [1, 0],
   ....:     [Dual(2, ["z"], []), 1]
   ....: ], dtype="object")
   ....: 

In [32]: b = np.array([Dual(2, ["y"], []), Dual(5, ["x", "y"], [])])[:, np.newaxis]

In [33]: x = dual_solve(A, b)

In [34]: x
Out[34]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 1.000000, (z, x, y), [-2.0, 1.0, -1.0]>]], dtype=object)

In [35]: np.matmul(A, x)
Out[35]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 5.000000, (z, x, y), [0.0, 1.0, 1.0]>]], dtype=object)

Exogenous Variables#

The Variable class allows users to inject sensitivity into calculations without knowing which AD order is required for calculations - calculations will not raise TypeErrors. Upon first instance of a binary operation with an object of either order it will return an object of associated type.

In [36]: x = Variable(2.5, ["x"])

In [37]: x * Dual(1.5, ["y"], [2.2])
Out[37]: <Dual: 3.750000, (x, y), [1.5, 5.5]>

In [38]: x * Dual2(1.5, ["y"], [2.2], [])
Out[38]: <Dual2: 3.750000, (x, y), [1.5, 5.5], [[...]]>

In order for other internal processes to function dynamically, rateslib maintains a global AD order. When a Variable performs a self operation from which it cannot infer the AD order, it will refer to this global state.

In [39]: defaults._global_ad_order = 1

In [40]: dual_exp(x)
Out[40]: <Dual: 12.182494, (x), [12.2]>

In [41]: defaults._global_ad_order = 2

In [42]: dual_exp(x)
Out[42]: <Dual2: 12.182494, (x), [12.2], [[...]]>

In [43]: defaults._global_ad_order = 1  # Reset