Dual Numbers#

The rateslib.dual module creates dual number datatypes that provide this library with the ability to perform automatic differentiation (AD). The implementation style here is known as forward mode, as opposed to reverse mode (otherwise called automatic adjoint differentiation or back propagation).

The mathematics and theory used to implement rateslib’s AD is documented thoroughly in the companion book Coding Interest Rates: FX, Swaps and Bonds.

Summary#

Classes#

`rateslib.dual.Dual`(real, vars, dual)	Dual number data type to perform first derivative automatic differentiation.
`rateslib.dual.Dual2`(real, vars, dual, dual2)	Dual number data type to perform second derivative automatic differentiation.
`rateslib.dual.Variable`(real[, vars, dual])	A user defined, exogenous variable that automatically converts to a `Dual` or `Dual2` type dependent upon the overall AD calculation order.

Methods#

`rateslib.dual.gradient`(dual[, vars, order, ...])	Return derivatives of a dual number.
`rateslib.dual.dual_exp`(x)	Calculate the exponential value of a regular int or float or a dual number.
`rateslib.dual.dual_log`(x[, base])	Calculate the logarithm of a regular int or float or a dual number.
`rateslib.dual.dual_norm_pdf`(x)	Return the standard normal probability density function.
`rateslib.dual.dual_norm_cdf`(x)	Return the cumulative standard normal distribution for given value.
`rateslib.dual.dual_inv_norm_cdf`(x)	Return the inverse cumulative standard normal distribution for given value.
`rateslib.dual.dual_solve`(A, b[, allow_lsq, ...])	Solve a linear system of equations involving dual number data types.
`rateslib.dual.newton_1dim`(f, g0[, max_iter, ...])	Use the Newton-Raphson algorithm to determine the root of a function searching one variable.
`rateslib.dual.newton_ndim`(f, g0[, max_iter, ...])	Use the Newton-Raphson algorithm to determine a function root searching many variables.
`rateslib.dual.ift_1dim`(s, s_tgt, h[, ...])	Use the inverse function theorem to determine a function value of one variable.
`rateslib.dual.quadratic_eqn`(a, b, c, x0[, ...])	Solve the quadratic equation, \(ax^2 + bx +c = 0\), with error reporting.

Example#

Below, a standard Python function is created and is called with standard floats.

First Derivatives#

\[\begin{split}f(x, y, z) = x^6 + e^{\frac{x}{y}} + \ln {z} \\ f(2, 1, 2) = 72.0822...\end{split}\]

In [1]: from rateslib.dual import *

In [2]: def func(x, y, z):
   ...:     return x**6 + dual_exp(x/y) + dual_log(z)
   ...: 

In [3]: func(2, 1, 2)
Out[3]: 72.0822032794906

For extracting first derivatives, or gradients, we can depend on the rateslib Dual datatypes and its operator overloading:

\[\begin{split}\frac{\partial f}{\partial x} &= \left . 6 x^5 + \frac{1}{y} e^{\frac{x}{y}} \right |_{(2,1,2)} = 199.3890... \\ \frac{\partial f}{\partial y} &= \left . -\frac{x}{y^2} e^{\frac{x}{y}} \right |_{(2,1,2)} = -14.7781... \\ \frac{\partial f}{\partial z} &= \left . \frac{1}{z} \right |_{(2,1,2)} = 0.50 \\\end{split}\]

In [4]: x = Dual(2, ["x"], [])

In [5]: y = Dual(1, ["y"], [])

In [6]: z = Dual(2, ["z"], [])

In [7]: value = func(x, y, z)

In [8]: value.real
Out[8]: 72.0822032794906

In [9]: gradient(value, ["x", "y", "z"])
Out[9]: array([199.3890561, -14.7781122,   0.5      ])

Second Derivatives#

For extracting second derivatives we must use the Dual2 datatype:

In [10]: x = Dual2(2, ["x"], [], [])

In [11]: y = Dual2(1, ["y"], [], [])

In [12]: z = Dual2(2, ["z"], [], [])

In [13]: value = func(x, y, z)

In [14]: value.real
Out[14]: 72.0822032794906

In [15]: value.dual
Out[15]: array([199.3890561, -14.7781122,   0.5      ])

In [16]: gradient(value, ["x", "y"], order=2)
Out[16]: 
array([[487.3890561 , -22.1671683 ],
       [-22.1671683 ,  59.11244879]])

Implementation#

In [17]: x = Dual(2.0, ["x"], [])

In [18]: y = Dual(3.0, ["y"], [])

Forward mode AD is implemented using operating overloading and custom compatible functions. The operations implemented are;

addition (+)

In [19]: x + y + 1.5
Out[19]: <Dual: 6.500000, (x, y), [1.0, 1.0]>

subtraction and negation (-),

In [20]: x - y - 1.5
Out[20]: <Dual: -2.500000, (x, y), [1.0, -1.0]>

multiplication (*),

In [21]: x * y * 1.5
Out[21]: <Dual: 9.000000, (x, y), [4.5, 3.0]>

division and inversion (/) (**-1),

In [22]: x / y / 1.5
Out[22]: <Dual: 0.444444, (x, y), [0.2, -0.1]>

dual and float type powers (**),

In [23]: x ** y ** 1.5
Out[23]: <Dual: 36.660446, (x, y), [95.2, 66.0]>

exponential and logarithms (which require the specific methods below),

In [24]: dual_exp(x)
Out[24]: <Dual: 7.389056, (x), [7.4]>

In [25]: dual_log(y)
Out[25]: <Dual: 1.098612, (y), [0.3]>

equality of dual numbers with integers and floats and with each other.
```
In [26]: x == y
Out[26]: False
```

Warning

Dual and Dual2 are not designed to operate with each other. The purpose for this is to avoid miscalculation of second derivatives. Dual should always be replaced by Dual2 in this instance. TypeErrors will be raised otherwise.

Compatability with NumPy#

To enable this library to perform its calculations in a vectorised way we need to leverage NumPy’s array calculations. NumPy arrays containing dual numbers are forced to have an object dtype configuration. This is imposed by NumPy and means that certain functions may not be compatible, for example np.einsum (although, support for object dtypes was added to np.einsum as of version 1.25.0). However, many functions are compatible.

Broadcasting#

Operations of Dual and Dual2 with int and float dtypes permit the NumPy versions; np.int8, np.int16, np.int32, np.int64, np.float16, np.float32, np.float64, and np.float128. Broadcasting of arrays has been implemented so that the following operations work as expected.

In [27]: np_arr = np.array([1, 2])

In [28]: Dual(3, ["x"], []) * np_arr
Out[28]: 
array([<Dual: 3.000000, (x), [1.0]>, <Dual: 6.000000, (x), [2.0]>],
      dtype=object)

In [29]: np_arr / Dual(4, ["y"], [])
Out[29]: 
array([<Dual: 0.250000, (y), [-0.1]>, <Dual: 0.500000, (y), [-0.1]>],
      dtype=object)

In [30]: Dual(4, ["x"], []) ** np_arr
Out[30]: 
array([<Dual: 4.000000, (x), [1.0]>, <Dual: 16.000000, (x), [8.0]>],
      dtype=object)

Elementwise Operations#

Simple operations on tensors also work as expected.

In [31]: x = np.array([Dual(1, ["x"], []), Dual(2, ["y"], [])])

In [32]: y = np.array([Dual(3, ["x"], []), Dual(4, ["y"], [])])

In [33]: x + y
Out[33]: 
array([<Dual: 4.000000, (x), [2.0]>, <Dual: 6.000000, (y), [2.0]>],
      dtype=object)

In [34]: x * y
Out[34]: 
array([<Dual: 3.000000, (x), [4.0]>, <Dual: 8.000000, (y), [6.0]>],
      dtype=object)

In [35]: x / y
Out[35]: 
array([<Dual: 0.333333, (x), [0.2]>, <Dual: 0.500000, (y), [0.1]>],
      dtype=object)

Linear Algebra#

Common linear algebraic operations are also available, such as:

np.matmul

np.inner

np.dot

np.tensordot

In [36]: np.dot(x, y)
Out[36]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [37]: np.inner(x, y)
Out[37]: <Dual: 11.000000, (x, y), [4.0, 6.0]>

In [38]: np.matmul(x[:, np.newaxis], y[np.newaxis, :])
Out[38]: 
array([[<Dual: 3.000000, (x), [4.0]>,
        <Dual: 4.000000, (x, y), [4.0, 1.0]>],
       [<Dual: 6.000000, (y, x), [3.0, 2.0]>,
        <Dual: 8.000000, (y), [6.0]>]], dtype=object)

In [39]: np.tensordot(x[np.newaxis, :, np.newaxis], y[np.newaxis, :], (1, 1))
Out[39]: array([[[<Dual: 11.000000, (x, y), [4.0, 6.0]>]]], dtype=object)

Solving the linear system, \(Ax=b\), is not not directly possible from NumPy, thus a custom solver, dual_solve(), has been implemented using the Doolittle algorithm with partial pivoting.

In [40]: A = np.array([
   ....:     [1, 0],
   ....:     [Dual(2, ["z"], []), 1]
   ....: ], dtype="object")
   ....: 

In [41]: b = np.array([Dual(2, ["y"], []), Dual(5, ["x", "y"], [])])[:, np.newaxis]

In [42]: x = dual_solve(A, b)

In [43]: x
Out[43]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 1.000000, (z, x, y), [-2.0, 1.0, -1.0]>]], dtype=object)

In [44]: np.matmul(A, x)
Out[44]: 
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
       [<Dual: 5.000000, (z, x, y), [0.0, 1.0, 1.0]>]], dtype=object)

Exogenous Variables#

The Variable class allows users to inject sensitivity into calculations without knowing which AD order is required for calculations - calculations will not raise TypeErrors. Upon first instance of a binary operation with an object of either order it will return an object of associated type.

In [45]: x = Variable(2.5, ["x"])

In [46]: x * Dual(1.5, ["y"], [2.2])
Out[46]: <Dual: 3.750000, (x, y), [1.5, 5.5]>

In [47]: x * Dual2(1.5, ["y"], [2.2], [])
Out[47]: <Dual2: 3.750000, (x, y), [1.5, 5.5], [[...]]>

In order for other internal processes to function dynamically, rateslib maintains a global AD order. When a Variable performs a self operation from which it cannot infer the AD order, it will refer to this global state.

In [48]: defaults._global_ad_order = 1

In [49]: dual_exp(x)
Out[49]: <Dual: 12.182494, (x), [12.2]>

In [50]: defaults._global_ad_order = 2

In [51]: dual_exp(x)
Out[51]: <Dual2: 12.182494, (x), [12.2], [[...]]>

In [52]: defaults._global_ad_order = 1  # Reset

Product Rule and `keep_manifold`#

The keep_manifold argument is also exclusively available for Dual2. When extracting a first order gradient from a Dual2 there is information available to also express the gradient of this gradient.

Consider the function \(g(x)=x^2\) at \(x=2\). This is the object:

In [53]: g = Dual2(4.0, ["x"], [4.0], [1.0])

The default, first order, gradient extraction will simply yield floating point values:

In [54]: gradient(g, order=1)
Out[54]: array([4.])

However, by directly requesting to keep_manifold then the output will yield a Dual2 datatype of this information with the appropriate sensitivity of this gradient derived from second order.

In [55]: gradient(g, order=1, keep_manifold=True)
Out[55]: array([<Dual2: 4.000000, (x), [2.0], [[...]]>], dtype=object)

This is not used frequently within rateslib but its purpose is to preserve the chain rule, and allow two chained operations of the gradient() method.

Consider the additional function \(h(y)=y^2\), evaluated at the point \(y=4\). This is the object:

In [56]: h = Dual2(16.0, ["y"], [8.0], [1.0])

It is natural to derive second order gradients of the product of these functions at the supposed points using:

In [57]: gradient(g * h, order=2)
Out[57]: 
array([[32., 32.],
       [32.,  8.]])

Below demonstrates the dual application of the method to derive the same values.

In [58]: gradient(gradient(g * h, keep_manifold=True)[0])
Out[58]: array([32., 32.])

In [59]: gradient(gradient(g * h, keep_manifold=True)[1])
Out[59]: array([32.,  8.])