Dual Numbers#
The rateslib.dual module creates dual number datatypes that provide this library
with the ability to perform automatic
differentiation (AD). The implementation style here is known as forward mode, as
opposed to reverse mode (otherwise called automatic adjoint differentiation or
back propagation).
Summary#
Classes#
|
Dual number data type to perform first derivative automatic differentiation. |
|
Dual number data type to perform second derivative automatic differentiation. |
|
A user defined, exogenous variable that automatically converts to a |
Methods#
|
Return derivatives of a dual number. |
Calculate the exponential value of a regular int or float or a dual number. |
|
|
Calculate the logarithm of a regular int or float or a dual number. |
Return the standard normal probability density function. |
|
Return the cumulative standard normal distribution for given value. |
|
Return the inverse cumulative standard normal distribution for given value. |
|
|
Solve a linear system of equations involving dual number data types. |
Example#
First Derivatives#
In [1]: from rateslib.dual import *
In [2]: def func(x, y, z):
...: return x**6 + dual_exp(x/y) + dual_log(z)
...:
In [3]: func(2, 1, 2)
Out[3]: 72.0822032794906
For extracting only first derivatives it is more efficient
to use Dual:
In [4]: x, y, z = Dual(2, ["x"], []), Dual(1, ["y"], []), Dual(2, ["z"], [])
In [5]: func(x, y, z)
Out[5]: <Dual: 72.082203, (x, y, z), [199.4, -14.8, 0.5]>
In [6]: gradient(func(x, y, z), ["x", "y", "z"])
Out[6]: array([199.3890561, -14.7781122, 0.5 ])
Second Derivatives#
For extracting second derivatives we must use Dual2:
In [7]: x, y, z = Dual2(2, ["x"], [], []), Dual2(1, ["y"], [], []), Dual2(2, ["z"], [], [])
In [8]: func(x, y, z)
Out[8]: <Dual2: 72.082203, (x, y, z), [199.4, -14.8, 0.5], [[...]]>
In [9]: gradient(func(x, y, z), ["x", "y", "z"])
Out[9]: array([199.3890561, -14.7781122, 0.5 ])
In [10]: gradient(func(x, y, z), ["x", "y"], order=2)
Out[10]:
array([[487.3890561 , -22.1671683 ],
[-22.1671683 , 59.11244879]])
The keep_manifold argument is also exclusively available
for Dual2. When
extracting a first order gradient from a Dual2 this is
will use information about
second order and transfer it to first order thus representing a linear manifold
of the gradient. This is useful for allowing composited automatic calculation of
second order gradients. For example
consider the following functions, \(g(x)=x^2\) and \(h(y)=y^2\), evaluated at
the points \(x=2\) and \(y=4\). This creates the quadratic manifolds centered
at those points expressed in the following Dual2 numbers:
In [11]: g = Dual2(4, ["x"], [4], [1]) # g(x=2)
In [12]: h = Dual2(16, ["y"], [8], [1]) # h(y=4)
If we wish to multiply these two functions and evaluate the second order derivatives at (2, 4) we can simply do,
In [13]: gradient(g*h, order=2)
Out[13]:
array([[32., 32.],
[32., 8.]])
And observe that, say, \(\frac{\partial (gh)}{\partial x \partial y} = 4xy|_{(2, 4)} = 32\), as shown in the above array.
But, we can also use the product rule of differentiation to assert that,
which we express in our dual language as,
In [14]: gradient(g, ["x", "y"], keep_manifold=True) * h + g * gradient(h, ["x", "y"], keep_manifold=True)
Out[14]:
array([<Dual2: 64.000000, (x, y), [36.0, 36.0], [[...]]>,
<Dual2: 32.000000, (x, y), [48.0, 24.0], [[...]]>], dtype=object)
If the manifold is not maintained the product rule fails because information that is required to ultimately determine that desired second derivative is discarded.
In [15]: gradient(g, ["x", "y"]) * h + g * gradient(h, ["x", "y"])
Out[15]:
array([<Dual2: 64.000000, (y, x), [32.0, 0.0], [[...]]>,
<Dual2: 32.000000, (y, x), [0.0, 32.0], [[...]]>], dtype=object)
More specifically,
In [16]: gradient(g, ["x", "y"], keep_manifold=True)
Out[16]:
array([<Dual2: 4.000000, (x, y), [2.0, 0.0], [[...]]>,
<Dual2: 0.000000, (x, y), [1.0, 1.0], [[...]]>], dtype=object)
while,
In [17]: gradient(g, ["x", "y"])
Out[17]: array([4., 0.])
Implementation#
Forward mode AD is implemented using operating overloading and custom compatible functions. The operations implemented are;
addition (+),
subtraction and negation (-),
multiplication (*),
division and inversion (/) (**-1),
n’th power where n is an integer or a float (**n),
exponential and logarithms (which require the specific methods below),
equality of dual numbers with integers and floats and with each other.
Compatability with NumPy#
To enable this library to perform its calculations in a vectorised way we need to
leverage NumPy’s array calculations. NumPy arrays containing dual numbers are
forced to have an object dtype configuration. This is imposed by NumPy and means
that certain functions may not be compatible, for example np.einsum (although,
support for object dtypes was added to np.einsum as of version 1.25.0).
However, many functions are compatible.
Broadcasting#
Operations of Dual and Dual2
with int and float dtypes permit the NumPy versions; np.int8, np.int16,
np.int32, np.int64, np.float16, np.float32, np.float64, and np.float128.
Broadcasting of arrays has been implemented so that the following
operations work as expected.
In [18]: np_arr = np.array([1, 2])
In [19]: Dual(3, ["x"], []) * np_arr
Out[19]:
array([<Dual: 3.000000, (x), [1.0]>, <Dual: 6.000000, (x), [2.0]>],
dtype=object)
In [20]: np_arr / Dual(4, ["y"], [])
Out[20]:
array([<Dual: 0.250000, (y), [-0.1]>, <Dual: 0.500000, (y), [-0.1]>],
dtype=object)
In [21]: Dual(4, ["x"], []) ** np_arr
Out[21]:
array([<Dual: 4.000000, (x), [1.0]>, <Dual: 16.000000, (x), [8.0]>],
dtype=object)
Elementwise Operations#
Simple operations on tensors also work as expected.
In [22]: x = np.array([Dual(1, ["x"], []), Dual(2, ["y"], [])])
In [23]: y = np.array([Dual(3, ["x"], []), Dual(4, ["y"], [])])
In [24]: x + y
Out[24]:
array([<Dual: 4.000000, (x), [2.0]>, <Dual: 6.000000, (y), [2.0]>],
dtype=object)
In [25]: x * y
Out[25]:
array([<Dual: 3.000000, (x), [4.0]>, <Dual: 8.000000, (y), [6.0]>],
dtype=object)
In [26]: x / y
Out[26]:
array([<Dual: 0.333333, (x), [0.2]>, <Dual: 0.500000, (y), [0.1]>],
dtype=object)
Linear Algebra#
Common linear algebraic operations are also available, such as:
np.matmul
np.inner
np.dot
np.tensordot
In [27]: np.dot(x, y)
Out[27]: <Dual: 11.000000, (x, y), [4.0, 6.0]>
In [28]: np.inner(x, y)
Out[28]: <Dual: 11.000000, (x, y), [4.0, 6.0]>
In [29]: np.matmul(x[:, np.newaxis], y[np.newaxis, :])
Out[29]:
array([[<Dual: 3.000000, (x), [4.0]>,
<Dual: 4.000000, (x, y), [4.0, 1.0]>],
[<Dual: 6.000000, (y, x), [3.0, 2.0]>,
<Dual: 8.000000, (y), [6.0]>]], dtype=object)
In [30]: np.tensordot(x[np.newaxis, :, np.newaxis], y[np.newaxis, :], (1, 1))
Out[30]: array([[[<Dual: 11.000000, (x, y), [4.0, 6.0]>]]], dtype=object)
Solving the linear system, \(Ax=b\), is not not directly possible from NumPy,
thus a custom solver, dual_solve(), has been implemented
using the Doolittle algorithm with partial pivoting.
In [31]: A = np.array([
....: [1, 0],
....: [Dual(2, ["z"], []), 1]
....: ], dtype="object")
....:
In [32]: b = np.array([Dual(2, ["y"], []), Dual(5, ["x", "y"], [])])[:, np.newaxis]
In [33]: x = dual_solve(A, b)
In [34]: x
Out[34]:
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
[<Dual: 1.000000, (z, x, y), [-2.0, 1.0, -1.0]>]], dtype=object)
In [35]: np.matmul(A, x)
Out[35]:
array([[<Dual: 2.000000, (z, x, y), [0.0, 0.0, 1.0]>],
[<Dual: 5.000000, (z, x, y), [0.0, 1.0, 1.0]>]], dtype=object)
Exogenous Variables#
The Variable class allows users to inject sensitivity into calculations
without knowing which AD order is required for calculations - calculations will not
raise TypeErrors. Upon first instance of a binary operation
with an object of either order it will return an object of associated type.
In [36]: x = Variable(2.5, ["x"])
In [37]: x * Dual(1.5, ["y"], [2.2])
Out[37]: <Dual: 3.750000, (x, y), [1.5, 5.5]>
In [38]: x * Dual2(1.5, ["y"], [2.2], [])
Out[38]: <Dual2: 3.750000, (x, y), [1.5, 5.5], [[...]]>
In order for other internal processes to function dynamically, rateslib maintains a global AD order. When a Variable performs a self operation from which it cannot infer the AD order, it will refer to this global state.
In [39]: defaults._global_ad_order = 1
In [40]: dual_exp(x)
Out[40]: <Dual: 12.182494, (x), [12.2]>
In [41]: defaults._global_ad_order = 2
In [42]: dual_exp(x)
Out[42]: <Dual2: 12.182494, (x), [12.2], [[...]]>
In [43]: defaults._global_ad_order = 1 # Reset