Kernels¶
George comes equipped with a suite of standard covariance functions or kernels that can be combined to build more complex models. The standard kernels fall into the following categories:
- Stationary kernels — functions that depend only on the radial distance between points in some user-defined metric, and
- Non-stationary kernels — functions that depend on the value of the input coordinates themselves.
Combining kernels describes how to combine kernels to build more sophisticated models and Implementing new kernels explains how you would go about incorporating a custom kernel.
Common parameters¶
Every kernel accepts the two keyword arguments ndim
and axes
. By
default, kernels are only one dimensional so you must specify the ndim
argument if you want the kernel to work with higher dimensional inputs.
By default, higher dimensional kernels are applied to every dimension but you
can restrict the evaluation to a subspace using the axes
argument.
For example, if you have a 3 dimensional input space but you want one of the
kernels to only act in the first dimension, you would do the following:
from george import kernels
kernel = 10.0 * kernels.Matern32Kernel(1.0, ndim=3, axes=0)
Similarly, if you wanted the kernel to act on only the second and third dimensions, you could do something like:
kernel = 10.0 * kernels.ExpSquaredKernel([1.0, 0.5], ndim=3, axes=[1, 2])
Finally, all of the stationary kernels can be “bounded”. This means that the
kernel will only be applied within some parameter range. In practice, the
covariance matrix will have a block diagonal structure. To use this feature,
you use the bounds
keyword argument:
kernel = 10.0 * kernels.ExpSquaredKernel(1.0, bounds=(-1.0, 1.0))
# or...
kernel = kernels.ExpSquaredKernel([1.0, 1.5], ndim=2,
bounds=[(-1.0, 1.0), (0.5, 1.5)])
Implementation details & modeling interface¶
It’s worth understanding how these kernels are implemented. Most of the hard work is done at a low level (in C++) and the Python is only a thin wrapper to this functionality. This makes the code fast and consistent across interfaces but it also means that it isn’t currently possible to implement new kernel functions without recompiling the code. Almost every kernel has hyperparameters that you can set to control its behavior and these are controlled using the Modeling protocol.
k = 2.0 * kernels.Matern32Kernel(5.0)
print(k.get_parameter_names())
# ['k1:ln_constant', 'k2:ln_M_0_0']
print(k.get_vector())
# [ 0.69314718 1.60943791]
You’ll notice that, in this case, the parameter vector is the logarithm of the parameters given when building the kernel. This will be the case for any strictly positive parameters because it is always better to fit in the logarithm of these types of parameters. You probably also noticed that the parameters have names. This opens up a few interesting features. For example, if you want to change any of the parameters, you can do it as follows:
import numpy as np
k["k1:ln_constant"] = np.log(10.0)
print(k.get_vector())
# [ 2.30258509 1.60943791]
# ... or:
k[0] = np.log(2.0)
print(k.get_vector())
# [ 0.69314718 1.60943791]
Finally, if you want to update the entire vector, you can use the
set_vector()
method:
k.set_vector(k.get_vector() + np.random.randn(2))
Another feature common to the kernels is that you can “freeze” and “thaw” parameters by name. For example, let’s say that you want to keep the amplitude of your kernel fixed and fit for only the scale length:
k = 2.0 * kernels.Matern32Kernel(5.0)
k.freeze_parameter("k1:ln_constant")
print(k.get_parameter_names())
# ['k2:ln_M_0_0']
print(k.get_vector())
# [ 1.60943791]
Bringing a parameter back into the fold is as easy as
k.thaw_parameter("k1:ln_constant")
print(k.get_parameter_names())
# ['k1:ln_constant', 'k2:ln_M_0_0']
print(k.get_vector())
# [ 0.69314718 1.60943791]
Stationary kernels¶
Stationary kernels are a class of functions that depend on the input coordinates \(\mathbf{x}_i\) and \(\mathbf{x}_j\) through their squared distance under some metric \(C\):
The currently supported metrics are:
- “isotropic” — the scale length is equal in all dimensions,
- “axis-aligned” — there is a different scale length in each dimension, and
- “general” — arbitrary covariances between dimensions are allowed.
The “isotropic” and “axis-aligned” metrics are parameterized by the logarithms of their scale lengths. For example:
from george.metrics import Metric
m = Metric(2.0, ndim=2)
print(m.get_vector())
# [ 0.69314718]
gives a two-dimensional isotropic metric with
and
m = Metric([2.0, 4.0], ndim=2)
print(m.get_vector())
# [ 0.69314718 1.38629436]
specifies the following matrix
In the “general” case, the matrix is parameterized by the elements of the Cholesky decomposition \(C = L\,L^\mathrm{T}\) with logarithms along the diagonal. For example:
m = Metric([[2.0, 0.1], [0.1, 4.0]], ndim=2)
print(m.get_vector())
# [ 0.34657359 0.07071068 0.69252179]
All the stationary kernels take the metric
specification as a keyword
argument:
k = kernels.ExpSquaredKernel(metric=[[5.0, 0.1], [0.1, 4.0]], ndim=2)
print(k.get_vector())
# [ 0.80471896 0.04472136 0.69289712]
The currently available stationary kernels are:
-
class
george.kernels.
ExpSquaredKernel
(metric=None, lower=True, bounds=None, ndim=1, axes=None)¶ The exponential-squared kernel is a stationary kernel where the value at a given radius \(r^2\) is given by:
\[k(r^2) = \exp \left ( -\frac{r^2}{2} \right )\]
-
class
george.kernels.
Matern32Kernel
(metric=None, lower=True, bounds=None, ndim=1, axes=None)¶ The Matern-3/2 kernel is stationary kernel where the value at a given radius \(r^2\) is given by:
\[k(r^2) = \left( 1+\sqrt{3\,r^2} \right)\, \exp \left (-\sqrt{3\,r^2} \right )\]
-
class
george.kernels.
Matern52Kernel
(metric=None, lower=True, bounds=None, ndim=1, axes=None)¶ The Matern-5/2 kernel is stationary kernel where the value at a given radius \(r^2\) is given by:
\[k(r^2) = \left( 1+\sqrt{5\,r^2}+ \frac{5\,r^2}{3} \right)\, \exp \left (-\sqrt{5\,r^2} \right )\]
-
class
george.kernels.
ExpKernel
(metric=None, lower=True, bounds=None, ndim=1, axes=None)¶ The exponential-squared kernel is a stationary kernel where the value at a given radius \(r^2\) is given by:
\[k(r^2) = \exp \left ( -\sqrt{r^2} \right )\]
-
class
george.kernels.
RationalQuadraticKernel
(ln_alpha=None, alpha=None, metric=None, lower=True, bounds=None, ndim=1, axes=None)¶ This is equivalent to a “scale mixture” of
ExpSquaredKernel
kernels with different scale lengths drawn from a gamma distribution. See R&W for more info but here’s the equation:\[k(r^2) = \left[1 - \frac{r^2}{2\,\alpha} \right]^\alpha\]Parameters: alpha – The Gamma distribution parameter.
Non-stationary kernels¶
Non-stationary kernels are specified by a (symmetric) function of the input
coordinates themselves.
They are applied identically to every axis so the axes
keyword argument
will probably come in handy.
For example, to implement a quasi-periodic kernel with a three-dimensional input space where you only want to apply the periodicity along the first (e.g. time) dimension, you would use something like:
k = kernels.ExpSine2Kernel(gamma=0.1, period=5.0, ndim=3, axes=0)
k *= 10.0 * kernels.ExpSquaredKernel(metric=5.0, ndim=3, axes=0)
k += 4.0 * kernels.Matern32Kernel(metric=4.0, ndim=3, axes=[1, 2])
The currently available non-stationary kernels are:
-
class
george.kernels.
ConstantKernel
(ln_constant=None, constant=None, ndim=1, axes=None)¶ This kernel returns the constant
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = c\]where \(c\) is a parameter.
Parameters: constant – The constant value \(c\) in the above equation.
-
class
george.kernels.
DotProductKernel
(ndim=1, axes=None)¶ The dot product kernel
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j\]with no parameters.
-
class
george.kernels.
LinearKernel
(ln_gamma2=None, gamma2=None, order=None, ndim=1, axes=None)¶ The linear regression kernel
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \frac{(\mathbf{x}_i \cdot \mathbf{x}_j)^P}{\gamma^2}\]Parameters: - order – The power \(P\). This parameter is a constant; it is not included in the parameter vector.
- gamma2 – The scale factor \(\gamma^2\).
-
class
george.kernels.
PolynomialKernel
(ln_sigma2=None, sigma2=None, order=None, ndim=1, axes=None)¶ A polynomial kernel
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + \sigma^2)^P\]Parameters: - order – The power \(P\). This parameter is a constant; it is not included in the parameter vector.
- sigma2 – The variance \(\sigma^2 > 0\).
-
class
george.kernels.
CosineKernel
(ln_period=None, period=None, ndim=1, axes=None)¶ The simplest periodic kernel. This
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \cos\left( \frac{2\,\pi\,|x_i - x_j|}{P} \right)\]where the parameter \(P\) is the period of the oscillation. This kernel should probably always be multiplied be a stationary kernel (e.g.
ExpSquaredKernel
) to allow quasi-periodic variations.Parameters: period – The period of the oscillation.
-
class
george.kernels.
ExpSine2Kernel
(gamma=None, ln_period=None, period=None, ndim=1, axes=None)¶ The exp-sine-squared kernel is a commonly used periodic kernel. Unlike the
CosineKernel
, this kernel never has negative covariance which might be useful for your problem. Here’s the equation:\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \exp \left( -\Gamma\,\sin^2\left[ \frac{\pi}{P}\,\left|x_i-x_j\right| \right] \right)\]Parameters: - gamma – The scale \(\Gamma\) of the correlations.
- period – The period \(P\) of the oscillation (in the same units as \(\mathbf{x}\)).
-
class
george.kernels.
LocalGaussianKernel
(location=None, ln_width=None, width=None, ndim=1, axes=None)¶ A local Gaussian kernel.
\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \exp\left( -\frac{(x_i - x_0)^2 + (x_j - x_0)^2}{2\,w} \right))\]Parameters: - location – The location \(x_0\) of the Gaussian.
- width – The (squared) width \(w\) of the Gaussian.
Combining kernels¶
More complicated kernels can be constructed by algebraically combining the basic kernels listed in the previous sections. In particular, all the kernels support addition and multiplication. For example, an exponential-squared kernel with a non-trivial variance can be constructed as follows:
from george import kernels
kernel = 1e-3 * kernels.ExpSquaredKernel(3.4)
This is equivalent to:
from math import sqrt
kernel = kernels.Product(kernels.ConstantKernel(constant=sqrt(1e-3)),
kernels.ExpSquaredKernel(3.4))
As demonstrated in Hyperparameter optimization, a mixture of kernels can be implemented with addition:
k1 = 1e-3 * kernels.ExpSquaredKernel(3.4)
k2 = 1e-4 * kernels.Matern32Kernel(14.53)
kernel = k1 + k2
Implementing new kernels¶
As mentioned previously, because of technical limitations, new kernels can only be implemented by re-compiling george. See Implementing new kernels for a detailed example of implementing a new kernel.