GAM

class pygam.pygam.GAM(terms='auto', max_iter=100, tol=0.0001, distribution='normal', link='identity', callbacks=['deviance', 'diffs'], fit_intercept=True, verbose=False, **kwargs)

Bases: pygam.core.Core, pygam.terms.MetaTermMixin

Generalized Additive Model

Parameters:

terms (expression specifying terms to model, optional) –
By default a univariate spline term will be allocated for each feature.

For example:
```
>>> from pygam import GAM, s, l, f, te
>>> GAM(s(0) + l(1) + f(2) + te(3, 4))
```
will fit a spline term on feature 0, a linear term on feature 1, a factor term on feature 2, and a tensor term on features 3 and 4.
callbacks (list of str or list of CallBack objects, optional) – Names of callback objects to call during the optimization loop.
distribution (str or Distribution object, optional) – Distribution to use in the model.
link (str or Link object, optional) – Link function to use in the model.
fit_intercept (bool, optional) – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. Note: the intercept receives no smoothing penalty.
max_iter (int, optional) – Maximum number of iterations allowed for the solver to converge.
tol (float, optional) – Tolerance for stopping criteria.
verbose (bool, optional) – Whether to show pyGAM warnings.

coef_

Coefficient of the features in the decision function. If fit_intercept is True, then self.coef_[0] will contain the bias.

Type:	array, shape (n_coeffs,)

statistics_

Dictionary containing model statistics like GCV/UBRE scores, AIC/c, parameter covariances, estimated degrees of freedom, etc.

Type:	dict

logs_

Dictionary containing the outputs of any callbacks at each optimization loop.

The logs are structured as {callback: [...]}

Type:	dict

References

Simon N. Wood, 2006. Generalized Additive Models: an introduction with R.

Hastie, Tibshirani, Friedman. The Elements of Statistical Learning. http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

Paul Eilers & Brian Marx, 2015. International Biometric Society: A Crash Course on P-splines. http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf

confidence_intervals(X, width=0.95, quantiles=None)

Estimate confidence intervals for the model.

Parameters:

X (array-like of shape (n_samples, m_features)) – Input data matrix.
width (float on [0,1], optional) – Width of the confidence interval.
quantiles (array-like of floats in (0, 1), optional) – Instead of specifying the prediction width, one can specify the quantiles. So `width=.95` is equivalent to `quantiles=[.025, .975]`.
Returns:	intervals
Return type:	np.array of shape (n_samples, 2 or len(quantiles))

Notes

Wood 2006, section 4.9: Confidence intervals based on section 4.8 rely on large sample results to deal with non-Gaussian distributions, and treat the smoothing parameters as fixed, when in reality they are estimated from the data.
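The `width`/`quantiles` equivalence and the large-sample (Gaussian) approximation mentioned in the note can be illustrated with a short sketch. This is not pyGAM's implementation: the helpers `width_to_quantiles` and `normal_intervals` are hypothetical, and the standard errors `se` of the predictions are assumed to be already known.

```python
import numpy as np
from statistics import NormalDist


def width_to_quantiles(width=0.95):
    """Map a central interval width to its lower/upper quantiles (hypothetical helper)."""
    alpha = (1.0 - width) / 2.0
    return [alpha, 1.0 - alpha]


def normal_intervals(mu, se, width=None, quantiles=None):
    """Gaussian-approximation intervals mu + z_q * se for each requested quantile.

    Returns an array of shape (n_samples, len(quantiles)), mirroring the
    (n_samples, 2 or len(quantiles)) shape described above.
    """
    if quantiles is None:
        quantiles = width_to_quantiles(0.95 if width is None else width)
    mu = np.asarray(mu, dtype=float)
    se = np.asarray(se, dtype=float)
    # One standard-normal quantile per requested level
    z = np.array([NormalDist().inv_cdf(q) for q in quantiles])
    return mu[:, None] + se[:, None] * z[None, :]


mu = np.array([0.0, 1.0, 2.0])
se = np.array([1.0, 0.5, 0.1])
print(np.round(width_to_quantiles(0.95), 3))        # [0.025 0.975]
print(normal_intervals(mu, se, width=0.95).shape)   # (3, 2)
```

Because the smoothing parameters are treated as fixed here, intervals like these inherit the limitation described in the note above.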

deviance_residuals(X, y, weights=None, scaled=False)

Compute the deviance residuals of the model.

These are analogous to the residuals of an OLS regression.

Parameters:

X (array-like) – Input data array of shape (n_samples, m_features)
y (array-like) – Output data vector of shape (n_samples,)
weights (array-like shape (n_samples,) or None, optional) – Sample weights. If None, defaults to an array of ones.
scaled (bool, optional) – Whether to scale the deviance by the (estimated) distribution scale.
Returns:	deviance_residuals – with shape (n_samples,)
Return type:	np.array
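Deviance residuals are commonly defined as r_i = sign(y_i - mu_i) * sqrt(w_i * d_i), where d_i is the unit deviance of sample i, so that squaring and summing them gives the model deviance. For the normal distribution d_i = (y_i - mu_i)^2 and the residuals reduce to ordinary residuals, which is the OLS analogy above. The sketch below covers the normal and Poisson cases; the helper names are hypothetical and dividing by `scale` when `scaled=True` is an assumption about what scaling means here.

```python
import numpy as np


def unit_deviance(y, mu, distribution="normal"):
    """Per-sample deviance contributions d_i for two common distributions."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if distribution == "normal":
        return (y - mu) ** 2
    if distribution == "poisson":
        # y * log(y / mu) is taken as 0 when y == 0
        safe_y = np.where(y > 0, y, 1.0)
        ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
        return 2.0 * (ylogy - (y - mu))
    raise ValueError(f"unsupported distribution: {distribution}")


def deviance_residuals(y, mu, weights=None, scale=1.0, scaled=False,
                       distribution="normal"):
    """sign(y - mu) * sqrt(w_i * d_i), optionally with the deviance divided by scale."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Weights default to an array of ones, as in the method above
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    dev = w * unit_deviance(y, mu, distribution)
    if scaled:
        dev = dev / scale
    return np.sign(y - mu) * np.sqrt(dev)


print(deviance_residuals([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # [-0.5  0.   1. ]
```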

fit(X, y, weights=None)

Fit the generalized additive model.

Parameters:

X (array-like, shape (n_samples, m_features)) – Training vectors.
y (array-like, shape (n_samples,)) – Target values, i.e. integers in classification, real numbers in regression.
weights (array-like shape (n_samples,) or None, optional) – Sample weights. If None, defaults to an array of ones.
Returns:	self – Returns fitted GAM object
Return type:	object
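For a normal distribution with identity link, fitting a penalized smoother reduces to one penalized least-squares solve, (B^T W B + lam * D^T D) beta = B^T W y. The P-spline approach of Eilers & Marx (cited above) uses a B-spline basis B with a difference penalty D on adjacent coefficients. The sketch below is only an illustration of that idea, not pyGAM's solver, which iterates (see max_iter and tol) to handle general distributions and links. It assumes an identity basis instead of B-splines, a single smoothing parameter `lam`, and a hypothetical helper `penalized_least_squares`.

```python
import numpy as np


def penalized_least_squares(B, y, lam=1.0, weights=None):
    """Solve (B^T W B + lam * D^T D) beta = B^T W y with a second-order difference penalty.

    B: basis (model) matrix of shape (n_samples, n_coefs)
    y: response of shape (n_samples,)
    weights: sample weights; None means an array of ones, as in fit().
    """
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = B.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Second-order difference matrix D of shape (k - 2, k): penalizes roughness in beta
    D = np.diff(np.eye(k), n=2, axis=0)
    BtW = B.T * w  # same as B.T @ np.diag(w), without building the diagonal matrix
    return np.linalg.solve(BtW @ B + lam * (D.T @ D), BtW @ y)


# Identity basis: each sample gets its own coefficient (a Whittaker-style smoother)
x = np.linspace(0.0, 1.0, 20)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
B = np.eye(x.size)
smooth = B @ penalized_least_squares(B, y, lam=10.0)
```

With lam=0 the fit interpolates the data; as lam grows, the fit is pulled toward a straight line, since a second-difference penalty does not penalize linear trends.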

generate_X_grid(term, n=100, meshgrid=False)

Create a grid of X data.

The array is sorted by feature and uniformly spaced, so its marginal and joint distributions are likely to differ from those of the training data.

If term is >= 0, n samples are generated per feature, which results in n^deg samples, where deg is the degree of the interaction of the term.

Parameters:

term (int) – Which term to process.
n (int, optional) – Number of data points to create.
meshgrid (bool, optional) – Whether to return a meshgrid (useful for 3d plotting) or a feature matrix (useful for inference like partial predictions)

Returns:

if meshgrid is False – np.array of shape (n^m, m_features), where m is the number of (sub)terms in the requested (tensor)term.
else – tuple of len m, where m is the number of (sub)terms in the requested (tensor)term.

each element in the tuple contains a np.ndarray of size (n)^m

Raises:

ValueError : – If the term requested is an intercept since it does not make sense to process the intercept term.
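The grid construction described above can be sketched with NumPy: n evenly spaced values per feature of the term, combined into n^m points for a term over m features. This is a hypothetical helper, not pyGAM's code; here the feature indices are passed explicitly as `features`, and filling the remaining columns with zeros is an assumption made for illustration.

```python
import numpy as np


def make_grid(X, features, n=100, meshgrid=False):
    """Uniform grid over the given feature columns (hypothetical helper).

    Columns not in `features` are left at 0 in the feature-matrix output.
    """
    X = np.asarray(X, dtype=float)
    # n evenly spaced values between the observed min and max of each feature
    axes = [np.linspace(X[:, j].min(), X[:, j].max(), n) for j in features]
    mesh = np.meshgrid(*axes, indexing="ij")
    if meshgrid:
        # tuple of len(features) arrays, each holding n ** len(features) values
        return tuple(mesh)
    grid = np.zeros((n ** len(features), X.shape[1]))
    for j, m in zip(features, mesh):
        grid[:, j] = m.ravel()
    return grid


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(make_grid(X, [0, 2], n=4).shape)  # (16, 3): n ** 2 rows for a 2-feature tensor term
```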

gridsearch(X, y, weights=None, return_scores=False, keep_best=True, objective='auto', progress=True, **param_grids)

Performs a grid search over a space of parameters for a given objective.

Warning

gridsearch is lazy and will not remove useless combinations from the search space, e.g.

>>> gam.gridsearch(X, y, n_splines=np.arange(5, 10), fit_splines=[True, False])

will result in 10 loops, of which 5 are equivalent because n_splines has no effect when fit_splines=False.

Also, it is not recommended to search over a grid that alternates between known scales and unknown scales, as the scores of the candidate models will not be comparable.

Parameters:

X (array-like) – input data of shape (n_samples, m_features)
y (array-like) – label data of shape (n_samples,)
weights (array-like shape (n_samples,), optional) – sample weights
return_scores (boolean, optional) – whether to return the hyperparameters and score for each element in the grid
keep_best (boolean, optional) – whether to keep the best GAM as self.
objective ({'auto', 'AIC', 'AICc', 'GCV', 'UBRE'}, optional) – Metric to optimize. If auto, then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.
progress (bool, optional) – whether to display a progress bar
**param_grids –
pairs of parameters and iterables of floats, or parameters and iterables of iterables of floats.

If no parameters are specified, lam=np.logspace(-3, 3, 11) is used. This results in 11 points, placed diagonally across lam space.

If grid is an iterable of iterables of floats, the outer iterable must have length m_features. The Cartesian product of the subgrids in the grid will be tested.

If grid is a 2d numpy array, each row of the array will be tested.

The method will make a grid of all the combinations of the parameters and fit a GAM to each combination.

Returns:

if return_scores=True – model_scores: dict containing each fitted model as keys and corresponding objective scores as values
else – self: i.e. possibly the newly fitted model
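For reference, the two criteria chosen by objective='auto' can be written in terms of the model deviance D, the number of samples n, the effective degrees of freedom edof and, for UBRE, the known scale. The functions below are a plain-Python transcription of the textbook forms in Wood (2006), GCV = n * D / (n - edof)^2 and UBRE = D / n - scale + 2 * scale * edof / n. The function names are hypothetical, and pyGAM's internal computation may differ in detail.

```python
def gcv_score(deviance, n, edof):
    """Generalized cross-validation score: n * D / (n - edof) ** 2."""
    return n * deviance / (n - edof) ** 2


def ubre_score(deviance, n, edof, scale):
    """Un-biased risk estimator: D / n - scale + 2 * scale * edof / n."""
    return deviance / n - scale + 2.0 * scale * edof / n


def auto_objective(known_scale):
    """The 'auto' rule described above: UBRE when the scale is known, else GCV."""
    return "UBRE" if known_scale else "GCV"


print(auto_objective(known_scale=False))  # GCV
```

Both scores decrease as the fit improves and increase with edof, which is what lets the grid search trade goodness of fit against smoothness.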

Examples

For a model with 4 terms, and where we expect 4 lam values, our search space for lam must have 4 dimensions.

We can search the space in 3 ways:

1. Via a Cartesian product, by specifying the grid as a list. Our grid search will consider 11 ** 4 points:

>>> lam = np.logspace(-3, 3, 11)
>>> lams = [lam] * 4
>>> gam.gridsearch(X, y, lam=lams)

2. Directly, by specifying the grid as a np.ndarray. This is useful when the dimensionality of the search space is very large, and we would prefer to execute a randomized search:

>>> lams = np.exp(np.random.random((50, 4)) * 6 - 3)
>>> gam.gridsearch(X, y, lam=lams)

3. By copying grids for parameters with multiple dimensions. If we specify a 1D np.ndarray for lam, we are implicitly testing the space where all points have the same value:

>>> gam.gridsearch(X, y, lam=np.logspace(-3, 3, 11))

is equivalent to:

>>> lam = np.logspace(-3, 3, 11)
>>> lams = np.array([lam] * 4).T  # shape (11, 4): one row per candidate point
>>> gam.gridsearch(X, y, lam=lams)
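The three ways of specifying lam can be mimicked with a small helper that turns each specification into an array of candidate points, one row per model to fit. This is a hypothetical sketch of the expansion rules described above (a list gives a Cartesian product, a 2D array is used row by row, a 1D array is copied across all terms), not pyGAM's internal code.

```python
import itertools

import numpy as np


def expand_lam_grid(lam, n_terms):
    """Turn a lam specification into an array of candidate points (hypothetical helper).

    - list of 1D grids (one per term): Cartesian product of the subgrids
    - 2D array: each row is one candidate point
    - 1D array: each value is copied across all terms
    """
    if isinstance(lam, list):
        if len(lam) != n_terms:
            raise ValueError("need one subgrid per term")
        return np.array(list(itertools.product(*lam)))
    lam = np.asarray(lam, dtype=float)
    if lam.ndim == 2:
        return lam
    return np.repeat(lam[:, None], n_terms, axis=1)


lam = np.logspace(-3, 3, 11)
print(expand_lam_grid([lam] * 4, 4).shape)  # (14641, 4), i.e. 11 ** 4 points
print(expand_lam_grid(lam, 4).shape)        # (11, 4)
```

The row count is the number of models that would be fitted, which is why the Cartesian form grows so quickly and a randomized 2D array is preferable for many terms.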

loglikelihood(X, y, weights=None)

Compute the log-likelihood of the dataset using the current model.

Parameters:

X (array-like of shape (n_samples, m_features)) – Input dataset.
y (array-like of shape (n_samples,)) – Target values.
weights (array-like of shape (n_samples,), optional) – Sample weights.
Returns:	log-likelihood – containing log-likelihood scores
Return type:	np.array of shape (n,)
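To illustrate what the log-likelihood measures, the sketch below evaluates per-sample normal log-densities of y around the model's expected values mu; summing them gives the dataset total. The helper `normal_loglikelihood` is hypothetical, and treating each weight as a precision multiplier (variance = scale / w_i) is an assumption made for illustration only.

```python
import numpy as np


def normal_loglikelihood(y, mu, scale=1.0, weights=None):
    """Per-sample Gaussian log-density with variance scale / w_i (assumed weighting)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    var = scale / w
    return -0.5 * np.log(2.0 * np.pi * var) - (y - mu) ** 2 / (2.0 * var)


ll = normal_loglikelihood([1.0, 2.0], [1.0, 2.5])
total = ll.sum()  # the dataset log-likelihood is the sum of the per-sample terms
```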

partial_dependence(term, X=None, width=None, quantiles=None, meshgrid=False)

Computes the term functions for the GAM and possibly their confidence intervals.

If both width=None and quantiles=None, then no confidence intervals are computed.

Parameters:

term (int) – Term for which to compute the partial dependence functions.
X (array-like with input data, optional) –
If meshgrid=False, then X should be an array-like of shape (n_samples, m_features).

If meshgrid=True, then X should be a tuple containing an array for each feature in the term.

If None, an equally spaced grid of points is generated.
width (float on (0, 1), optional) – Width of the confidence interval.
quantiles (array-like of floats on (0, 1), optional) – Instead of specifying the prediction width, one can specify the quantiles. So width=.95 is equivalent to quantiles=[.025, .975]. If None, defaults to width.
meshgrid (bool, optional) – Whether to return and accept meshgrids.
Useful for creating outputs that are suitable for 3D plotting.

Note, for simple terms with no interactions, the output of this function will be the same for meshgrid=True and meshgrid=False, but the inputs will need to be different.

Returns:

pdeps (np.array of shape (n_samples,))
conf_intervals (list of length len(term)) – containing np.arrays of shape (n_samples, 2 or len(quantiles))

Raises:

ValueError : – If the term requested is an intercept since it does not make sense to process the intercept term.

See also

generate_X_grid(): for help creating meshgrids.
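In an additive model the linear predictor is a sum of per-term functions, so the partial dependence of a term is the part of the linear predictor contributed by that term's columns of the model matrix. The sketch below illustrates this decomposition with a random model matrix and a hypothetical column layout (intercept placed in the last column); it does not reproduce how pyGAM builds its basis or orders its terms.

```python
import numpy as np

# Hypothetical model matrix split into per-term column blocks:
# term 0 -> columns 0..2, term 1 -> columns 3..4, intercept -> column 5
rng = np.random.default_rng(0)
B = np.hstack([rng.normal(size=(6, 5)), np.ones((6, 1))])
coef = rng.normal(size=6)
term_columns = [slice(0, 3), slice(3, 5), slice(5, 6)]


def term_contribution(B, coef, cols):
    """Contribution of one term to the linear predictor: B[:, cols] @ coef[cols]."""
    return B[:, cols] @ coef[cols]


pdeps = [term_contribution(B, coef, c) for c in term_columns]
eta = B @ coef
# The linear predictor is the sum of the term contributions (intercept included)
print(np.allclose(sum(pdeps), eta))  # True
```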

predict(X)

Predict the expected value of the target given the model and input X. Often this is done via the expected value of the GAM given input X.

Parameters:	X (array-like of shape (n_samples, m_features)) – containing the input dataset
Returns:	y – containing predicted values under the model
Return type:	np.array of shape (n_samples,)

predict_mu(X)

Predict the expected value of the target given the model and input X.

Parameters:	X (array-like of shape (n_samples, m_features)) – containing the input dataset
Returns:	y – containing expected values under the model
Return type:	np.array of shape (n_samples,)
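The expected value is obtained by applying the inverse of the link function g to the linear predictor, mu = g^{-1}(eta). The sketch below does this for three common links; the model matrix `B`, the coefficients and the helper `predict_mu_sketch` are hypothetical and only show the role of the link.

```python
import numpy as np

# Inverse link functions g^{-1}: map the linear predictor eta to the mean mu
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": np.exp,
    "logit": lambda eta: 1.0 / (1.0 + np.exp(-eta)),
}


def predict_mu_sketch(B, coef, link="identity"):
    """Expected value mu = g^{-1}(B @ coef) for a given model matrix B (hypothetical helper)."""
    eta = np.asarray(B, dtype=float) @ np.asarray(coef, dtype=float)
    return INVERSE_LINKS[link](eta)


B = np.array([[1.0, 0.0], [1.0, 1.0]])
coef = np.array([0.0, np.log(2.0)])
mu = predict_mu_sketch(B, coef, link="log")  # exp(eta): approximately [1.0, 2.0]
```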

sample(X, y, quantity='y', sample_at_X=None, weights=None, n_draws=100, n_bootstraps=5, objective='auto')

Simulate from the posterior of the coefficients and smoothing params.

Samples are drawn from the posterior of the coefficients and smoothing parameters given the response in an approximate way. The GAM must already be fitted before calling this method; if the model has not been fitted, then an exception is raised. Moreover, it is recommended that the model and its hyperparameters be chosen with gridsearch (with the parameter keep_best=True) before calling sample, so that the result of that gridsearch can be used to generate useful response data and so that the model’s coefficients (and their covariance matrix) can be used as the first bootstrap sample.

These samples are drawn as follows. Details are in the reference below.

1. n_bootstraps many “bootstrap samples” of the response (y) are simulated by drawing random samples from the model’s distribution evaluated at the expected values (mu) for each sample in X.

2. A copy of the model is fitted to each of those bootstrap samples of the response. The result is an approximation of the distribution over the smoothing parameter lam given the response data y.

3. Samples of the coefficients are simulated from a multivariate normal using the bootstrap samples of the coefficients and their covariance matrices.

Notes

A gridsearch is done n_bootstraps many times, so keep n_bootstraps small. Make n_bootstraps < n_draws to take advantage of the expensive bootstrap samples of the smoothing parameters.
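Step 3 above can be sketched as follows, assuming the bootstrap fits from steps 1 and 2 have already produced a coefficient vector and covariance matrix each. Picking a bootstrap fit uniformly at random for every draw is an assumption of this sketch, and `draw_coefficients` is a hypothetical helper, not pyGAM's code. It also shows why n_bootstraps < n_draws is cheap: each expensive bootstrap fit is reused for many inexpensive normal draws.

```python
import numpy as np


def draw_coefficients(boot_coefs, boot_covs, n_draws=100, seed=None):
    """Mix multivariate-normal draws across bootstrap fits (hypothetical helper).

    boot_coefs: list of coefficient vectors, one per bootstrap fit
    boot_covs:  list of matching covariance matrices
    Each draw picks a bootstrap fit uniformly at random, then samples
    coefficients from N(coef_b, cov_b).
    """
    rng = np.random.default_rng(seed)
    n_boot = len(boot_coefs)
    # Which bootstrap fit each draw comes from
    picks = rng.integers(n_boot, size=n_draws)
    draws = np.empty((n_draws, len(boot_coefs[0])))
    for b in range(n_boot):
        idx = np.flatnonzero(picks == b)
        if idx.size:
            draws[idx] = rng.multivariate_normal(boot_coefs[b], boot_covs[b], size=idx.size)
    return draws


coefs = [np.zeros(3), np.ones(3)]
covs = [np.eye(3) * 1e-4, np.eye(3) * 1e-4]
draws = draw_coefficients(coefs, covs, n_draws=200, seed=0)  # shape (200, 3)
```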

Parameters:

X (array of shape (n_samples, m_features)) – empirical input data
y (array of shape (n_samples,)) – empirical response vector
quantity ({'y', 'coef', 'mu'}, default: 'y') – What quantity to return pseudorandom samples of. If sample_at_X is not None and quantity is either ‘y’ or ‘mu’, then samples are drawn at the values of X specified in sample_at_X.
sample_at_X (array of shape (n_samples_to_simulate, m_features) or None, optional) –
Input data at which to draw new samples.

Only applies for quantity equal to ‘y’ or to ‘mu’. If None, then sample_at_X is replaced by X.
weights (np.array of shape (n_samples,)) – sample weights
n_draws (positive int, optional (default=100)) – The number of samples to draw from the posterior distribution of the coefficients and smoothing parameters
n_bootstraps (positive int, optional (default=5)) – The number of bootstrap samples to draw from simulations of the response (from the already fitted model) to estimate the distribution of the smoothing parameters given the response data. If n_bootstraps is 1, then only the already fitted model’s smoothing parameter is used, and the distribution over the smoothing parameters is not estimated using bootstrap sampling.
objective (string, optional (default='auto')) – Metric to optimize in grid search. Must be in [‘AIC’, ‘AICc’, ‘GCV’, ‘UBRE’, ‘auto’]. If ‘auto’, then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.

Returns:

draws – Simulations of the given quantity using samples from the posterior distribution of the coefficients and smoothing parameter given the response data. Each row is a pseudorandom sample.

If quantity == ‘coef’, then the number of columns of draws is the number of coefficients (len(self.coef_)).

Otherwise, the number of columns of draws is the number of rows of sample_at_X if sample_at_X is not None or else the number of rows of X.

Return type:

2D array of length n_draws

References

Simon N. Wood, 2006. Generalized Additive Models: an introduction with R. Section 4.9.3 (pages 198–199) and Section 5.4.2 (pages 256–257).

summary()

Produce a summary of the model statistics.

Parameters:	None
Return type:	None