PoissonGAM

class pygam.pygam.PoissonGAM(terms='auto', max_iter=100, tol=0.0001, callbacks=['deviance', 'diffs'], fit_intercept=True, verbose=False, **kwargs)

Bases: GAM

Poisson GAM.

This is a GAM with a Poisson error distribution, and a log link.

Parameters:

terms : expression specifying terms to model, optional

By default a univariate spline term will be allocated for each feature.

For example:

>>> GAM(s(0) + l(1) + f(2) + te(3, 4))

will fit a spline term on feature 0, a linear term on feature 1, a factor term on feature 2, and a tensor term on features 3 and 4.

callbacks : list of str or list of CallBack objects, optional

Names of callback objects to call during the optimization loop.

fit_intercept : bool, optional

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. Note: the intercept receives no smoothing penalty.

max_iter : int, optional

Maximum number of iterations allowed for the solver to converge.

tol : float, optional

Tolerance for stopping criteria.

verbose : bool, optional

Whether to show pyGAM warnings.

Attributes:

coef_ : array, shape (n_classes, m_features)

Coefficients of the features in the decision function. If fit_intercept is True, then self.coef_[-1] will contain the bias.

statistics_ : dict

Dictionary containing model statistics like GCV/UBRE scores, AIC/c, parameter covariances, estimated degrees of freedom, etc.

logs_ : dict

Dictionary containing the outputs of any callbacks at each optimization loop.

The logs are structured as {callback: [...]}

Methods

`confidence_intervals`(X[, width, quantiles])	Estimate confidence intervals for the model.
`deviance_residuals`(X, y[, weights, scaled])	Method to compute the deviance residuals of the model.
`fit`(X, y[, exposure, weights])	Fit the generalized additive model.
`generate_X_grid`(term[, n, meshgrid])	Create a nice grid of X data.
`get_params`([deep])	Returns a dict of all of the object's user-facing parameters.
`gridsearch`(X, y[, exposure, weights, ...])	Performs a grid search over a space of parameters for a given objective.
`loglikelihood`(X, y[, exposure, weights])	Compute the log-likelihood of the dataset using the current model.
`partial_dependence`(term[, X, width, ...])	Computes the term functions for the GAM and possibly their confidence intervals.
`predict`(X[, exposure])	Predict the expected value of the target given the model and input X; often this is done via the expected value of the GAM given input X.
`predict_mu`(X)	Predict expected value of target given model and input X
`sample`(X, y[, quantity, sample_at_X, ...])	Simulate from the posterior of the coefficients and smoothing params.
`score`(X, y[, weights])	Compute the explained deviance for a trained model for a given X data and y labels.
`set_params`([deep, force])	Sets an object's parameters.
`summary`()	Produce a summary of the model statistics.

References

Simon N. Wood, 2006
Generalized Additive Models: an introduction with R

Hastie, Tibshirani, Friedman
The Elements of Statistical Learning
http://www.stat.ucla.edu/~ywu/research/documents/BOOKS/ElementsLearningII.pdf

Paul Eilers, Brian Marx, and Maria Durbán, 2015
Twenty years of P-splines
https://e-archivo.uc3m.es/rest/api/core/bitstreams/4e23bd9f-c90d-4598-893e-deb0a6bf0728/content

confidence_intervals(X, width=0.95, quantiles=None)

Estimate confidence intervals for the model.

Parameters:

X : array-like of shape (n_samples, m_features): Input data matrix
width : float on [0,1], optional
quantiles : array-like of floats in (0, 1), optional: Instead of specifying the prediction width, one can specify the quantiles, so width=.95 is equivalent to quantiles=[.025, .975]

Returns:

intervals : np.array of shape (n_samples, 2 or len(quantiles))

Notes

Wood 2006, section 4.9: Confidence intervals based on section 4.8 rely on large sample results to deal with non-Gaussian distributions, and treat the smoothing parameters as fixed, when in reality they are estimated from the data.
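The equivalence between width and quantiles stated in the parameter list can be checked with a few lines of plain Python. This is a minimal sketch, not pyGAM's own code; the helper name width_to_quantiles is hypothetical and is introduced only for illustration.

```python
def width_to_quantiles(width):
    """Return the lower and upper quantiles of a central interval.

    Hypothetical helper: pyGAM performs an equivalent conversion
    internally, but this function is not part of its API.
    """
    if not 0.0 <= width <= 1.0:
        raise ValueError("width must lie in [0, 1]")
    tail = (1.0 - width) / 2.0  # probability mass left in each tail
    return [tail, 1.0 - tail]


quantiles = width_to_quantiles(0.95)
print(quantiles)  # approximately [0.025, 0.975], up to floating-point rounding
```

Passing more than two quantiles yields one output column per quantile, which is why the returned shape is (n_samples, 2 or len(quantiles)).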

deviance_residuals(X, y, weights=None, scaled=False)

Method to compute the deviance residuals of the model.

These are analogous to the residuals of an ordinary least squares (OLS) fit.

Parameters:

X : array-like: Input data array of shape (n_samples, m_features)
y : array-like: Output data vector of shape (n_samples, )
weights : array-like shape (n_samples, ) or None, optional: Sample weights. If None, defaults to an array of ones
scaled : bool, optional: Whether to scale the deviance by the (estimated) distribution scale

Returns:

deviance_residuals : np.array with shape (n_samples, )
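For intuition, the textbook Poisson deviance residual is the signed square root of each observation's unit deviance, 2 * (y * log(y / mu) - (y - mu)). The sketch below implements that standard formula in plain Python, without weights or scaling. It assumes pyGAM follows the same definition, and the function name is hypothetical.

```python
import math


def poisson_deviance_residuals(y, mu):
    """Signed square roots of the per-observation Poisson unit deviance.

    Illustrative only: unweighted and unscaled, and not pyGAM's code.
    """
    residuals = []
    for y_i, mu_i in zip(y, mu):
        # y * log(y / mu) tends to 0 as y -> 0, so use 0 when y == 0
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        unit_deviance = 2.0 * (log_term - (y_i - mu_i))
        sign = 1.0 if y_i >= mu_i else -1.0
        # guard against tiny negative values caused by rounding
        residuals.append(sign * math.sqrt(max(unit_deviance, 0.0)))
    return residuals


observed = [0, 2, 5]
fitted = [1.0, 2.0, 3.0]
residuals = poisson_deviance_residuals(observed, fitted)
print(residuals)
```

A residual is negative when the observation falls below its fitted mean, zero when they coincide, and positive otherwise, mirroring the sign convention of OLS residuals.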

fit(X, y, exposure=None, weights=None)

Fit the generalized additive model.

Parameters:

X : array-like, shape (n_samples, m_features): Training vectors, where n_samples is the number of samples and m_features is the number of features.
y : array-like, shape (n_samples, ): Target values (integers in classification, real numbers in regression). For classification, labels must correspond to classes.
exposure : array-like shape (n_samples, ) or None, default: None: Exposures. If None, defaults to an array of ones
weights : array-like shape (n_samples, ) or None, default: None: Sample weights. If None, defaults to an array of ones

Returns:

self : object: Returns the fitted GAM object

generate_X_grid(term, n=100, meshgrid=False)

Create a nice grid of X data.

The array is sorted by feature and uniformly spaced, so the marginal and joint distributions are likely wrong.

If term is >= 0, we generate n samples per feature, which results in n^deg samples, where deg is the degree of the interaction of the term.

Parameters:

term : int: Which term to process.
n : int, optional: Number of data points to create
meshgrid : bool, optional: Whether to return a meshgrid (useful for 3d plotting) or a feature matrix (useful for inference like partial predictions)

Returns:

if meshgrid is False:

np.array of shape (n^m, m_features), where m is the number of (sub)terms in the requested (tensor)term.

else:

tuple of len m, where m is the number of (sub)terms in the requested (tensor)term.

Each element in the tuple contains a np.ndarray of size (n)^m.

Raises:

ValueError: If the term requested is an intercept since it does not make sense to process the intercept term.
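The n^deg growth described above follows from taking the Cartesian product of n evenly spaced points per feature. The following is a minimal sketch in plain Python; linspace and uniform_grid are hypothetical helpers, and the bounds are made-up values rather than anything pyGAM derives from fitted knots.

```python
import itertools


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, inclusive."""
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def uniform_grid(bounds, n):
    """Cartesian product of n evenly spaced points per feature.

    bounds holds one (low, high) pair per feature in the term, so a
    term with deg features yields n ** deg rows.
    """
    axes = [linspace(low, high, n) for low, high in bounds]
    return list(itertools.product(*axes))


# A two-feature (tensor-like) term with 5 points per feature.
grid = uniform_grid([(0.0, 1.0), (10.0, 20.0)], n=5)
print(len(grid))  # 25 rows, i.e. 5 ** 2
```

Because the rows come from a regular lattice, they do not follow the empirical distribution of the data, which is the caveat the description raises about marginal and joint distributions.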

get_params(deep=False)

Returns a dict of all of the object’s user-facing parameters.

Parameters:

deep : boolean, default: False: when True, also gets non-user-facing parameters

Returns:

dict

gridsearch(X, y, exposure=None, weights=None, return_scores=False, keep_best=True, objective='auto', **param_grids)

Performs a grid search over a space of parameters for a given objective.

NOTE: the gridsearch method is lazy and will not remove useless combinations from the search space, e.g.

>>> gam.gridsearch(X, y, n_splines=np.arange(5, 10), fit_splines=[True, False])

will result in 10 loops, of which 5 are equivalent, because n_splines has no effect when fit_splines == False.

It is also not recommended to search over a grid that alternates between known scales and unknown scales, as the scores of the candidate models will not be comparable.

Parameters:

X : array

input data of shape (n_samples, m_features)

y : array

label data of shape (n_samples, )

exposure : array-like shape (n_samples, ) or None, default: None

containing exposures. If None, defaults to an array of ones

weights : array-like shape (n_samples, ) or None, default: None

containing sample weights. If None, defaults to an array of ones

return_scores : boolean, default: False

whether to return the hyperparameters and score for each element in the grid

keep_best : boolean, default: True

whether to keep the best GAM as self

objective : string, default: 'auto'

metric to optimize. Must be in ['AIC', 'AICc', 'GCV', 'UBRE', 'auto']. If 'auto', then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.

**param_grids : dict, default {'lam': np.logspace(-3, 3, 11)}

pairs of parameters and iterables of floats, or parameters and iterables of iterables of floats.

If iterable of iterables of floats, the outer iterable must have length m_features.

The method will make a grid of all the combinations of the parameters and fit a GAM to each combination.

Returns:

if return_scores == True:

model_scores : dict: Contains each fitted model as keys and corresponding objective scores as values

else:

self, i.e. possibly the newly fitted model
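To see how quickly the search space grows, the grid of parameter combinations can be enumerated with itertools.product. This is only a sketch of the combinatorics under the assumption that every combination is fitted once. The expand_grid helper is hypothetical, and the lam values are built by hand to mirror the documented default np.logspace(-3, 3, 11).

```python
import itertools


def expand_grid(**param_grids):
    """Yield one dict per combination of the supplied parameter values."""
    names = list(param_grids)
    value_lists = [param_grids[name] for name in names]
    for values in itertools.product(*value_lists):
        yield dict(zip(names, values))


# Eleven log-spaced values from 1e-3 to 1e3, like np.logspace(-3, 3, 11).
lam_grid = [10 ** (-3 + 6 * i / 10) for i in range(11)]

# Crossing lam with three candidate n_splines values gives 11 * 3 fits.
candidates = list(expand_grid(lam=lam_grid, n_splines=[5, 10, 20]))
print(len(candidates))  # 33
```

Each additional parameter multiplies the number of fits, so it can pay to prune combinations known to be equivalent before calling gridsearch.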

loglikelihood(X, y, exposure=None, weights=None)

Compute the log-likelihood of the dataset using the current model.

Parameters:

X : array-like of shape (n_samples, m_features): containing the input dataset
y : array-like of shape (n_samples, ): containing target values
exposure : array-like shape (n_samples, ) or None, default: None: containing exposures. If None, defaults to an array of ones
weights : array-like of shape (n_samples, ): containing sample weights

Returns:

log-likelihood : np.array of shape (n_samples, ): containing log-likelihood scores
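As a point of reference, the Poisson log-probability of a count y with mean mu is y * log(mu) - mu - log(y!). The sketch below evaluates that standard formula per observation in plain Python. It ignores exposure and weights, and it is not pyGAM's implementation; the function name is hypothetical.

```python
import math


def poisson_log_pmf(y, mu):
    """Per-observation Poisson log-probabilities (unweighted sketch)."""
    # lgamma(y + 1) equals log(y!) for non-negative integer y
    return [
        y_i * math.log(mu_i) - mu_i - math.lgamma(y_i + 1)
        for y_i, mu_i in zip(y, mu)
    ]


counts = [0, 2, 5]
means = [1.0, 2.0, 3.0]
log_probs = poisson_log_pmf(counts, means)
total_loglik = sum(log_probs)  # log-likelihood of the whole dataset
print(log_probs, total_loglik)
```

Summing the per-observation values gives the log-likelihood of the dataset, which is the quantity usually compared across models.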

partial_dependence(term, X=None, width=None, quantiles=None, meshgrid=False)

Computes the term functions for the GAM and possibly their confidence intervals.

If both width=None and quantiles=None, then no confidence intervals are computed.

Parameters:

term : int

Term for which to compute the partial dependence functions.

X : array-like with input data, optional

If meshgrid=False, then X should be an array-like of shape (n_samples, m_features).

If meshgrid=True, then X should be a tuple containing an array for each feature in the term.

If None, an equally spaced grid of points is generated.

width : float on (0, 1), optional

Width of the confidence interval.

quantiles : array-like of floats on (0, 1), optional

Instead of specifying the prediction width, one can specify the quantiles, so width=.95 is equivalent to quantiles=[.025, .975]. If None, defaults to width.

meshgrid : bool

Whether to return and accept meshgrids.

Useful for creating outputs that are suitable for 3D plotting.

Note that for simple terms with no interactions, the output of this function will be the same for meshgrid=True and meshgrid=False, but the inputs will need to be different.

Returns:

pdeps : np.array of shape (n_samples, )
conf_intervals : list of length len(term): containing np.arrays of shape (n_samples, 2 or len(quantiles))

Raises:

ValueError: If the term requested is an intercept since it does not make sense to process the intercept term.

See also

generate_X_grid: for help creating meshgrids.

predict(X, exposure=None)

Predict the expected value of the target given the model and input X. Often this is done via the expected value of the GAM given input X.

Parameters:

X : array-like of shape (n_samples, m_features): containing the input dataset
exposure : array-like shape (n_samples, ) or None, default: None: containing exposures. If None, defaults to an array of ones

Returns:

y : np.array of shape (n_samples, ): containing predicted values under the model
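In a Poisson model with a log link, exposure conventionally enters as a multiplicative factor on the expected count: the inverse link gives a rate per unit of exposure, and the predicted count is that rate times the exposure. The sketch below illustrates this convention in plain Python. It assumes pyGAM treats exposure the same way, and predict_counts and its linear_predictor argument are hypothetical names.

```python
import math


def predict_counts(linear_predictor, exposure=None):
    """Expected counts from a log-link linear predictor and exposures.

    Sketch of the usual rate-times-exposure convention, not pyGAM code.
    """
    if exposure is None:
        exposure = [1.0] * len(linear_predictor)  # default: unit exposure
    # exp() inverts the log link, giving the rate per unit of exposure
    return [math.exp(eta) * e for eta, e in zip(linear_predictor, exposure)]


# Two observations: rates of 1 and 2 events per unit, observed for 1 and 3 units.
counts = predict_counts([0.0, math.log(2.0)], exposure=[1.0, 3.0])
print(counts)  # approximately [1.0, 6.0]
```

With exposure left as None, the predicted counts equal the rates, which matches the documented default of an array of ones.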

predict_mu(X)

Predict expected value of target given model and input X

Parameters:

X : array-like of shape (n_samples, m_features): containing the input dataset

Returns:

y : np.array of shape (n_samples, ): containing expected values under the model

sample(X, y, quantity='y', sample_at_X=None, weights=None, n_draws=100, n_bootstraps=5, objective='auto')

Simulate from the posterior of the coefficients and smoothing params.

Samples are drawn from the posterior of the coefficients and smoothing parameters given the response in an approximate way. The GAM must already be fitted before calling this method; if the model has not been fitted, then an exception is raised. Moreover, it is recommended that the model and its hyperparameters be chosen with gridsearch (with the parameter keep_best=True) before calling sample, so that the result of that gridsearch can be used to generate useful response data and so that the model’s coefficients (and their covariance matrix) can be used as the first bootstrap sample.

These samples are drawn as follows. Details are in the reference below.

1. n_bootstraps many “bootstrap samples” of the response (y) are simulated by drawing random samples from the model’s distribution evaluated at the expected values (mu) for each sample in X.

2. A copy of the model is fitted to each of those bootstrap samples of the response. The result is an approximation of the distribution over the smoothing parameter lam given the response data y.

3. Samples of the coefficients are simulated from a multivariate normal using the bootstrap samples of the coefficients and their covariance matrices.
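Step 1 above can be sketched in plain Python: given fitted means mu, each bootstrap replicate is one Poisson draw per observation. The example uses Knuth's multiplication method as a simple stand-in sampler, which is an assumption rather than what pyGAM uses, and both function names are hypothetical. Steps 2 and 3 (refitting and drawing coefficients) are not shown.

```python
import math
import random


def draw_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method.

    Suitable for small mu only; chosen here for brevity.
    """
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def bootstrap_responses(mu, n_bootstraps, seed=0):
    """Simulate n_bootstraps response vectors from the fitted means."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[draw_poisson(m, rng) for m in mu] for _ in range(n_bootstraps)]


fitted_means = [0.5, 2.0, 7.5]
replicates = bootstrap_responses(fitted_means, n_bootstraps=5)
print(replicates)
```

Each row of replicates plays the role of a simulated response vector y to which a copy of the model would then be refitted.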

Parameters:

X : array of shape (n_samples, m_features)

empirical input data

y : array of shape (n_samples, )

empirical response vector

quantity : {'y', 'coef', 'mu'}, default: 'y'

What quantity to return pseudorandom samples of. If sample_at_X is not None and quantity is either 'y' or 'mu', then samples are drawn at the values of X specified in sample_at_X.

sample_at_X : array of shape (n_samples_to_simulate, m_features) or None, optional

Input data at which to draw new samples.

Only applies for quantity equal to 'y' or to 'mu'. If None, then sample_at_X is replaced by X.

weights : np.array of shape (n_samples, )

sample weights

n_draws : positive int, optional (default=100)

The number of samples to draw from the posterior distribution of the coefficients and smoothing parameters

n_bootstraps : positive int, optional (default=5)

The number of bootstrap samples to draw from simulations of the response (from the already fitted model) to estimate the distribution of the smoothing parameters given the response data. If n_bootstraps is 1, then only the already fitted model's smoothing parameter is used, and the distribution over the smoothing parameters is not estimated using bootstrap sampling.

objective : string, optional (default='auto')

metric to optimize in grid search. Must be in ['AIC', 'AICc', 'GCV', 'UBRE', 'auto']. If 'auto', then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.

Returns:

draws : 2D array of length n_draws

Simulations of the given quantity using samples from the posterior distribution of the coefficients and smoothing parameter given the response data. Each row is a pseudorandom sample.

If quantity == ‘coef’, then the number of columns of draws is the number of coefficients (len(self.coef_)).

Otherwise, the number of columns of draws is the number of rows of sample_at_X if sample_at_X is not None or else the number of rows of X.

Notes

A gridsearch is done n_bootstraps many times, so keep n_bootstraps small. Make n_bootstraps < n_draws to take advantage of the expensive bootstrap samples of the smoothing parameters.

References

Simon N. Wood, 2006
Generalized Additive Models: an introduction with R
Section 4.9.3 (pages 198–199) and Section 5.4.2 (pages 256–257).

score(X, y, weights=None)

Compute the explained deviance for a trained model for a given X data and y labels.

Parameters:

X : array-like: Input data array of shape (n_samples, m_features)
y : array-like: Output data vector of shape (n_samples, )
weights : array-like shape (n_samples, ) or None, optional: Sample weights. If None, defaults to an array of ones

Returns:

explained deviance score: np.array() (n_samples, )
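Explained deviance is commonly defined as 1 - D_model / D_null, where D_null is the deviance of an intercept-only model that predicts the mean of y everywhere. The sketch below computes this for Poisson data in plain Python. It assumes pyGAM uses the same definition, omits weights, and uses hypothetical function names.

```python
import math


def poisson_deviance(y, mu):
    """Total Poisson deviance of observations y against means mu."""
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        # y * log(y / mu) tends to 0 as y -> 0, so use 0 when y == 0
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += 2.0 * (log_term - (y_i - mu_i))
    return total


def explained_deviance(y, mu):
    """Fraction of the null deviance removed by the fitted means mu."""
    null_mu = [sum(y) / len(y)] * len(y)  # intercept-only predictions
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)


observed = [1, 2, 3, 6]
perfect = explained_deviance(observed, [1.0, 2.0, 3.0, 6.0])
baseline = explained_deviance(observed, [3.0, 3.0, 3.0, 3.0])
print(perfect, baseline)  # 1.0 for a perfect fit, 0.0 for the null model
```

Scores between 0 and 1 indicate how much of the null deviance the model accounts for, in the same spirit as R-squared for least squares.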

set_params(deep=False, force=False, **parameters)

Sets an object’s parameters.

Parameters:

deep : boolean, default: False: when True, also sets non-user-facing parameters
force : boolean, default: False: when True, also sets parameters that the object does not already have
**parameters : parameters to set

Returns:

self

summary()

Produce a summary of the model statistics.

Returns:

None
