vineknockoffs.VineKnockoffs

class vineknockoffs.VineKnockoffs(dvine=None, marginals=None, dvine_structure=None)

Vine copula knockoffs.

Parameters

dvine: None or DVineCopula object

The DVineCopula object specifying the vine copula model used to generate knockoffs. If set to None, the vine copula knockoff model can be learned from data with fit_vine_copula_knockoffs(), fit_gaussian_copula_knockoffs() or fit_gaussian_knockoffs(). Default is None.

marginals: None or list

The marginal distributions for the vine copula knockoff model. Must be a list of n_vars distributions, each implementing the methods cdf() and ppf(). If set to None, the marginal distributions can be estimated from data with fit_marginals(), which is also called by fit_vine_copula_knockoffs(), fit_gaussian_copula_knockoffs() and fit_gaussian_knockoffs(). Default is None.
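Any object exposing cdf() and ppf() satisfies the interface described above. The following is a minimal sketch using only the Python standard library. The class name NormalMarginal is hypothetical and not part of vineknockoffs, and the sketch handles scalar input only; whether the package additionally expects array-valued input is not stated here and should be checked before use.

```python
from statistics import NormalDist


class NormalMarginal:
    """Hypothetical marginal exposing the cdf()/ppf() interface (scalar input only)."""

    def __init__(self, mu=0.0, sigma=1.0):
        self._dist = NormalDist(mu, sigma)

    def cdf(self, x):
        # Probability integral transform: maps an observation into (0, 1).
        return self._dist.cdf(x)

    def ppf(self, u):
        # Quantile function, the inverse of cdf(); u must lie strictly in (0, 1).
        return self._dist.inv_cdf(u)


# One marginal per variable, so the list has length n_vars.
n_vars = 3
marginals = [NormalMarginal(mu=0.0, sigma=1.0 + j) for j in range(n_vars)]

u = marginals[1].cdf(0.5)
x_back = marginals[1].ppf(u)  # round trip recovers 0.5 up to floating-point error
```

Frozen scipy.stats distributions also provide cdf() and ppf() and accept arrays, so they are a natural alternative when SciPy is available.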

dvine_structure: None or numpy.ndarray

The D-vine structure (order of variables) for the vine copula knockoff model. Default is None.

Examples

# ToDo: add an example here

Methods

fit_gaussian_copula_knockoffs(x_train[, ...])

Estimate a Gaussian copula knockoff model.

fit_gaussian_knockoffs(x_train[, algo])

Estimate a Gaussian knockoff model.

fit_marginals(x_train[, model])

fit_sgd(x_train[, lr, gamma, n_batches, ...])

Optimize the parameters of a vine copula knockoff model using an SGD algorithm.

fit_vine_copula_knockoffs(x_train[, ...])

Estimate a vine copula knockoff model.

generate(x_test[, knockoff_eps])

generate_par_jacobian(x_test[, ...])

ll(x, x_knockoffs)

Attributes

dvine_structure

inv_dvine_structure

n_pars_upper_trees

VineKnockoffs.fit_gaussian_copula_knockoffs(x_train, marginals='kde1d', algo='sdp', vine_structure='1:n')

Estimate a Gaussian copula knockoff model.

Parameters

x_train: numpy.ndarray

Array of covariates.

marginals: str

A str ('kde1d' or 'kde_statsmodels') specifying the estimator for the marginal distributions. 'kde1d': The univariate kernel density estimator implemented in the R package kde1d (via rpy2). 'kde_statsmodels': The univariate version of the kernel density estimator KDEMultivariate implemented in the Python package statsmodels. Default is 'kde1d'.

algo: str

A str ('sdp' or 'ecorr') specifying the algorithm (semidefinite program or equicorrelation) used to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.

vine_structure: str

A str ('select_tsp_r', 'select_tsp_py' or '1:n') specifying how the structure of the D-vine is selected. 'select_tsp_r': Maximize the dependence in the first tree by solving a TSP with the R package TSP. 'select_tsp_py': Maximize the dependence in the first tree by solving a TSP with the Python package python_tsp. '1:n': Use the natural order of the variables X_1, X_2, …, X_{d-1}, X_d. Default is '1:n'.

VineKnockoffs.fit_gaussian_knockoffs(x_train, algo='sdp')

Estimate a Gaussian knockoff model.

Parameters

x_train: numpy.ndarray

Array of covariates.

algo: str

A str ('sdp' or 'ecorr') specifying the algorithm (semidefinite program or equicorrelation) used to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.

VineKnockoffs.fit_marginals(x_train, model='kde1d')

VineKnockoffs.fit_sgd(x_train, lr=0.01, gamma=0.9, n_batches=5, n_iter=20, which_par='all', loss_alpha=1.0, loss_delta_sdp_corr=1.0, loss_gamma=1.0, loss_delta_corr=0.0)

Optimize the parameters of a vine copula knockoff model using an SGD algorithm.

Parameters

x_train: numpy.ndarray

Array of covariates.

lr: float

The learning rate for the SGD algorithm. Default is 0.01.

gamma: float

SGD momentum parameter. Default is 0.9.

n_batches: int

Number of batches in the SGD algorithm. Default is 5.

n_iter: int

Maximum number of iterations of the SGD algorithm. Default is 20.

which_par: str

A str ('all' or 'upper only') specifying whether all parameters or only the parameters of the upper trees should be optimized with the SGD algorithm. Default is 'all'.

loss_alpha: float

Parameter alpha in the SGD loss function. Default is 1.0.

loss_delta_sdp_corr: float

Parameter delta_sdp_corr in the SGD loss function. Default is 1.0.

loss_gamma: float

Parameter gamma in the SGD loss function. Default is 1.0.

loss_delta_corr: float

Parameter delta_corr in the SGD loss function. Default is 0.0.
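To illustrate how lr, gamma, n_batches and n_iter typically interact, here is a minimal sketch of SGD with classical momentum on a toy squared-error loss. It is not the package's implementation: the update rule (velocity = gamma * velocity + lr * gradient), the reading of n_iter as the number of passes over all batches, and the function name sgd_momentum are assumptions, and the knockoff loss with its loss_* weights is not reproduced.

```python
import random


def sgd_momentum(data, theta0, lr=0.01, gamma=0.9, n_batches=5, n_iter=20, seed=0):
    """Minimize the mean of (theta - x)^2 over data with momentum SGD (toy loss).

    Assumes n_batches <= len(data), so that no mini-batch is empty.
    """
    rng = random.Random(seed)
    theta, velocity = theta0, 0.0
    for _ in range(n_iter):
        shuffled = data[:]
        rng.shuffle(shuffled)
        # Split the shuffled sample into n_batches mini-batches.
        batches = [shuffled[i::n_batches] for i in range(n_batches)]
        for batch in batches:
            # Gradient of the batch-mean squared error with respect to theta.
            grad = sum(2.0 * (theta - x) for x in batch) / len(batch)
            # Momentum: keep a fraction gamma of the previous step direction.
            velocity = gamma * velocity + lr * grad
            theta -= velocity
    return theta


data = [float(i) for i in range(1, 11)]  # sample mean is 5.5
theta_hat = sgd_momentum(data, theta0=0.0)  # noisy estimate of the sample mean
theta_full = sgd_momentum(data, theta0=0.0, n_batches=1, n_iter=500)  # full batch
```

With a single batch the iteration is deterministic gradient descent with momentum and settles at the sample mean; with several batches the estimate fluctuates around it, which is why n_iter and lr are usually tuned together.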

VineKnockoffs.fit_vine_copula_knockoffs(x_train, marginals='kde1d', families='all', rotations=True, indep_test=True, vine_structure='select_tsp_r', upper_tree_cop_fam_heuristic='Gaussian', sgd=False, sgd_lr=0.01, sgd_gamma=0.9, sgd_n_batches=5, sgd_n_iter=20, sgd_which_par='all', loss_alpha=1.0, loss_delta_sdp_corr=1.0, loss_gamma=1.0, loss_delta_corr=0.0, gau_cop_algo='sdp')

Estimate a vine copula knockoff model.

Parameters

x_train: numpy.ndarray

Array of covariates.

marginals: str

A str ('kde1d' or 'kde_statsmodels') specifying the estimator for the marginal distributions. 'kde1d': The univariate kernel density estimator implemented in the R package kde1d (via rpy2). 'kde_statsmodels': The univariate version of the kernel density estimator KDEMultivariate implemented in the Python package statsmodels. Default is 'kde1d'.

families: str

The set of candidate copula families, from which one is selected per edge by minimizing the AIC. Currently, the only implemented choice is 'all' (i.e., Clayton, Frank, Gaussian, and Gumbel). Default is 'all'.

rotations: bool

Indicates whether rotated versions of the Clayton and Gumbel copulas should be considered. Default is True.

indep_test: bool

Indicates whether an independence test should be performed before choosing a copula family with the AIC. If independence cannot be rejected, an independence copula is assigned to the corresponding vine edge. Default is True.
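The logic of the independence pre-test can be sketched as follows. Which test the package applies is not stated here; the sketch assumes the asymptotic test based on Kendall's tau that is common in vine copula software, where sqrt(9n(n-1) / (2(2n+5))) * |tau| is approximately standard normal under independence. The function names, the sample values and the 5% level are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def kendall_tau(x, y):
    """Kendall's tau for paired samples without ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return s / (n * (n - 1) / 2)


def independence_p_value(x, y):
    """Two-sided p-value of the asymptotic Kendall's tau independence test."""
    n = len(x)
    tau = kendall_tau(x, y)
    stat = sqrt(9 * n * (n - 1) / (2 * (2 * n + 5))) * abs(tau)
    return 2 * (1 - NormalDist().cdf(stat))


# Small made-up samples, for illustration only.
x = [0.1, 0.4, 0.2, 0.9, 0.7, 0.3, 0.8, 0.5]
y_dep = [v ** 2 for v in x]  # increasing in x, so tau = 1
y_mix = [0.5, 0.1, 0.8, 0.4, 0.2, 0.9, 0.3, 0.6]

alpha = 0.05
# If independence is not rejected, the edge gets the independence copula;
# otherwise family selection by AIC would proceed.
use_indep_dep = independence_p_value(x, y_dep) >= alpha
use_indep_mix = independence_p_value(x, y_mix) >= alpha
```

Skipping the AIC search for edges that pass as independent saves fitting time and yields a sparser model.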

vine_structure: str

A str ('select_tsp_r', 'select_tsp_py' or '1:n') specifying how the structure of the D-vine is selected. 'select_tsp_r': Maximize the dependence in the first tree by solving a TSP with the R package TSP. 'select_tsp_py': Maximize the dependence in the first tree by solving a TSP with the Python package python_tsp. '1:n': Use the natural order of the variables X_1, X_2, …, X_{d-1}, X_d. Default is 'select_tsp_r'.
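The first tree of a D-vine is a path through all variables, so maximizing the dependence along it is a TSP-type problem. The sketch below shows the idea by brute-force enumeration for a handful of variables rather than by calling the TSP or python_tsp packages. The function name best_dvine_order, the use of absolute Kendall's tau as edge weight and the matrix values are assumptions made for illustration.

```python
from itertools import permutations


def best_dvine_order(dep):
    """Return the variable order maximizing summed dependence between neighbours.

    dep is a symmetric d x d matrix of nonnegative dependence weights,
    e.g. absolute Kendall's tau. Brute force: only feasible for small d.
    """
    d = len(dep)
    best_order, best_score = None, float("-inf")
    for order in permutations(range(d)):
        if order[0] > order[-1]:
            continue  # a path and its reverse describe the same D-vine order
        score = sum(dep[a][b] for a, b in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order), best_score


# Made-up |tau| values for four variables, for illustration only.
abs_tau = [
    [1.0, 0.1, 0.6, 0.2],
    [0.1, 1.0, 0.3, 0.7],
    [0.6, 0.3, 1.0, 0.4],
    [0.2, 0.7, 0.4, 1.0],
]
order, score = best_dvine_order(abs_tau)  # indices are 0-based
```

The number of candidate paths grows factorially in d, which is why a dedicated TSP solver is used in practice; '1:n' skips the search entirely.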

upper_tree_cop_fam_heuristic: str

A str ('Gaussian' or 'lower tree families') specifying the heuristic used to select the copula families in the higher trees (from the (d+1)-th tree on). 'Gaussian': Gaussian copulas corresponding to the partial correlation vine are assigned to the edges in the higher trees. 'lower tree families': The copula families from the lower trees are mirrored to the higher trees. Default is 'Gaussian'.

sgd: bool

Indicates whether the parameters of the vine copula knockoff model should be tuned using a stochastic gradient descent algorithm. Default is False.

sgd_lr: float

The learning rate for the SGD algorithm. Default is 0.01.

sgd_gamma: float

SGD momentum parameter. Default is 0.9.

sgd_n_batches: int

Number of batches in the SGD algorithm. Default is 5.

sgd_n_iter: int

Maximum number of iterations of the SGD algorithm. Default is 20.

sgd_which_par: str

A str ('all' or 'upper only') specifying whether all parameters or only the parameters of the upper trees should be optimized with the SGD algorithm. Default is 'all'.

loss_alpha: float

Parameter alpha in the SGD loss function. Default is 1.0.

loss_delta_sdp_corr: float

Parameter delta_sdp_corr in the SGD loss function. Default is 1.0.

loss_gamma: float

Parameter gamma in the SGD loss function. Default is 1.0.

loss_delta_corr: float

Parameter delta_corr in the SGD loss function. Default is 0.0.

gau_cop_algo: str

A str ('sdp' or 'ecorr') specifying the algorithm (semidefinite program or equicorrelation) used to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.

VineKnockoffs.generate(x_test, knockoff_eps=None)

VineKnockoffs.generate_par_jacobian(x_test, knockoff_eps=None, which_par='upper only', return_x_knockoffs=False)

VineKnockoffs.ll(x, x_knockoffs)