vineknockoffs.VineKnockoffs
- class vineknockoffs.VineKnockoffs(dvine=None, marginals=None, dvine_structure=None)
Vine copula knockoffs.
Parameters
- dvine : None or DVineCopula object
The DVineCopula object specifies the vine copula model used to generate knockoffs. If it is set to None, the vine copula knockoff model can be learned from data with the methods fit_vine_copula_knockoffs(), fit_gaussian_copula_knockoffs() or fit_gaussian_knockoffs(). Default is None.
- marginals : list
The marginal distributions for the vine copula knockoff model. Must be a list of n_vars distribution objects that implement the methods cdf() and ppf(). If it is set to None, the marginal distributions can be estimated from data with the method fit_marginals(), which is also called by fit_vine_copula_knockoffs(), fit_gaussian_copula_knockoffs() and fit_gaussian_knockoffs(). Default is None.
- dvine_structure : numpy.ndarray
The D-vine structure (order of variables) for the vine copula knockoff model. Default is None.
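Any object with cdf() and ppf() methods fits the marginals interface described above. The following is a minimal sketch under the assumption that the package calls these methods with one-dimensional array-like input (this page does not say so). The class NormalMarginal is hypothetical and wraps statistics.NormalDist from the Python standard library. Frozen scipy.stats distributions, e.g. scipy.stats.norm(loc=5, scale=2), already expose both methods.

```python
from statistics import NormalDist


class NormalMarginal:
    """Hypothetical marginal distribution exposing the cdf()/ppf() interface."""

    def __init__(self, mu=0.0, sigma=1.0):
        self._dist = NormalDist(mu, sigma)

    def cdf(self, x):
        # Probability integral transform: original scale -> (0, 1).
        return [self._dist.cdf(v) for v in x]

    def ppf(self, u):
        # Quantile function: (0, 1) -> original scale.
        return [self._dist.inv_cdf(p) for p in u]


# One marginal per variable (a list of length n_vars, here n_vars = 2).
marginals = [NormalMarginal(0.0, 1.0), NormalMarginal(5.0, 2.0)]
u = marginals[1].cdf([5.0, 7.0])   # approximately [0.5, 0.841]
x_back = marginals[1].ppf(u)       # recovers [5.0, 7.0] up to round-off
```

The two methods are inverses of each other, which is what allows a copula model on the uniform scale to be mapped back to the data scale.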
Examples
# ToDo: add an example here
Methods
- fit_gaussian_copula_knockoffs(x_train[, ...]): Estimate a Gaussian copula knockoff model.
- fit_gaussian_knockoffs(x_train[, algo]): Estimate a Gaussian knockoff model.
- fit_marginals(x_train[, model])
- fit_sgd(x_train[, lr, gamma, n_batches, ...]): Optimize the parameters of a vine copula knockoff model using an SGD algorithm.
- fit_vine_copula_knockoffs(x_train[, ...]): Estimate a vine copula knockoff model.
- generate(x_test[, knockoff_eps])
- generate_par_jacobian(x_test[, ...])
- ll(x, x_knockoffs)
Attributes
- dvine_structure
- inv_dvine_structure
- n_pars_upper_trees
- VineKnockoffs.fit_gaussian_copula_knockoffs(x_train, marginals='kde1d', algo='sdp', vine_structure='1:n')
Estimate a Gaussian copula knockoff model.
Parameters
- x_train : numpy.ndarray
Array of covariates.
- marginals : str
A str ('kde1d' or 'kde_statsmodels') specifying the estimator for the marginal distributions:
'kde1d': The univariate kernel density estimator implemented in the R package kde1d (via rpy2).
'kde_statsmodels': The univariate version of the kernel density estimator KDEMultivariate implemented in the Python package statsmodels.
Default is 'kde1d'.
- algo : str
A str ('sdp' or 'ecorr') specifying the algorithm used (semidefinite program or equicorrelation) to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.
- vine_structure : str
A str ('select_tsp_r', 'select_tsp_py' or '1:n') specifying how the structure of the D-vine is selected:
'select_tsp_r': Maximize the dependence in the first tree by solving a TSP with the R package TSP.
'select_tsp_py': Maximize the dependence in the first tree by solving a TSP with the Python package python_tsp.
'1:n': Use the natural order of the variables X_1, X_2, …, X_{d-1}, X_d.
Default is '1:n'.
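The 'select_tsp_r' and 'select_tsp_py' options order the variables so that the dependence captured by the first D-vine tree is as large as possible. Because the first tree of a D-vine is a path through all variables, this is a Hamiltonian path problem. The sketch below is not the package's implementation. It assumes the dependence between two variables is measured by |Kendall's tau| (the criterion the package uses is not stated on this page), and it finds the best path by brute force, which is feasible only for a handful of variables. The R package TSP and python_tsp instead use dedicated TSP solvers. The helper names kendall_tau and best_dvine_order are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(x, y):
    """Kendall's tau-a (assumes no ties)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def best_dvine_order(columns):
    """Brute-force the variable order maximising the summed |tau| of neighbours."""
    d = len(columns)
    weight = {
        (i, j): abs(kendall_tau(columns[i], columns[j]))
        for i, j in combinations(range(d), 2)
    }
    best_order, best_score = None, float("-inf")
    for order in permutations(range(d)):
        if order[0] > order[-1]:
            continue  # a path and its reverse give the same first tree
        score = sum(weight[tuple(sorted(pair))] for pair in zip(order, order[1:]))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order), best_score


x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [8, 7, 6, 5, 4, 3, 2, 1]    # perfectly (negatively) dependent on x0
x2 = [3, 1, 4, 1.5, 5, 9, 2, 6]  # only weakly related to x0 and x1
order, score = best_dvine_order([x0, x1, x2])
```

The strongly dependent pair (x0, x1) ends up adjacent in the selected order, so its dependence is modelled directly by an unconditional copula in the first tree.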
- VineKnockoffs.fit_gaussian_knockoffs(x_train, algo='sdp')
Estimate a Gaussian knockoff model.
Parameters
- x_train : numpy.ndarray
Array of covariates.
- algo : str
A str ('sdp' or 'ecorr') specifying the algorithm used (semidefinite program or equicorrelation) to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.
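This page does not spell out the Gaussian knockoff construction. The NumPy sketch below shows the textbook equicorrelation variant ('ecorr'), which has a closed form. The default 'sdp' requires a semidefinite-program solver and is not shown. The sketch assumes centred data with a known correlation matrix Σ and sets s_j = min(2 λ_min(Σ), 1) for every j. Knockoffs are then drawn from X̃ | X ~ N(X − X Σ⁻¹ D, 2D − D Σ⁻¹ D) with D = diag(s). The package itself parameterizes the model as a partial correlation vine, and that parameterization is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def equicorrelated_s(sigma):
    """Equicorrelation parameters s_j = min(2 * lambda_min, 1) for a correlation matrix."""
    lam_min = np.linalg.eigvalsh(sigma).min()
    return np.full(sigma.shape[0], min(2.0 * lam_min, 1.0))


def gaussian_knockoffs(x, sigma, rng):
    """Sample Gaussian knockoffs for the rows of x (assumes mean zero, correlation matrix sigma)."""
    d = np.diag(equicorrelated_s(sigma))
    sigma_inv_d = np.linalg.solve(sigma, d)   # Sigma^{-1} D
    mu = x - x @ sigma_inv_d                  # conditional mean, one row per observation
    cov = 2.0 * d - d @ sigma_inv_d           # conditional covariance 2D - D Sigma^{-1} D
    cov = (cov + cov.T) / 2.0                 # symmetrise against round-off
    eigval, eigvec = np.linalg.eigh(cov)
    # cov = root @ root.T; clipping tolerates the singular cov produced by s = 2 * lambda_min.
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return mu + rng.standard_normal(x.shape) @ root.T


rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])
x = rng.multivariate_normal(np.zeros(3), sigma, size=20000)
x_knockoffs = gaussian_knockoffs(x, sigma, rng)
```

By construction, the knockoffs have the same covariance Σ as X, and Cov(X, X̃) = Σ − D. The knockoffs therefore mimic the dependence among the covariates while each X̃_j is decorrelated from its own X_j as much as the equicorrelation constraint allows.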
- VineKnockoffs.fit_marginals(x_train, model='kde1d')
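fit_marginals() is documented here only by its signature. Judging from the marginals options elsewhere on this page, model='kde1d' fits one kernel density estimate per column. As a rough stand-in, the sketch below uses a Gaussian-kernel KDE with the normal-reference bandwidth 1.06 · sd · n^(-1/5). This is not the kde1d or statsmodels estimator. The KDE's CDF is an average of normal CDFs, and its quantile function is obtained by bisection. The class name GaussianKDEMarginal is hypothetical.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()


class GaussianKDEMarginal:
    """Hypothetical univariate Gaussian-kernel KDE exposing cdf() and ppf()."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Normal-reference rule-of-thumb bandwidth.
        self.h = 1.06 * stdev(self.data) * n ** (-1 / 5)

    def _cdf1(self, x):
        # The KDE's CDF is the average of the kernel CDFs centred at the data points.
        return sum(_STD_NORMAL.cdf((x - xi) / self.h) for xi in self.data) / len(self.data)

    def cdf(self, x):
        return [self._cdf1(v) for v in x]

    def ppf(self, u, tol=1e-10):
        lo0 = min(self.data) - 10 * self.h
        hi0 = max(self.data) + 10 * self.h
        out = []
        for p in u:
            lo, hi = lo0, hi0
            # Bisection is valid because the KDE's CDF is continuous and strictly increasing.
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if self._cdf1(mid) < p:
                    lo = mid
                else:
                    hi = mid
            out.append((lo + hi) / 2)
        return out


m = GaussianKDEMarginal([0.2, 1.1, 1.9, 2.4, 3.0, 3.3, 4.8])
u = m.cdf([2.4])
x_back = m.ppf(u)  # close to [2.4]
```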
- VineKnockoffs.fit_sgd(x_train, lr=0.01, gamma=0.9, n_batches=5, n_iter=20, which_par='all', loss_alpha=1.0, loss_delta_sdp_corr=1.0, loss_gamma=1.0, loss_delta_corr=0.0)
Optimize the parameters of a vine copula knockoff model using an SGD algorithm.
Parameters
- x_train : numpy.ndarray
Array of covariates.
- lr : float
The learning rate for the SGD algorithm. Default is 0.01.
- gamma : float
SGD momentum parameter. Default is 0.9.
- n_batches : int
Number of batches in the SGD algorithm. Default is 5.
- n_iter : int
Maximum number of iterations of the SGD algorithm. Default is 20.
- which_par : str
A str ('all' or 'upper only') specifying whether all parameters or only the parameters of the upper trees should be optimized with the SGD algorithm. Default is 'all'.
- loss_alpha : float
Parameter alpha in the SGD loss function. Default is 1.0.
- loss_delta_sdp_corr : float
Parameter delta_sdp_corr in the SGD loss function. Default is 1.0.
- loss_gamma : float
Parameter gamma in the SGD loss function. Default is 1.0.
- loss_delta_corr : float
Parameter delta_corr in the SGD loss function. Default is 0.0.
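fit_sgd() tunes the model parameters by mini-batch SGD with momentum. The knockoff loss, with its alpha, delta_sdp_corr, gamma and delta_corr terms, is not specified on this page. The sketch below therefore uses a toy squared-error loss on a single scalar parameter. It assumes the heavy-ball form of momentum (velocity = gamma * velocity + lr * gradient) and treats n_iter as the number of passes over the data. Whether the package uses this exact update, and how it stops early, is not reproduced. The function names are hypothetical.

```python
import random


def sgd_momentum(data, grad, theta0=0.0, lr=0.01, gamma=0.9, n_batches=5, n_iter=20, seed=0):
    """Mini-batch SGD with heavy-ball momentum on a scalar parameter."""
    rng = random.Random(seed)
    data = list(data)
    batch_size = max(1, len(data) // n_batches)
    theta, velocity = theta0, 0.0
    for _ in range(n_iter):
        rng.shuffle(data)  # a new split into batches in every pass over the data
        for b in range(n_batches):
            batch = data[b * batch_size:(b + 1) * batch_size]
            if not batch:
                continue  # fewer observations than batches
            g = sum(grad(theta, x) for x in batch) / len(batch)
            velocity = gamma * velocity + lr * g  # momentum accumulates past gradients
            theta -= velocity
    return theta


def squared_error_grad(theta, x):
    # Gradient of the toy loss (theta - x)**2; a stand-in for the knockoff loss.
    return 2.0 * (theta - x)


theta_hat = sgd_momentum([1.0, 2.0, 3.0, 4.0, 5.0] * 20, squared_error_grad)  # close to the mean 3.0
```

With gamma = 0.9, the momentum term amplifies the effective step size by roughly 1 / (1 - gamma), which is why a small lr such as the default 0.01 can still converge within a few passes.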
- VineKnockoffs.fit_vine_copula_knockoffs(x_train, marginals='kde1d', families='all', rotations=True, indep_test=True, vine_structure='select_tsp_r', upper_tree_cop_fam_heuristic='Gaussian', sgd=False, sgd_lr=0.01, sgd_gamma=0.9, sgd_n_batches=5, sgd_n_iter=20, sgd_which_par='all', loss_alpha=1.0, loss_delta_sdp_corr=1.0, loss_gamma=1.0, loss_delta_corr=0.0, gau_cop_algo='sdp')
Estimate a vine copula knockoff model.
Parameters
- x_train : numpy.ndarray
Array of covariates.
- marginals : str
A str ('kde1d' or 'kde_statsmodels') specifying the estimator for the marginal distributions:
'kde1d': The univariate kernel density estimator implemented in the R package kde1d (via rpy2).
'kde_statsmodels': The univariate version of the kernel density estimator KDEMultivariate implemented in the Python package statsmodels.
Default is 'kde1d'.
- families : str
The set of candidate copula families from which a family is selected by minimizing the AIC. Currently, the only implemented choice is 'all' (i.e., Clayton, Frank, Gaussian, and Gumbel). Default is 'all'.
- rotations : bool
Indicates whether rotated versions of the Clayton and Gumbel copulas should be considered. Default is True.
- indep_test : bool
Indicates whether an independence test should be performed before choosing a copula family with the AIC. If independence cannot be rejected, the independence copula is assigned to the corresponding vine edge. Default is True.
- vine_structure : str
A str ('select_tsp_r', 'select_tsp_py' or '1:n') specifying how the structure of the D-vine is selected:
'select_tsp_r': Maximize the dependence in the first tree by solving a TSP with the R package TSP.
'select_tsp_py': Maximize the dependence in the first tree by solving a TSP with the Python package python_tsp.
'1:n': Use the natural order of the variables X_1, X_2, …, X_{d-1}, X_d.
Default is 'select_tsp_r'.
- upper_tree_cop_fam_heuristic : str
A str ('Gaussian' or 'lower tree families') specifying the heuristic used to select the copula families in the higher trees (from the (d+1)-th tree on):
'Gaussian': Gaussian copulas corresponding to the partial correlation vine are assigned to the edges in the higher trees.
'lower tree families': The copula families from the lower trees are mirrored to the higher trees.
Default is 'Gaussian'.
- sgd : bool
Indicates whether the parameters of the vine copula knockoff model should be tuned using a stochastic gradient descent (SGD) algorithm. Default is False.
- sgd_lr : float
The learning rate for the SGD algorithm. Default is 0.01.
- sgd_gamma : float
SGD momentum parameter. Default is 0.9.
- sgd_n_batches : int
Number of batches in the SGD algorithm. Default is 5.
- sgd_n_iter : int
Maximum number of iterations of the SGD algorithm. Default is 20.
- sgd_which_par : str
A str ('all' or 'upper only') specifying whether all parameters or only the parameters of the upper trees should be optimized with the SGD algorithm. Default is 'all'.
- loss_alpha : float
Parameter alpha in the SGD loss function. Default is 1.0.
- loss_delta_sdp_corr : float
Parameter delta_sdp_corr in the SGD loss function. Default is 1.0.
- loss_gamma : float
Parameter gamma in the SGD loss function. Default is 1.0.
- loss_delta_corr : float
Parameter delta_corr in the SGD loss function. Default is 0.0.
- gau_cop_algo : str
A str ('sdp' or 'ecorr') specifying the algorithm used (semidefinite program or equicorrelation) to obtain the parameters (the partial correlation vine) of the Gaussian copula knockoffs. Default is 'sdp'.
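With indep_test=True, an independence test decides whether a vine edge gets the independence copula before any family is chosen by the AIC. This page does not say which test is used. A standard choice in vine copula software, for instance BiCopIndTest in the R package VineCopula, is the asymptotic test based on Kendall's tau: under independence, τ̂ is approximately normal with variance 2(2n + 5) / (9n(n − 1)). The sketch below implements that test. The function names and the returned labels are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def kendall_tau(x, y):
    """Kendall's tau-a (assumes no ties)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def independence_test(x, y):
    """Asymptotic two-sided test of H0: tau = 0; returns (statistic, p-value)."""
    n = len(x)
    tau = kendall_tau(x, y)
    stat = sqrt(9.0 * n * (n - 1) / (2.0 * (2 * n + 5))) * abs(tau)
    p_value = 2.0 * (1.0 - NormalDist().cdf(stat))
    return stat, p_value


def edge_family(x, y, level=0.05):
    # Keep the independence copula unless independence is rejected at the given level.
    _, p_value = independence_test(x, y)
    return "independence" if p_value >= level else "select by AIC"


x_dep = list(range(30))
y_dep = [v + (v % 3) for v in x_dep]      # strongly dependent on x_dep
_, p_dep = independence_test(x_dep, y_dep)
_, p_ind = independence_test([0, 1, 2, 3], [1, 3, 0, 2])  # tau = 0 in this toy sample
```

Skipping the AIC search on edges where independence is not rejected keeps the model sparse and avoids fitting spurious weak dependence.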
- VineKnockoffs.generate(x_test, knockoff_eps=None)
- VineKnockoffs.generate_par_jacobian(x_test, knockoff_eps=None, which_par='upper only', return_x_knockoffs=False)
- VineKnockoffs.ll(x, x_knockoffs)