Helper function for PPI++ OLS estimation (point estimate)
Usage
ppi_plusplus_ols_est(
  X_l,
  Y_l,
  f_l,
  X_u,
  f_u,
  lhat = NULL,
  coord = NULL,
  w_l = NULL,
  w_u = NULL
)
Arguments
- X_l
(matrix): n x p matrix of covariates in the labeled data.
- Y_l
(vector): n-vector of labeled outcomes.
- f_l
(vector): n-vector of predictions in the labeled data.
- X_u
(matrix): N x p matrix of covariates in the unlabeled data.
- f_u
(vector): N-vector of predictions in the unlabeled data.
- lhat
(float, optional): Power-tuning parameter (see https://arxiv.org/abs/2311.01453). The default value, NULL, will estimate the optimal value from the data. Setting lhat = 1 recovers PPI with no power tuning, and setting lhat = 0 recovers the classical point estimate.
- coord
(int, optional): Coordinate for which to optimize lhat. If NULL, it optimizes the total variance over all coordinates. Must be in (1, ..., d) where d is the dimension of the estimand.
- w_l
(vector, optional): n-vector of sample weights for the labeled data. Defaults to a vector of ones.
- w_u
(vector, optional): N-vector of sample weights for the unlabeled data. Defaults to a vector of ones.
Details
PPI++: Efficient Prediction-Powered Inference (Angelopoulos et al., 2023), https://arxiv.org/abs/2311.01453
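The role of lhat described under Arguments can be illustrated with a small base-R sketch. It assumes that, for a fixed lhat and unit weights, the point estimate is the sum of an OLS fit of lhat * f_u on X_u and an OLS fit of Y_l - lhat * f_l on X_l. That form is an assumption made for illustration and is not taken from this page; the helper ppi_ols_fixed_lhat and the simulated data are hypothetical and not part of the package. Data-driven tuning of lhat, coord, and sample weights are deliberately left out.

```r
# Hypothetical helper (not part of the package): point estimate for a
# fixed, user-supplied lhat, with unit weights and no tuning of lhat.
ppi_ols_fixed_lhat <- function(X_l, Y_l, f_l, X_u, f_u, lhat) {
  # Plain OLS via the normal equations
  ols <- function(X, y) solve(crossprod(X), crossprod(X, y))
  # Assumed form: fit on scaled unlabeled predictions plus a
  # correction ("rectifier") fitted on the labeled data
  ols(X_u, lhat * f_u) + ols(X_l, Y_l - lhat * f_l)
}

# Simulated data for illustration only
set.seed(1)
n <- 100
N <- 1000
beta <- c(1, 2)
X_l <- cbind(1, rnorm(n))
X_u <- cbind(1, rnorm(N))
Y_l <- X_l %*% beta + rnorm(n)
Y_u <- X_u %*% beta + rnorm(N)      # never observed in practice
f_l <- Y_l + rnorm(n, sd = 0.5)     # noisy predictions, labeled set
f_u <- Y_u + rnorm(N, sd = 0.5)     # noisy predictions, unlabeled set

est0 <- ppi_ols_fixed_lhat(X_l, Y_l, f_l, X_u, f_u, lhat = 0)
est1 <- ppi_ols_fixed_lhat(X_l, Y_l, f_l, X_u, f_u, lhat = 1)

# With lhat = 0 the unlabeled term vanishes, leaving OLS on the labeled data
classical <- lm.fit(X_l, Y_l)$coefficients
all.equal(as.numeric(est0), as.numeric(classical))
```

Under this assumed form, lhat = 0 ignores the predictions entirely, while lhat = 1 uses them at full weight; intermediate values trade off between the two.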
Examples
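# Simulate labeled and unlabeled data for an OLS model
dat <- simdat(model = "ols")
# Left-hand side names the outcome (Y) and the prediction (f)
form <- Y - f ~ X1
# Labeled set: design matrix, outcomes, and predictions
X_l <- model.matrix(form, data = dat[dat$set == "labeled",])
Y_l <- dat[dat$set == "labeled", all.vars(form)[1]] |> matrix(ncol = 1)
f_l <- dat[dat$set == "labeled", all.vars(form)[2]] |> matrix(ncol = 1)
# Unlabeled set: design matrix and predictions only
X_u <- model.matrix(form, data = dat[dat$set == "unlabeled",])
f_u <- dat[dat$set == "unlabeled", all.vars(form)[2]] |> matrix(ncol = 1)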
ppi_plusplus_ols_est(X_l, Y_l, f_l, X_u, f_u)
#> X(Intercept) XX1
#> 0.7804814 1.0524179