Data generation function for various underlying models
Usage
simdat(
n = c(300, 300, 300),
effect = 1,
sigma_Y = 1,
model = "ols",
shift = 0,
scale = 1
)
Arguments
- n
Integer vector of length 3 giving the sample sizes of the training, labeled, and unlabeled data sets, respectively.
- effect
Regression coefficient for the first variable of interest for inference. Defaults to 1.
- sigma_Y
Residual variance for the generated outcome. Defaults to 1.
- model
The type of model to be generated. Must be one of "mean", "quantile", "ols", or "logistic". Defaults to "ols".
- shift
Scalar shift of the predictions for continuous outcomes (i.e., "mean", "quantile", and "ols"). Defaults to 0.
- scale
Scaling factor for the predictions for continuous outcomes (i.e., "mean", "quantile", and "ols"). Defaults to 1.
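As a rough illustration of the arguments above, the sketch below generates two data sets with unequal sample sizes, one with the default shift and scale and one with non-default values, and compares the predictions in the labeled subset. It assumes that simdat() is exported by the ipd package (the package name is not stated on this page) and that the set column uses the values "training", "labeled", and "unlabeled"; the object names are illustrative only.

```r
# Assumption: simdat() comes from the 'ipd' package.
library(ipd)

# Same seed before each call so the two runs draw the same random numbers
set.seed(123)
dat_default <- simdat(n = c(200, 100, 500), effect = 1, sigma_Y = 1,
                      model = "ols")

set.seed(123)
dat_shifted <- simdat(n = c(200, 100, 500), effect = 1, sigma_Y = 1,
                      model = "ols", shift = 1, scale = 2)

# Predictions (f) are NA in the training rows shown in the Examples,
# so compare them in the labeled subset instead
lab_default <- dat_default[dat_default$set == "labeled", ]
lab_shifted <- dat_shifted[dat_shifted$set == "labeled", ]

summary(lab_default$f)
summary(lab_shifted$f)
```

Comparing the two summaries gives a quick check of how shift and scale alter the predictions relative to the defaults.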
Value
A data.frame containing sum(n) rows, one per simulated observation across the three data sets. Its columns are the labeled outcome (Y), the predicted outcome (f), a character variable (set) indicating which data set the observation belongs to (training, labeled, or unlabeled), and, where applicable, four independent, normally distributed predictors (X1, X2, X3, and X4).
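Because all three data sets come back stacked in one data.frame, a common first step is to split it on the set column. The following is a minimal sketch in base R, assuming simdat() is available from the ipd package (an assumption, since the package is not named here); the object names are hypothetical.

```r
# Assumption: simdat() comes from the 'ipd' package.
library(ipd)

set.seed(1)
dat <- simdat(n = c(100, 100, 100), effect = 1, sigma_Y = 1, model = "ols")

# Number of rows per data set
table(dat$set)

# Split into a named list of data.frames, one per value of 'set'
parts <- split(dat, dat$set)
labeled   <- parts[["labeled"]]
unlabeled <- parts[["unlabeled"]]

# Predictor columns present for this model type
grep("^X", names(dat), value = TRUE)
```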
Examples
#-- Mean
dat_mean <- simdat(c(100, 100, 100), effect = 1, sigma_Y = 1,
model = "mean")
head(dat_mean)
#> Y f set
#> 1 1.2358347 NA training
#> 2 0.3104177 NA training
#> 3 2.2010616 NA training
#> 4 1.5780407 NA training
#> 5 2.2034014 NA training
#> 6 0.3235828 NA training
#-- Linear Regression
dat_ols <- simdat(c(100, 100, 100), effect = 1, sigma_Y = 1,
model = "ols")
head(dat_ols)
#> X1 X2 X3 X4 Y f set
#> 1 -0.01987176 -0.1397668 1.13837121 -0.318232616 1.2596350 NA training
#> 2 1.38072091 0.6508478 -1.36065237 -0.083392001 0.7739356 NA training
#> 3 -0.16047478 0.8209136 -0.45652812 -0.027092071 0.6575092 NA training
#> 4 1.29612624 -1.1946885 -0.28393557 -0.989618989 2.5554902 NA training
#> 5 -0.86328948 -0.7039109 -0.01353596 -0.639295088 -1.3541555 NA training
#> 6 0.78215060 -0.1660609 -0.26126609 0.005894084 1.7686558 NA training
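The remaining two model types listed under Arguments can be requested in the same way. The calls below are a sketch that mirrors the examples above, assuming simdat() is loaded from the ipd package; no output is shown because it depends on the random draw.

```r
# Assumption: simdat() comes from the 'ipd' package.
library(ipd)

#-- Quantile
dat_quantile <- simdat(c(100, 100, 100), effect = 1, sigma_Y = 1,
                       model = "quantile")
head(dat_quantile)

#-- Logistic Regression
dat_logistic <- simdat(c(100, 100, 100), effect = 1, sigma_Y = 1,
                       model = "logistic")
head(dat_logistic)
```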