Getting started with highMLR

Overview

highMLR provides a single, unified interface for high-dimensional feature selection when the outcome is a (possibly censored) survival time. One highmlr() call dispatches to any of the following selection methods, chosen through the method argument:

  • "coxnet" – Cox elastic net (glmnet)
  • "rsf" – random survival forest (ranger)
  • "aorsf" – accelerated oblique random survival forest (aorsf)
  • "xgboost" – gradient-boosted Cox (xgboost)
  • "stability" – stability selection (stabs)
  • "univariate" – classical univariate Cox screening
  • "pseudo" – pseudo-observation bridge to an arbitrary regression learner
  • "finegray" – Fine-Gray competing-risks selection
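
To make the simplest entry in this list concrete, here is a minimal sketch of univariate Cox screening written directly against the survival package. It is an illustration of the general technique on simulated data, not highMLR's implementation: the simulated columns g1 to g20, the effect sizes, and the Benjamini-Hochberg adjustment are all assumptions made for the example.

```r
library(survival)

set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("g", seq_len(p))))

# Assumption for the simulation: only g1 and g2 affect the hazard.
lp <- 1.0 * x[, "g1"] - 1.0 * x[, "g2"]
event_time  <- rexp(n, rate = exp(lp))
censor_time <- rexp(n, rate = 0.3)
dat <- data.frame(
  OS    = pmin(event_time, censor_time),
  Death = as.integer(event_time <= censor_time),  # 1 = event, 0 = censored
  x
)

# Fit one Cox model per feature and keep its Wald p-value.
pvals <- vapply(colnames(x), function(v) {
  f <- coxph(as.formula(paste0("Surv(OS, Death) ~ ", v)), data = dat)
  summary(f)$coefficients[1, "Pr(>|z|)"]
}, numeric(1))

# Rank features by multiplicity-adjusted p-value (smallest first).
ranked <- sort(p.adjust(pvals, method = "BH"))
head(ranked, 5)
```

Each feature is tested on its own, so the cost grows linearly in p, but correlated features are not adjusted for one another. That is the usual motivation for the multivariable methods listed above.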

All methods return a highmlr_fit object with a common structure, so the downstream verbs (print(), summary(), plot(), coef(), predict()) and the companion functions (highmlr_compare(), highmlr_stability(), highmlr_explain(), highmlr_screen(), highmlr_report()) work identically regardless of which method produced the fit.
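
The reason a shared return class makes the verbs method-agnostic is ordinary S3 dispatch. The toy sketch below shows the idea with a hypothetical class toy_fit; its constructor and its fields (method, importance) are invented for illustration and say nothing about the actual layout of a highmlr_fit object.

```r
# Hypothetical constructor: every "method" returns the same list layout.
new_toy_fit <- function(method, importance) {
  structure(list(method = method, importance = importance),
            class = "toy_fit")
}

# One coef() method serves every fit, whichever method produced it.
coef.toy_fit <- function(object, ...) {
  sort(object$importance, decreasing = TRUE)
}

# One print() method, likewise shared.
print.toy_fit <- function(x, ...) {
  cat("<toy_fit> method:", x$method,
      "| features:", length(x$importance), "\n")
  invisible(x)
}

fit_a <- new_toy_fit("coxnet", c(g1 = 0.8, g2 = 0.1, g3 = 0.0))
fit_b <- new_toy_fit("rsf",    c(g1 = 0.5, g2 = 0.4, g3 = 0.2))

print(fit_a)   # dispatches to print.toy_fit
coef(fit_b)    # dispatches to coef.toy_fit
```

Because dispatch depends only on the class attribute, downstream code never needs to branch on which learner was used.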

A first fit

The package ships with two bundled high-dimensional survival datasets, hnscc and srdata. Both use OS for the survival time; the event indicator is Death in hnscc and event in srdata (1 = event, 0 = censored).
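
As a quick check of the 1 = event, 0 = censored coding, the sketch below builds a survival response with survival::Surv() from a made-up four-row data frame. Only the column names OS and Death come from hnscc; the values are invented for the example.

```r
library(survival)

# Toy data with the hnscc column names; the values are made up.
toy <- data.frame(
  OS    = c(5.2, 12.0, 3.4, 20.1),
  Death = c(1, 0, 1, 0)   # 1 = event observed, 0 = censored
)

# Surv() pairs each time with its status; censored times print with a "+".
y <- Surv(toy$OS, toy$Death)
print(y)

# Sanity checks worth running on real data before fitting:
# the status column holds only 0/1 and at least one event is present.
stopifnot(all(toy$Death %in% c(0, 1)), sum(toy$Death) > 0)
```

A status column coded the other way round (1 = censored) would run without error and quietly give wrong results, so it is worth confirming the coding before any fit.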

library(highMLR)
data(hnscc)

fit <- highmlr(
  hnscc,
  time   = "OS",
  status = "Death",
  method = "coxnet",
  resampling = "cv",
  folds = 5
)

print(fit)
plot(fit, top_n = 20)

The examples in this vignette are not evaluated at build time because the underlying learners (glmnet, ranger, aorsf, xgboost, grf, survex) can be slow on high-dimensional data. Copy the chunks into an interactive session to run them.

Comparing methods

highmlr_compare() runs several methods on the same data and returns a tidy side-by-side summary:

cmp <- highmlr_compare(
  hnscc, "OS", "Death",
  methods = c("coxnet", "rsf", "univariate")
)
cmp$summary

Pre-screening when p is very large

For very wide data, reduce the candidate set first:

data(srdata)
keep <- highmlr_screen(srdata, "OS", "event",
                       filter = "variance", keep = 500)
# Store under a new name so the hnscc fit from "A first fit" is not overwritten.
fit_sr <- highmlr(srdata, "OS", "event",
                  features = keep, method = "coxnet")
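
For intuition about what a variance filter does, here is a base-R sketch on a simulated matrix. It assumes the filter simply ranks features by sample variance and keeps the top 500; whether highmlr_screen() does exactly this internally is not confirmed here, and the names x, sds, and keep_toy are invented for the example.

```r
set.seed(42)
n <- 50
p <- 2000

# Simulated expression matrix: feature j is drawn with standard deviation sds[j].
sds <- runif(p, min = 0.1, max = 2)
x <- sapply(sds, function(s) rnorm(n, sd = s))
colnames(x) <- paste0("g", seq_len(p))

# Rank features by sample variance and keep the 500 most variable.
v <- apply(x, 2, var)
keep_toy <- names(sort(v, decreasing = TRUE))[seq_len(500)]

# Subset the matrix to the retained features.
x_small <- x[, keep_toy, drop = FALSE]
dim(x_small)
```

A variance filter never looks at the survival outcome, so applying it once before resampling does not leak outcome information into the folds. An outcome-based filter would need to be repeated inside each fold.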

Explaining a fit

Time-dependent SHAP values (SurvSHAP(t)) are available via highmlr_explain(), and a one-file biomarker report can be generated with highmlr_report().

ex <- highmlr_explain(fit, new_data = hnscc, method = "survshap")
print(ex)
plot(ex)

Session information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.59       
#>  [5] maketools_1.3.2  cachem_1.1.0     knitr_1.51       htmltools_0.5.9 
#>  [9] buildtools_1.0.0 lifecycle_1.0.5  cli_3.6.6        sass_0.4.10     
#> [13] jquerylib_0.1.4  compiler_4.6.0   sys_3.4.3        tools_4.6.0     
#> [17] evaluate_1.0.5   bslib_0.11.0     yaml_2.3.12      otel_0.2.0      
#> [21] jsonlite_2.0.0   rlang_1.2.0