Parallelization

The shar package has no built-in parallelization. However, there are many R frameworks that allow running code in parallel or on high-performance clusters (e.g., future, clustermq, or rslurm). Thus, shar provides utility functions that facilitate its use together with any parallelization framework.

The following example illustrates how to use the future package to randomize patterns with fit_point_process in parallel, using all available cores on a local machine. The core idea of the code could similarly be used to run shar on a high-performance cluster.

First, we need to load all required packages. This includes future and future.apply.

library(shar)
library(spatstat)
library(terra)

library(future)
library(future.apply)

The future package allows running code in parallel with only a few lines of code. By setting the future plan to multisession, the package automatically resolves all following futures in parallel.

Importantly, with this approach each call should create exactly one randomization (n_random = 1), and simplify = TRUE must be set so that only the point pattern is returned. The total number of randomizations is then determined by the length of X (39 in the code that follows), and the result is a list of randomized point patterns.

future::plan(multisession)

fitted_list <- future.apply::future_lapply(X = 1:39, FUN = function(i) {
   shar::fit_point_process(pattern = species_b, n_random = 1, 
                           return_input = FALSE, simplify = TRUE, verbose = FALSE)
}, future.seed = 42)
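
The plan can also be set with an explicit number of workers and reset afterwards. The following is a minimal sketch rather than part of the shar workflow itself; leaving one core free is an assumption made for illustration, and `n_cores` and `n_workers` are hypothetical variable names.

```r
library(future)

# Number of cores that future considers available on this machine
n_cores <- future::availableCores()

# Assumption for illustration: leave one core free for the rest of the system
n_workers <- max(1, n_cores - 1)

# Start the background R sessions that will resolve the futures
future::plan(future::multisession, workers = n_workers)
future::nbrOfWorkers()

# ... run future.apply::future_lapply() as shown above ...

# Return to sequential processing once the parallel work is done
future::plan(future::sequential)
```

Resetting the plan to sequential shuts down the background sessions, which avoids keeping idle R processes around after the randomizations are finished.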

Next, you can use the list_to_randomized() function to convert this list of randomized patterns to an rd_pat object that will work with all other functions of the shar package.

fitted_rd <- list_to_randomized(list = fitted_list, observed = shar::species_b)
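
Because list_to_randomized() only needs a list of point patterns, the conversion does not depend on how the list was created. As a small sketch, the list below is built sequentially with base lapply() and only three patterns to keep it quick; `small_list` and `small_rd` are hypothetical names, and accessing the patterns via `$randomized` assumes the usual structure of rd_pat objects.

```r
library(shar)

# Sequential stand-in for the parallel step (three patterns only)
small_list <- lapply(X = 1:3, FUN = function(i) {
  shar::fit_point_process(pattern = species_b, n_random = 1,
                          return_input = FALSE, simplify = TRUE, verbose = FALSE)
})

# Convert the plain list to an rd_pat object
small_rd <- list_to_randomized(list = small_list, observed = species_b)

# Check the class and the number of randomized patterns stored in the object
inherits(small_rd, "rd_pat")
length(small_rd$randomized)
```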

Lastly, the created object can be used as usual to analyse whether species-habitat associations are present.

landscape_classified <- classify_habitats(raster = terra::rast(landscape), n = 5, style = "fisher")

results_habitat_association(pattern = fitted_rd, raster = landscape_classified)
#> > Input: randomized pattern
#> > Quantile thresholds: negative < 0.025 || positive > 0.975
#>   habitat breaks count    lo    hi significance
#> 1       1     NA     6 14.90 29.15     negative
#> 2       2     NA    18 26.95 49.05     negative
#> 3       3     NA    18 21.65 43.15     negative
#> 4       4     NA    21 27.00 52.15     negative
#> 5       5     NA   129 54.95 74.10     positive

Of course, this idea can be used to randomize the raster data as well. Furthermore, any other parallelization framework could be used.
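
A possible sketch of the raster case is given below, using randomize_raster() with the same n_random = 1 and simplify = TRUE settings. It is an illustration under assumptions, not a tested recipe from the package authors: terra rasters hold pointers that cannot be sent between R sessions directly, so the sketch packs them with terra::wrap() before and after the parallel step. The names `landscape_packed`, `raster_list`, and `raster_rd` are hypothetical.

```r
library(shar)
library(terra)
library(future)
library(future.apply)

future::plan(future::multisession)

landscape_classified <- classify_habitats(raster = terra::rast(landscape), n = 5, style = "fisher")

# Pack the raster so that it can be exported to the worker sessions
landscape_packed <- terra::wrap(landscape_classified)

raster_list <- future.apply::future_lapply(X = 1:39, FUN = function(i) {
  # Unpack on the worker, randomize once, and pack the result for the way back
  raster_i <- shar::randomize_raster(raster = terra::rast(landscape_packed), n_random = 1,
                                     return_input = FALSE, simplify = TRUE, verbose = FALSE)
  terra::wrap(raster_i)
}, future.seed = 42)

# Unpack the results in the main session and convert them to an rd_ras object
raster_list <- lapply(raster_list, terra::rast)
raster_rd <- list_to_randomized(list = raster_list, observed = landscape_classified)

future::plan(future::sequential)

# Combine the observed point pattern with the randomized rasters
results_habitat_association(pattern = species_b, raster = raster_rd)
```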

Maximilian H.K. Hesselbarth

2025-02-13