This tutorial demonstrates how to tune the step size parameter in StarTrail. The approach is heuristic: tuning the step size is inherently difficult because the true spatial gradients cannot be observed directly.
Recommended Step Size
For data on a regular grid, we recommend a step size of ι or 0.8 × ι, where ι is the minimal separation between spots, as used in all our analyses. If your spatial coordinates are irregular (e.g., spot density differs across regions), the simulation-based approach below can help you select a potentially better step size for your data.
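As a generic illustration of why the step size matters, the sketch below applies a central finite difference to noisy evaluations of a known one-dimensional function. It is not StarTrail's implementation: the function `f`, the noise level `noise_sd`, and the three step sizes are assumptions chosen only for illustration. A very small step amplifies the noise, while a very large step averages over the curvature and biases the estimate.

```r
set.seed(1)

f        <- function(x) 10 * sin(3 * pi * x)        # known function
f_grad   <- function(x) 30 * pi * cos(3 * pi * x)   # its exact derivative
noise_sd <- 0.1   # assumed evaluation noise, for illustration only

x     <- seq(0.05, 0.95, length.out = 200)
steps <- c(1e-4, 0.05, 0.5)   # too small, moderate, too large

# Root mean squared error of the central-difference estimate for each step
rmse <- sapply(steps, function(h) {
  up   <- f(x + h) + rnorm(length(x), sd = noise_sd)
  down <- f(x - h) + rnorm(length(x), sd = noise_sd)
  est  <- (up - down) / (2 * h)
  sqrt(mean((est - f_grad(x))^2))
})

data.frame(step = steps, rmse = rmse)
```

In this toy setting the moderate step has the smallest error, which is the same kind of trade-off the scale grid later in this tutorial is meant to expose.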
Calculate Minimal Separation
First, calculate the minimal separation between spots in your spatial coordinates:
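# Pairwise distances between all spots
coord_dist <- cdist_r(coords, coords)
# Mask the zero self-distances on the diagonal before taking the minimum
coord_dist_temp <- coord_dist
diag(coord_dist_temp) <- Inf
min_sep <- min(coord_dist_temp)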
print(min_sep)
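To judge whether the coordinates are regular enough for the default recommendation, it can help to look at the whole distribution of nearest-neighbor distances rather than only the minimum. The following is a minimal base-R sketch and not part of StarTrail: `nn_distances` is a hypothetical helper, and the toy grid stands in for your own `coords`.

```r
# Hypothetical helper (not a StarTrail function): distance from each spot
# to its nearest neighbor, using base R only.
nn_distances <- function(coords) {
  d <- as.matrix(dist(coords))  # full pairwise Euclidean distance matrix
  diag(d) <- Inf                # ignore each spot's zero distance to itself
  apply(d, 1, min)
}

# Toy example: a regular 5 x 5 grid with spacing 0.25
toy_coords <- as.matrix(expand.grid(s1 = seq(0, 1, by = 0.25),
                                    s2 = seq(0, 1, by = 0.25)))
nn <- nn_distances(toy_coords)

min_sep_toy <- min(nn)
summary(nn)                 # on a regular grid all values coincide
median(nn) / min_sep_toy    # a ratio well above 1 suggests uneven density
```

The full distance matrix needs memory quadratic in the number of spots, so for large datasets a nearest-neighbor search would be a better fit than this sketch.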
Generate Synthetic Pattern
Next, we generate a synthetic spatial pattern to test different step sizes. This allows us to evaluate performance against a known ground truth:
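set.seed(1)  # make the simulated pattern reproducible
tau <- 1     # noise standard deviation
# Synthetic pattern: known smooth mean surface plus Gaussian noise
y <- rnorm(nrow(coords), 10 * (sin(3 * pi * coords[, 1]) + cos(3 * pi * coords[, 2])), tau)
thread <- 10
# Fit the NNGP model with 10 nearest neighbors
m.r <- fit_NNGP(coords, y, neighbor = 10, threads = thread)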
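The mean surface used above has a closed-form gradient, which is what makes it usable as ground truth. For m(s) = 10 (sin(3π s1) + cos(3π s2)), the partial derivatives are 30π cos(3π s1) and -30π sin(3π s2). The sketch below checks this numerically with central differences in base R; `mean_surface`, `true_gradient`, and the random test points are illustrative names and not StarTrail objects.

```r
# Mean of the synthetic pattern as a function of the two coordinates
mean_surface <- function(s1, s2) 10 * (sin(3 * pi * s1) + cos(3 * pi * s2))

# Analytical gradient (note the factor pi from the chain rule)
true_gradient <- function(s1, s2) {
  cbind(g1 =  30 * pi * cos(3 * pi * s1),
        g2 = -30 * pi * sin(3 * pi * s2))
}

set.seed(1)
s1 <- runif(50)
s2 <- runif(50)
h  <- 1e-5  # small step is fine here because the surface is noise-free

# Central differences along each coordinate
num_g1 <- (mean_surface(s1 + h, s2) - mean_surface(s1 - h, s2)) / (2 * h)
num_g2 <- (mean_surface(s1, s2 + h) - mean_surface(s1, s2 - h)) / (2 * h)

analytic    <- true_gradient(s1, s2)
max_abs_err <- max(abs(num_g1 - analytic[, "g1"]),
                   abs(num_g2 - analytic[, "g2"]))
max_abs_err
```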
Evaluate Different Step Sizes
Now we test different step size scales and evaluate their performance using correlation and mean squared error (MSE) metrics:
library(data.table)

path <- './result/'
dir.create(path, showWarnings = FALSE, recursive = TRUE)  # in case the output directory does not exist yet
set.seed(1)
result <- data.table(scale = c(0.01, 0.1, 0.5, 0.8, 1, 5, 10, 20, 100))
result$cor_g1 <- NA_real_; result$cor_g2 <- NA_real_
result$mse_g1 <- NA_real_; result$mse_g2 <- NA_real_
for (i in seq_len(nrow(result))) {
  scale <- result$scale[i]
  # Step size = scale x minimal separation
  gradient_all <- finite_difference(coords, min_sep * scale, m.r, threads = thread,
                                    prefix = paste0('scale', scale), path = path)
  gradient_all <- cbind(coords, y, gradient_all)
  colnames(gradient_all) <- c('s1', 's2', 'y', 'pred', 'g1', 'g2',
                              'g1_min', 'g1_max', 'g2_min', 'g2_max')
  # True gradient of the mean surface 10 * (sin(3*pi*s1) + cos(3*pi*s2));
  # the chain rule gives a factor of 3*pi, which matters for the MSE
  true_g1 <- 30 * pi * cos(3 * pi * gradient_all$s1)
  true_g2 <- -30 * pi * sin(3 * pi * gradient_all$s2)
  result$cor_g1[i] <- cor(gradient_all$g1, true_g1)
  result$cor_g2[i] <- cor(gradient_all$g2, true_g2)
  result$mse_g1[i] <- mean((gradient_all$g1 - true_g1)^2)
  result$mse_g2[i] <- mean((gradient_all$g2 - true_g2)^2)
}
The result table contains the correlation and MSE for each step size scale. Choose the scale that provides the best balance between correlation (higher is better) and MSE (lower is better) for your specific dataset; the corresponding step size is min_sep × scale.
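One way to make the "best balance" concrete is to rank the scales on each metric and add the ranks. This is only a sketch of one possible rule, not a procedure prescribed by StarTrail: `rank_scales` is a hypothetical helper, and the small table below holds made-up placeholder numbers whose only purpose is to make the example runnable.

```r
# Hypothetical helper: order scales by a combined rank of correlation and MSE.
# Expects columns scale, cor_g1, cor_g2, mse_g1, mse_g2.
rank_scales <- function(result) {
  res <- as.data.frame(result)
  mean_cor <- (res$cor_g1 + res$cor_g2) / 2
  mean_mse <- (res$mse_g1 + res$mse_g2) / 2
  # Rank 1 is best on each metric; ties share the average rank
  res$score <- rank(-mean_cor) + rank(mean_mse)
  res[order(res$score), ]
}

# Placeholder values, NOT StarTrail output
toy <- data.frame(scale  = c(0.1, 1, 10),
                  cor_g1 = c(0.60, 0.90, 0.40),
                  cor_g2 = c(0.60, 0.90, 0.40),
                  mse_g1 = c(50, 10, 200),
                  mse_g2 = c(50, 10, 200))

best <- rank_scales(toy)
best$scale[1]   # scale with the lowest combined rank
```

With your own table, `rank_scales(result)` would be the call. Equal weighting of the two metrics is an arbitrary choice; if the magnitude of the gradient matters more than its spatial pattern for your analysis, weighting MSE more heavily may be reasonable.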