Performing Calibration for LeafCares – SoilCares

What is ‘calibration’?

Calibration is the process of collecting samples in order to define and broaden the applicability of a model. It can mean either incorporating an initial, large set of samples into the training dataset or expanding an existing training dataset with a small set of additional samples.

In either case, the samples are collected, scanned, and analysed by our laboratory. The paired spectra and lab reference values for each sample are then cleaned of outliers, and the model is (re)trained to incorporate the information contained in the new calibration set.
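As a rough illustration of the pooling and cleaning step, the sketch below merges an existing training set with new calibration samples and drops samples whose lab reference value is an outlier. The article does not say which outlier method is used; this sketch assumes the modified z-score of Iglewicz and Hoaglin (median and MAD, cut-off 3.5) applied to one nutrient's reference values. The `Sample` layout and the function names are hypothetical, and spectral outlier screening and the retraining itself are not shown.

```python
from statistics import median

# Hypothetical record layout (not the actual LeafCares format):
# (spectrum readings, lab reference value for one nutrient, e.g. N).
Sample = tuple[list[float], float]


def robust_z_scores(values: list[float]) -> list[float]:
    """Modified z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or more than half of them) are identical: nothing to flag.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def build_training_set(existing: list[Sample], calibration: list[Sample],
                       threshold: float = 3.5) -> list[Sample]:
    """Pool the existing training set with the new calibration samples and
    drop samples whose reference value exceeds the modified z-score cut-off."""
    pooled = existing + calibration
    scores = robust_z_scores([ref for _, ref in pooled])
    return [s for s, z in zip(pooled, scores) if abs(z) <= threshold]
```

The returned list would then be used to (re)train the model. A median/MAD score is used here because a single extreme lab value cannot inflate it the way it inflates a mean/standard-deviation z-score.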

When is calibration needed?

The need for calibration is assessed case by case. It may arise from unsatisfactory model performance in specialized situations, from the need for regular updates that reflect changing crop conditions, or from deployment under conditions for which the model was not originally designed.

For LeafCares, all three situations can occur. A new plant species requires a new model; calibration on additional varieties is required when an existing model must generalize to varieties that are not adequately represented in its training set; and regular calibration is necessary to capture annual changes in local crop conditions.

Let’s use the Apple & Pear model as an example: the first version of the model was designed to handle samples from both species, but its training set consisted exclusively of Dutch samples. As a result, the model could be used successfully for both species, but it was calibrated only for Dutch samples.

Subsequently, a decline in performance was observed when predicting samples from foreign regions. We could have developed a region-specific or variety-specific model; instead, we extended the model's calibration set and observed improved performance for the foreign regions without any loss of performance on Dutch samples.

How to select calibration samples

In the context of calibration, sample selection is critical because it directly affects the model’s ability to interpret unknown varieties. The key concept here is representativeness: calibration samples must accurately reflect the variety or crop they come from and capture as much of the variability within that population as possible.

This variability should include not only phenotypic traits (those related to the visible characteristics of the plant) but also physiological differences. In particular, it is essential to include plants under deficient, adequate, and high nutrient conditions, so that the model can clearly recognize all of these scenarios.
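A simple way to check whether a candidate calibration set covers deficient, adequate, and high nutrient conditions is to classify each sample's reference value against a sufficiency range and count the samples per class. The sketch below assumes a single nutrient with a hypothetical range (`low`, `high`); real ranges are crop- and nutrient-specific and are not given in this article, and the function names are illustrative only.

```python
from collections import Counter

STATUS_CLASSES = ("deficient", "adequate", "high")


def nutrient_status(value: float, low: float, high: float) -> str:
    """Classify a reference value against a hypothetical sufficiency range."""
    if value < low:
        return "deficient"
    if value > high:
        return "high"
    return "adequate"


def coverage_report(values: list[float], low: float, high: float,
                    min_per_class: int = 1) -> dict[str, int]:
    """Return the status classes with fewer than `min_per_class` samples,
    mapped to their current count (0 if the class is absent)."""
    counts = Counter(nutrient_status(v, low, high) for v in values)
    return {cls: counts.get(cls, 0)
            for cls in STATUS_CLASSES
            if counts.get(cls, 0) < min_per_class}
```

An empty result means every status class meets the minimum; a non-empty result points to conditions that still need samples. The same count could be repeated per variety or region.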

Since the environment is one of the main factors influencing plant traits, new variability tends to emerge quickly over time. Incorporating new calibration samples into the training dataset is therefore part of standard model maintenance, alongside the routine addition of samples. Annual sampling is particularly beneficial in this respect, as it captures variability the model has not seen before.

Effects of calibration

An experiment with the Apple & Pear model showed that even a very small additional calibration dataset (only 7 samples per country) can significantly improve prediction accuracy for foreign samples. The following plot illustrates this improvement, reporting for N, P and K the reduction in error achieved by adding local samples from each country. The error is expressed as wMAPE (weighted mean absolute percentage error), a commonly used metric that expresses the sum of absolute prediction errors as a percentage of the sum of the reference values.
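Because wMAPE divides the total absolute error by the total of the reference values, samples with large values weigh more, and the metric does not blow up when individual reference values are close to zero, as plain MAPE can. A minimal sketch of the metric and of a per-country error reduction follows. It assumes the reduction is the difference in wMAPE, in percentage points, before and after calibration; the convention used for Fig. 1 is not stated, and the function names are hypothetical.

```python
def wmape(actual: list[float], predicted: list[float]) -> float:
    """Weighted mean absolute percentage error, in percent:
    100 * sum(|y - y_hat|) / sum(|y|)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    denominator = sum(abs(y) for y in actual)
    if denominator == 0:
        raise ValueError("sum of |actual| must be non-zero")
    abs_errors = sum(abs(y - p) for y, p in zip(actual, predicted))
    return 100 * abs_errors / denominator


def error_reduction(actual: list[float], before: list[float],
                    after: list[float]) -> float:
    """wMAPE before calibration minus wMAPE after calibration, in
    percentage points; a positive value means calibration reduced the error."""
    return wmape(actual, before) - wmape(actual, after)
```

For results like those in Fig. 1, `error_reduction` would be evaluated separately for each country and each nutrient, using that country's test samples with predictions from the model before and after calibration.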

Fig. 1. Decrease in error due to calibration

How many samples are needed to create a good model?

Determining the exact number of samples needed to build a model for a new crop is challenging: it depends on multiple factors and cannot be answered without an empirical analysis of how different training set sizes affect model performance. Based on our experience, however, the number is typically around 500 samples. While some crops may require more effort, a dataset of this size generally yields a significant reduction in error and consequently better predictions.
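The empirical analysis mentioned above typically takes the form of a learning curve: the model is trained on increasingly large subsets and the error (for example wMAPE) is measured at each size. The sketch below assumes such (size, error) pairs are already available and applies one hypothetical stopping rule: return the smallest size after which 100 extra samples improve the error by less than `min_gain_per_100` percentage points. The rule and its default threshold are illustrative assumptions, not the procedure behind the figure of around 500 samples.

```python
def plateau_size(curve: list[tuple[int, float]],
                 min_gain_per_100: float = 0.5) -> int:
    """Smallest training-set size after which the error improves by less
    than `min_gain_per_100` percentage points per 100 extra samples.

    `curve` holds (training_set_size, error) pairs measured empirically.
    If the curve never flattens below the threshold, the largest measured
    size is returned."""
    points = sorted(curve)
    if len(points) < 2:
        raise ValueError("need at least two (size, error) points")
    for (n0, e0), (n1, e1) in zip(points, points[1:]):
        if n1 == n0:
            raise ValueError(f"duplicate training-set size: {n0}")
        # Error improvement per 100 additional samples on this segment.
        gain_per_100 = (e0 - e1) / (n1 - n0) * 100
        if gain_per_100 < min_gain_per_100:
            return n0
    return points[-1][0]
```

If the function returns the largest measured size, the curve has not yet flattened and more samples may still reduce the error.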

The plot below illustrates how training set size affects model error. Both cases show a clear reduction in error as the training set grows, but the baseline has a large impact on overall performance.
