Nearest Neighbor Model Types

As with most predictive models, alternative model forms can be specified to optimize for different objectives and outcomes. In applying nearest-neighbor imputation for developing vegetation maps, many 'moving parts' can be tweaked to optimize the spatial predictions (maps) for different vegetation attributes (see review by Eskelson et al. 2009)[1]. Most of our map products have been developed using gradient nearest neighbor (GNN) imputation, which differs from other nearest-neighbor methods primarily by its distance metric: Euclidean distance in 8-dimensional gradient space defined by direct gradient analysis, or constrained ordination (specifically canonical correspondence analysis), with axes weighted by their eigenvalues (Ohmann and Gregory 2002). Most of our downloadable maps were developed using a single nearest-neighbor plot (k=1), although some of our model diagnostics are based on k>1.
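The distance metric described above can be illustrated with a minimal sketch. It assumes the ordination scores for map pixels and field plots have already been computed by CCA, and it assumes that "weighted by their eigenvalues" means each axis score is multiplied by its eigenvalue before the Euclidean distance is taken; the exact weighting form is not stated here. The function name `impute_nearest_plots` and all inputs are hypothetical, and the usage below uses two axes instead of eight for brevity.

```python
import numpy as np

def impute_nearest_plots(pixel_scores, plot_scores, eigenvalues, k=1):
    """Find the k nearest plots to each pixel in weighted gradient space.

    pixel_scores: (n_pixels, n_axes) ordination scores for map pixels
    plot_scores:  (n_plots, n_axes) ordination scores for field plots
    eigenvalues:  (n_axes,) eigenvalue of each ordination axis
    Returns (indices, distances), each of shape (n_pixels, k).
    """
    pixel_scores = np.asarray(pixel_scores, dtype=float)
    plot_scores = np.asarray(plot_scores, dtype=float)
    weights = np.asarray(eigenvalues, dtype=float)

    # ASSUMPTION: axes are weighted by scaling each score by its eigenvalue.
    scaled_pixels = pixel_scores * weights
    scaled_plots = plot_scores * weights

    # Pairwise Euclidean distances between every pixel and every plot.
    diffs = scaled_pixels[:, None, :] - scaled_plots[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Keep the k closest plots per pixel (k=1 gives a single-neighbor map).
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(dists, order, axis=1)

# Usage with made-up scores: one pixel, two candidate plots, two axes.
indices, distances = impute_nearest_plots(
    pixel_scores=[[0.4, 0.9]],
    plot_scores=[[0.5, 0.0], [0.0, 0.9]],
    eigenvalues=[1.0, 0.1],
)
```

With k=1, each pixel would take on all attributes of the single plot at `indices[:, 0]`. In this made-up example the down-weighted second axis changes which plot is nearest, which is the practical effect of eigenvalue weighting.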

Properties of (G)NN maps also can vary depending on specification of the spatial predictors and response variables (vegetation attributes on plots). For some of our research projects, we have developed multiple GNN models that illustrate variations in spatial patterning and emphasis on species composition vs. forest structure. We expect each kind of model to have advantages for certain applications.

Species composition vs. forest structure in GNN models

By varying the explanatory and response variables used in canonical correspondence analysis (CCA), the GNN models can be optimized along a continuum from species composition to forest structure.

Species Models

For map applications where the primary interest is species composition (distributions of individual species and plant community types), we have learned that we achieve better GNN map accuracy by excluding satellite imagery, disturbance, and land ownership variables from the models. This is because the spectral data (usually from Landsat) are more strongly correlated with forest structure than with species composition. In the GNN "species models," response variables used in model development are species abundances, which can be basal area or cover of tree species, or cover of understory species (shrubs and herbs). Stand structure variables are not attached to resulting GNN grids. The GNN species models can utilize any plots having data on cover by species, usually a much larger sample size than plots where forest structure data (tree tally) are available.

Species-size Models

The GNN "species-size models" contain information on both (tree) species composition and forest structure. Response variables used in model development are basal area by species and size-class (but not all size-classes are recognized for all species). Explanatory variables include Landsat-derived variables, and can also include measures of disturbance history and land ownership. The species-size models are a compromise between species and structure, and have become our primary GNN map product for most projects. We expect them to serve a variety of user applications, particularly where elements of both species composition and stand structure are needed and covariance among these elements must be maintained (e.g., if tree lists are to be input into simulation models such as the Forest Vegetation Simulator). Accuracy for species variables in these models is generally intermediate between that of the species and structure models, and accuracy for structure variables is comparable to or slightly worse than that of the structure models.
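As a rough illustration of how a species-by-size-class response matrix might be assembled from a tree tally, the sketch below sums basal area per hectare into one column per species and size class. The size-class breaks, the species codes, the input layout (plot, species, DBH in cm, trees per hectare), and the function names are all hypothetical; they are not the classes or data structures used in the actual models.

```python
import math
from collections import defaultdict

# HYPOTHETICAL size-class breaks in cm DBH: (lower, upper, label).
SIZE_CLASS_BREAKS = [
    (0.0, 25.0, "small"),
    (25.0, 50.0, "medium"),
    (50.0, math.inf, "large"),
]

def size_class(dbh_cm):
    """Return the label of the size class containing dbh_cm."""
    for lower, upper, label in SIZE_CLASS_BREAKS:
        if lower <= dbh_cm < upper:
            return label
    raise ValueError(f"DBH out of range: {dbh_cm}")

def basal_area_m2(dbh_cm):
    """Cross-sectional stem area in square meters for a DBH in cm."""
    return math.pi * (dbh_cm / 200.0) ** 2

def species_size_response(trees):
    """Sum basal area (m2/ha) by plot and species/size-class column.

    trees: iterable of (plot_id, species, dbh_cm, trees_per_ha) records.
    Returns {plot_id: {"SPECIES_sizeclass": basal_area_per_ha}}.
    """
    response = defaultdict(lambda: defaultdict(float))
    for plot_id, species, dbh_cm, trees_per_ha in trees:
        column = f"{species}_{size_class(dbh_cm)}"
        response[plot_id][column] += basal_area_m2(dbh_cm) * trees_per_ha
    return {plot: dict(columns) for plot, columns in response.items()}

# Usage with a made-up tally for one plot.
tally = [
    ("p1", "PSME", 40.0, 10.0),
    ("p1", "PSME", 45.0, 5.0),
    ("p1", "TSHE", 10.0, 100.0),
]
matrix = species_size_response(tally)
```

Columns appear only for species and size-class combinations that occur in the tally, which loosely mirrors the point that not all size-classes are recognized for all species.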

Structure Models

The GNN "structure models" place even greater emphasis on forest structure than on species composition. Response variables are basal area by species group (conifer or hardwood) and size-class, total canopy cover, snag density by size class, and total down wood volume. These models provide slightly better overall accuracy for structure and fuels variables than the species-size models, but a less accurate depiction of species distributions than the species and species-size models. The structure models were developed primarily for our GNNFire project, where our primary emphasis was on mapping fuels, and are not a standard GNN map product.

Spatial patterning of GNN maps

The appearance (spatial patterning, or "look and feel") of GNN maps is strongly influenced by the spatial resolution of the independent variables, particularly those derived from Landsat TM imagery. We have learned that median filtering of the raw Landsat imagery reduces the fine-scale heterogeneity, or "salt-and-peppering," in the final map, while maintaining boundaries between contrasting vegetation conditions (e.g., those of clearcuts or stand-replacing fires). The median filtering consists of moving a nine-pixel window across the image and assigning the median value of the nine pixels to the center pixel. Grids for individual bands, ratios, and transformations are filtered independently. In general, overall accuracy of the resulting GNN predictions appears to be little affected by the filtering, so the choice between filtered and unfiltered models is largely subjective and based on map appearance.
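The median filtering step can be sketched as follows. This is a minimal NumPy illustration, not the processing code used for the maps. It assumes the nine-pixel window is a 3 x 3 square and that image borders are handled by repeating edge pixels; neither detail is stated above. The function name `median_filter_3x3` and the band names in the usage are hypothetical.

```python
import numpy as np

def median_filter_3x3(band):
    """Replace each pixel with the median of its 3 x 3 neighborhood.

    band: 2-D array holding one image band, ratio, or transformation.
    ASSUMPTION: borders are padded by repeating the edge pixels.
    """
    band = np.asarray(band, dtype=float)
    rows, cols = band.shape
    padded = np.pad(band, 1, mode="edge")

    # Stack the nine shifted copies of the image, one per window position,
    # so the median over axis 0 is the median of each pixel's neighborhood.
    windows = np.stack(
        [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)

# Usage: each grid is filtered independently of the others.
grids = {
    "band_a": np.full((5, 5), 10.0),
    "band_b": np.arange(25, dtype=float).reshape(5, 5),
}
grids["band_a"][2, 2] = 99.0  # an isolated "salt-and-pepper" pixel
filtered = {name: median_filter_3x3(grid) for name, grid in grids.items()}
```

An isolated outlier such as the one in `band_a` is removed because it never forms the majority of any window, whereas a straight boundary between two large homogeneous areas is left in place, consistent with the behavior described above.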

A similar reduction in fine-scale heterogeneity in imputed maps is achieved by using fewer Landsat variables. For example, using three tasseled cap bands rather than six individual bands yields maps with less salt-and-peppering. Most of our recent GNN models have used three tasseled cap bands.

Other spatial modeling techniques

In addition to nearest-neighbor imputation, LEMMA applies other spatial modeling methods in our research. The maps of Ecological Systems for the GAP project, mapzones 2 and 7, were developed using random forest, a machine learning method (Breiman 2001) [2]. Random forest was chosen for this project because the goal was to maximize accuracy for a single attribute (land cover) rather than to map a full suite of attributes.

The GNN models from the GNNFire project were compared to linear models and classification and regression trees (CART) by Pierce et al. (2009).

[1] Eskelson, B.N.I., H. Temesgen, V. LeMay, T.M. Barrett, N.L. Crookston, and A.T. Hudak. 2009. The roles of nearest neighbor methods in imputing missing data in forest inventory and monitoring databases. Scandinavian Journal of Forest Research 24(3):235-246.

[2] Breiman, L. 2001. Random Forests. Machine Learning 45(1):5-32.