# Clusters

The pros and cons of different analysis techniques are discussed in detail in the Review Of Home Range Analyses. For a more comprehensive review, see "A Manual for Wildlife Radio Tagging" (Kenward 2001) and Kenward et al. (2001).

Peripheral convex polygons have been used for range analysis since the start of radio tagging. Their size is strongly influenced by outlying locations, and they can include large areas not visited by animals. One way of addressing this was to estimate probabilities of encounter within outlines based on the density of locations, first as ellipses and then as contours. However, these too are outlier-sensitive, and their parametric smoothing is prone to expand into unvisited areas. Another approach was to attribute a grid cell, with dimensions based on tracking accuracy, to each location. This approach tends to underestimate the areas visited unless very large numbers of locations are available or "joining rules" are used to link grid cells. However, joining rules that link neighbouring locations can also be used to estimate polygons that define visited areas with great accuracy, estimating range cores as polygons to which isolated grid cells at outlying locations contribute little area.

An early neighbour-linkage method involved restricting polygon edges plotted between locations to a fraction (initially half) of the span of maximum distance between any locations, which gave "concave polygons". However, the span remains strongly influenced by outliers and the choice of fraction is an arbitrary decision. A subsequent method defined clusters of locations, by minimising sums of nearest neighbour distances as locations with longer distances are added. The analysis starts the first cluster by identifying the two locations that are closest together and have the nearest 3rd location (i.e. the minimal sum of linkage distances). It then finds the location nearest to one in this initial cluster. If this is less than the distance to the 3rd location in any other potential cluster, the 4th location joins the original cluster. If not, a new cluster forms. If two clusters have nearest neighbours at equal distances, the location that joins is the one that minimises the distance to all locations in the cluster (i.e. a centroid rule resolves ties). If the nearest neighbour is already assigned to another cluster, the two clusters join. When the required percentage of locations has been assigned, a polygon (which was initially convex but can also be concave) is drawn around each cluster and their areas summed (Kenward 1987), as implemented in Ranges 4 in 1990.

Four additions to the original Cluster implementation were made in Ranges 6 in 2003: objective cores, concave polygons, alternative joining rules and the ability to construct a single inclusive polygon. All these are based on the hierarchical incremental nearest-neighbour technique as modified by Kenward et al. (2001). A fifth technique, Objective Restricted-Edge Polygons (OREP), was added in Ranges 8 to enable faster analysis of large data sets, to encompass voids (holes) in hull-based outlines as pioneered by Getz and Wilmers (2004), and to avoid a cluster-overlap issue. This method addresses distances between neighbouring locations that are potential edges to peripheral outlines or to holes, and adds to a family of Neighbour-linkage polygons. The OREP implementation introduced “Curve and Hole” plotting as an alternative to the original “Corner and Cell” approach.

On the results screens, statistics include the number of range nuclei for each % polygon. In cluster analysis, outliers tend to have a stronger effect on areas than in other analyses, which makes utilisation plots very suitable for identifying range cores by inspection. Inspection and objective coring typically indicate that up to 15% of locations are used for excursive activity, so that 85% polygons often provide convenient cluster core boundaries. Core clusters often remain separate at this point, whereas clusters separated by less than the outlier distances will fuse when all the locations are included.

Statistics are output as .csv files with column headers. The files can be double-clicked to open in Microsoft Excel or imported into an alternative spreadsheet.

## Selected cores

This option allows you to examine range structure and to define core areas. By excluding outlying locations, the edges enclose the areas most used by the animal. See the introduction to Location Analyses for more details.

You can choose one or more values for the percentage of locations or of location density to be included. Type them in ascending order, separated by either spaces or commas.

In the Output Files column you can specify an output file for range areas and statistics. The estimates are in column format, suitable for spreadsheets. Each row has the 7 range variables, followed by X,Y coordinates for the range centre, followed by 5 range statistics, followed by as many areas as there were core percentages. Structure statistics include, after the area estimate, the number of nuclei in a core, its partial area (the sum of areas of separate polygons / the area of a single polygon round all the clusters), Simpson’s Index for diversity of number of locations across clusters and Simpson’s Index for diversity of area across clusters. The output is a .csv file with column headers; it can be double-clicked to open in Microsoft Excel or imported into an alternative spreadsheet.
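The structure statistics can be illustrated with a minimal sketch. The function names and the example values are hypothetical, and the reciprocal form of Simpson's Index, 1 / sum(p_i^2), is an assumption: this section does not state which form Ranges reports.

```python
def partial_area(cluster_areas, inclusive_area):
    """Sum of the separate cluster polygon areas divided by the area
    of a single polygon drawn round all the clusters."""
    return sum(cluster_areas) / inclusive_area


def simpson_index(values):
    """Simpson's Index in reciprocal form, 1 / sum(p_i^2).

    Assumption: the reciprocal form is used here for illustration only.
    """
    total = sum(values)
    proportions = [v / total for v in values]
    return 1.0 / sum(p * p for p in proportions)


# Hypothetical core with three nuclei.
locations_per_cluster = [20, 10, 10]   # number of locations in each cluster
area_per_cluster = [4.0, 2.0, 2.0]     # area of each cluster polygon
inclusive_polygon_area = 16.0          # single polygon round all clusters

print(partial_area(area_per_cluster, inclusive_polygon_area))
print(simpson_index(locations_per_cluster))  # diversity of locations
print(simpson_index(area_per_cluster))       # diversity of area
```

With the reciprocal form, an index equal to the number of nuclei indicates that locations (or area) are spread evenly across clusters; lower values indicate that one cluster dominates.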

## Cores at 5% intervals

This option provides plots that help to decide which locations are part of a core and which are outliers. You can choose to save both edge (polygon) and utilisation files. The cores are saved at 5% intervals from 20% to 100%, a total of 17 sets.

Utilisation files can be plotted in Input & Graphics.

## Objective cores

Rather than choosing a particular core size, it is more scientifically rigorous to have an objective core calculated from the distribution of the locations. The distribution of nearest-neighbour distances can be used for detecting and excluding outlying locations (Kenward et al. 2001), resulting in an objective core. The ways in which outliers are excluded are discussed below. Objective coring sometimes estimates core areas larger than those from an equivalent number of locations in the standard analysis. This is because the standard approach estimates polygons as soon as the required percentage of locations is included, whereas objective coring continues to merge clusters that are separated by less than the exclusion distance. In these cases, the "single inclusive polygon" option gives the same result for both methods.

## Incremental area analysis

Incremental area analysis is used to answer the question "how many locations do I need to estimate a home range?" Starting with the first three locations (the minimum needed to estimate a polygon area without a boundary strip), the area is re-estimated as each location is added. This permits the consecutive areas, which tend to increase initially as the animal is observed using different parts of its range, to be plotted against number of locations until there is evidence of stability, which indicates that adding further locations will not improve the home range estimate. The default is to plot the edge round all the locations that have been added, but it is also possible to choose a single, smaller core. The consecutive area estimates have to be saved to an output file, so that the result can be examined using Input & Graphics.
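The idea can be sketched as follows, assuming the edge is a single convex polygon round all the locations added so far. The function names and the example fixes are hypothetical, and this is not the Ranges implementation.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Polygon area by the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def incremental_areas(locations):
    """Area re-estimated as each location is added, starting with three."""
    return [
        (n, polygon_area(convex_hull(locations[:n])))
        for n in range(3, len(locations) + 1)
    ]


# Hypothetical fixes in the order they were recorded.
fixes = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2), (1, 3)]
for n, area in incremental_areas(fixes):
    print(n, area)
```

In this example the area rises from 8 to 16 when the fourth fix is added and then stays constant, which is the kind of plateau the plot is meant to reveal.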

## Convex or concave cluster polygons

Concave polygons are offered as well as convex polygons. The edge restriction of concave polygons is based on a fraction of the span of each cluster. The concave polygon option prevents the (very rare) overlap of a small cluster within the limits of a larger, curved cluster.
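As a rough illustration of the edge restriction, the sketch below computes the span of one cluster (the maximum distance between any two of its locations) and the resulting maximum edge length. The function names are hypothetical, and the default fraction of one half is taken from the early concave polygon method described above; constructing the concave outline itself is not shown.

```python
import math
from itertools import combinations


def span(points):
    """Maximum distance between any two locations in a cluster."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


def edge_limit(points, fraction=0.5):
    """Longest polygon edge allowed, as a fraction of the cluster span."""
    return fraction * span(points)


# Hypothetical cluster of three locations.
cluster = [(0, 0), (3, 4), (1, 1)]
print(span(cluster))        # 5.0
print(edge_limit(cluster))  # 2.5
```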

## Outlier exclusion

If objective cores are selected, exclusion can be of locations in the largest 5% of the nearest-neighbour distance distribution (by analogy with plotting contours or ellipses to 95% of the density distribution), which is the Ranges default. Alternatively, an iterative process excludes the location with the most extreme linkage distance if it is beyond 1%, 0.5% or 0.1% of the distribution estimated by the remainder, and repeats this process until all distances are within the chosen alpha-level on a normal distribution. The 0.1% alpha level excludes only the most extreme outliers. The display shows the Outlier Exclusion Distance (OED) beyond which locations are excluded. In cluster analyses, polygons then plot round clusters with no nearest-neighbour locations beyond this distance. For objective-restricted edges, the exclusion distance has a strip added equivalent to the resolution distances between locations.
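Both exclusion rules can be sketched under stated assumptions. The function names are hypothetical. The sketch assumes a one-tailed normal threshold, recomputes nearest-neighbour distances after each exclusion, and rounds the 5% count to the nearest whole location; none of these details is confirmed by this section, so it should not be read as the Ranges implementation.

```python
import math
from statistics import NormalDist, mean, stdev


def nearest_neighbour_distances(points):
    """Distance from each location to its nearest neighbour."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def exclude_largest_fraction(points, fraction=0.05):
    """Drop locations in the largest `fraction` of nearest-neighbour distances."""
    dists = nearest_neighbour_distances(points)
    n_drop = round(len(points) * fraction)  # assumption: nearest whole location
    order = sorted(range(len(points)), key=lambda i: dists[i])
    keep = sorted(order[: len(points) - n_drop])
    return [points[i] for i in keep]


def exclude_iteratively(points, alpha=0.01):
    """Repeatedly drop the most extreme location while it lies beyond the
    alpha level of a normal distribution fitted to the remaining distances.

    Returns the kept locations and the last threshold distance computed.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # assumption: one-tailed threshold
    kept = list(points)
    limit = None
    while len(kept) > 3:
        dists = nearest_neighbour_distances(kept)
        worst = max(range(len(kept)), key=lambda i: dists[i])
        rest = dists[:worst] + dists[worst + 1:]
        limit = mean(rest) + z * stdev(rest)
        if dists[worst] <= limit:
            break
        del kept[worst]
    return kept, limit


# Hypothetical data: 19 locations on a unit grid plus one distant outlier.
grid = [(float(x), float(y)) for x in range(4) for y in range(5)][:-1]
locations = grid + [(50.0, 50.0)]

core_5pct = exclude_largest_fraction(locations)
core_iter, threshold = exclude_iteratively(locations, alpha=0.001)
print(len(core_5pct), len(core_iter), threshold)
```

Here both rules remove only the distant location. With real data, the fixed 5% rule always removes a set proportion of locations, whereas the iterative rule removes none if no distance is extreme at the chosen alpha level.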

## Joining priority

The third addition offers the centroid rule, of joining locations to clusters when all linkage distances are minimal, as a priority over the nearest-neighbour rule, which is then used only as a tie-breaker. Centroid priority suppresses chaining along linear habitats and is thus less appropriate than the nearest-neighbour priority when species are expected to be minimising their travel distances (which is likely the usual circumstance and is therefore the default).

## Separate cluster polygons or Single inclusive polygon

The fourth addition is the option of plotting a single polygon round all the clusters. This excludes locations that are outliers to the main core but includes those between the clusters, which probably represent times when animals were detected in transition between clusters rather than making true excursions. This single polygon, called a "usual area" by Johnstone (1994), may provide a better estimate of a core territory.

## Curve and hole polygons

Although incremental cluster analysis conveniently defines groups of core locations separated from outliers, plotting outlines round clusters can be problematic. Convex polygons around separate clusters occasionally overlap (e.g. if a small cluster occurs within the horns of a crescent-shaped cluster) and simple concave solutions to that overlap problem use arbitrary or subjective edge distances. This problem can be avoided by selecting the curve and hole option, which fuses overlapping polygons.

## Objective-restricted-edge polygons

Objective-Restricted-Edge Polygons depend on curve and hole outlines by default, and are otherwise equivalent to cluster polygons in which a core of locations is defined by an outlier exclusion distance (OED). However, instead of first identifying clusters of locations with the minimal sum of nearest-neighbour distances (which is slow to compute for many locations), OREPs are plotted immediately as concave polygons with an edge restriction based on the OED. OREPs have three advantages over clustering, namely (i) simplicity (hence speed), (ii) polygons cannot overlap and (iii) habitat at all locations is included (because a single grid cell is attributed to outliers beyond the OED).

OREP edge distances can be based either on the distribution of Nearest-Neighbour Exclusion Distances (NNED), as used for objective coring in cluster analysis, or on the distribution of mean distances from each location to all others as estimated in kernel analyses. The Kernel Exclusion Distance (KED) is our default, because it (a) gives more normal distributions than nearest neighbour distances, (b) gives smoother outlines than NNED especially for small samples and (c) is analogous to exclusion of outlier locations to prevent their excessive influence on contours.
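The distance distribution underlying the KED can be sketched as the mean distance from each location to all others. The function name and the example locations are hypothetical, and how Ranges derives the exclusion distance from this distribution is not specified here, so only the distances themselves are computed.

```python
import math


def mean_distances_to_others(points):
    """Mean distance from each location to all other locations."""
    n = len(points)
    # The distance from a location to itself is zero, so it adds nothing
    # to the sum; dividing by n - 1 averages over the other locations.
    return [sum(math.dist(p, q) for q in points) / (n - 1) for p in points]


# Hypothetical locations: four close together and one distant.
locations = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
distances = mean_distances_to_others(locations)
for location, d in zip(locations, distances):
    print(location, round(d, 2))
```

The distant location has by far the largest mean distance, which is what makes this distribution usable for detecting outliers.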

Range cores defined by OREPs are equivalent to cluster analysis with objective coring (Kenward et al. 2001), but without risk of polygon overlap. They are also equivalent to the fusing of convex hulls based on the sum of distances to nearest-neighbour locations, which is now the preferred a-LoCoH implementation of Getz & Wilmers (2004; Getz et al. 2007), but with the advantage of an objective choice of the edge-restriction distance. With kernel-based outlier exclusion distances, OREPs unify range analyses based on grid-cell, polygon and location density techniques. However, they are as yet an experimental technique. Testing is needed to establish whether their possible advantages over nearest-neighbour-based techniques with different polygon fusing approaches (as in Ranges clustering and LoCoH implementations) occur for real field data.