Difference between revisions of "Modelling Analysis"

From Anatrack Ranges User Guide

Latest revision as of 19:08, 11 December 2014

When Ranges 4 was launched in 1990, individual-based modelling of animal populations was in its infancy. However, it was becoming clear not only that such modelling was powerful for predicting populations beyond the envelope of conditions in which individuals were measured, but also that radio-tracking could provide the linkages of habitats and sociality with persistence or dispersal, and with survival and productivity, that would be needed for modelling. The provision of a toolkit for modelling was therefore a long-term aspiration for this type of software (Kenward 1992).

Introduction

The initial contribution to modelling is a new approach to analysing resources, such as habitats, which can estimate minimal requirements of individual animals and hence enable individual-based modelling. There is also a method for estimating survival or dispersal rates that is convenient for data from radio-tagging. Both methods are explained with illustrations in Kenward (2001). Further components of a toolkit for modelling will be added to this tab in due course, with the ultimate aspiration of linking these in order to automate population modelling from location data and maps.

Resource Area Dependence Analysis

The principle that underlies this analysis is that if an animal requires a particular amount of a resource, such as a particular tree or area of habitat, then it will extend its home-range to the extent necessary to contain that amount of resource. If the resource is rarer, range outlines will be larger. In this case, there will be a negative relationship between range area and resource content. For strong resource dependence, the relationship tends to become negative exponential (Kenward 1982), but is linear with negative correlation if the logarithm of resource content is plotted against the logarithm of range area. Moreover, the range area at the point where the resource proportion is 1 is an estimate of the minimum area of resource required.

Another important consideration is that a single patch of habitat enclosed within range outlines of varying size will show a negative relationship of proportion with area by chance. To avoid misinterpretation of random events, the significance of observed correlations should be compared with random range placement in the same areas. In the case of a single resource, its occurrence significantly more frequently in observed ranges than in random placements may be the best indication of its importance.

This analysis requires an edge file and a habitat file. Suitable example files are in the folder squirrel, as described for habitat analysis.

analysis options

For a rapid examination of whether the prevalence of any of the habitats in a set correlates negatively with range size, analysis of observed values only is appropriate. The statistics available from such a run are the observed value of r, the slope b for the regression of (log) habitat prevalence on (log) range area, the standard error of b, the (log) area intercept c for 100% habitat, the percentage of ranges with no habitat at all in the core, and the percentage with none of the habitat in a particular row. On the graph, the green regression line is for the observed values.
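
The observed-values statistics can be illustrated with a minimal sketch. This is not the Ranges implementation: the function name, the use of base-10 logarithms and the ordinary least-squares formulas are assumptions made for illustration. It regresses log habitat proportion on log range area and reports r, b, the standard error of b, and c, the log area at which the fitted proportion reaches 1 (100% habitat).

```python
import math

def log_log_regression(areas, proportions):
    """Hypothetical helper: regress log10(proportion) on log10(area).

    Returns (r, b, se_b, c), where c is the log10 area at which the
    fitted habitat proportion equals 1. Zero proportions must be
    handled before calling, because log10(0) is undefined.
    """
    x = [math.log10(a) for a in areas]
    y = [math.log10(p) for p in proportions]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept on the log-proportion axis
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    residual_ss = max(syy - b * sxy, 0.0)
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)
    c = -a / b                         # log10 area where log10(proportion) = 0
    return r, b, se_b, c

# Invented data: every range contains exactly 0.5 area units of the habitat.
areas = [1.0, 2.0, 4.0, 8.0]
proportions = [0.5 / a for a in areas]
r, b, se_b, c = log_log_regression(areas, proportions)
print(r, b, se_b, 10 ** c)
```

With these invented data the slope is -1 and the back-transformed intercept is 0.5, the area of habitat that every range contains.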

To investigate significance, randomisations are available with 99, 199 and 999 iterations. During randomisation, outlines of all the observed ranges are randomly rotated and displaced within an envelope. By default, that envelope is the minimum convex polygon round all observed outlines for the largest core size among a set of core sizes. N outlines are chosen at random with replacement from the N observed outlines, and an r, b and c calculated in each case.
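
One way to picture the random rotation and displacement is the sketch below. It is an illustration under stated assumptions, not the algorithm used by Ranges: the function names are hypothetical, the outline is rotated about the mean of its vertices, the envelope is assumed to be a convex polygon with counter-clockwise vertices, and placements that fall partly outside the envelope are simply rejected and retried.

```python
import math
import random

def inside_convex(point, envelope):
    """True if point is inside or on a convex, counter-clockwise polygon."""
    px, py = point
    n = len(envelope)
    for i in range(n):
        x1, y1 = envelope[i]
        x2, y2 = envelope[(i + 1) % n]
        # A negative cross product means the point is to the right of this edge.
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
            return False
    return True

def random_placement(outline, envelope, rng, max_tries=10000):
    """Rotate and displace an outline at random until it fits the envelope.

    Returns the moved outline, or None if no placement fits in max_tries.
    """
    cx = sum(x for x, _ in outline) / len(outline)
    cy = sum(y for _, y in outline) / len(outline)
    xs = [x for x, _ in envelope]
    ys = [y for _, y in envelope]
    for _ in range(max_tries):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        tx = rng.uniform(min(xs), max(xs))   # new centre, drawn from the
        ty = rng.uniform(min(ys), max(ys))   # bounding box of the envelope
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        moved = [(tx + (x - cx) * cos_a - (y - cy) * sin_a,
                  ty + (x - cx) * sin_a + (y - cy) * cos_a)
                 for x, y in outline]
        # The envelope is convex, so checking the vertices is sufficient.
        if all(inside_convex(p, envelope) for p in moved):
            return moved
    return None

envelope = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
outline = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
placed = random_placement(outline, envelope, random.Random(1))
```

In a rejection scheme of this kind, an outline that is large relative to the envelope is rejected many times before it fits, so each iteration takes longer.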

Statistics from randomisations include the mean and median values for r by randomisation, and z for the difference of this r from the observed r, with associated 95% confidence limits, on the assumption that r is distributed normally. The next value is a more robust test statistic, which is the number of random r values more extremely negative than the observed value. In a two-tailed test with 999 iterations, a value less than 25 indicates P<0.05, with 5 or less for P<=0.01 and 0 for P<=0.002. There are then mean values for b, its SE and c by randomisation, which are used to plot the yellow line on the graph, and finally the proportion of random placements of the range outlines that lack the relevant habitat. Percentages below the observed percentage of ranges without the habitat indicate non-random placement of observed range outlines with respect to that habitat.
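
The count-based test statistic can be turned into a probability as in the following sketch. The function name is hypothetical, and the (count + 1) / (iterations + 1) convention with doubling for a two-tailed test is a common Monte Carlo convention assumed here; the exact convention used by Ranges is not stated in this guide.

```python
def randomisation_p(observed_r, random_rs, two_tailed=True):
    """Hypothetical helper: P value from a randomisation test on r.

    Counts the random r values more negative than the observed r and
    converts the count to a probability, counting the observed value
    itself as one of the possible outcomes.
    """
    more_extreme = sum(1 for r in random_rs if r < observed_r)
    p = (more_extreme + 1) / (len(random_rs) + 1)
    if two_tailed:
        p = min(1.0, 2.0 * p)
    return more_extreme, p

# Invented example: 999 random r values, 24 of them below the observed r.
random_rs = [-0.9] * 24 + [0.0] * 975
count, p = randomisation_p(-0.5, random_rs)
print(count, p)
```

Under this convention, 24 more extreme values out of 999 give a two-tailed P of 0.05, and no more extreme values give 0.002.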

zero handling

Animals may differ in their use of resources. Some may specialise in a quite different resource to the majority, either through choice or exclusion, so that the resource does not occur in their range. Excluded animals may have above-average range size, in which case addition of a value below other values (which is done automatically for the replace zeros option) will tend to maintain negative correlations. However, if resource strategy is divergent, inclusion of missing (or very low) proportions of the resource may conceal a major effect. At present, missing values can be excluded with either of two options. The resample zeros option, which obtains resource within all outlines, is appropriate if it is suspected that large ranges are more likely to include habitat by chance. Otherwise the ignore zeros option will give very similar results but will be faster and will estimate the proportion of randomly-placed outlines that lack the resource. When there are very low values of resource in some observed ranges, it may in future be possible to exclude these objectively as statistical outliers and then examine these ranges for different resource area dependence relationships.

exclude habitats

As in analyses of habitat preference, disproportionate use of one relatively abundant resource can conceal a dependence also on one or more uncommon resources. This effect can be avoided by removing the area of the first resource from the range and then re-analysing for the second, in a step-wise approach. Resource exclusion of this type is supported in Ranges 9. The Ctrl key can be held to select multiple habitats to exclude, and is also required to remove previous selections.

envelope

The default envelope, mcp around max edge file core, may allow very little rotation and displacement of large ranges in a small area, which can greatly slow analyses. If resources have a wide distribution, a larger user defined envelope may be used to speed the randomisation, at least for a first quick test, by loading the envelope separately. This is also useful if analysis is focussed on small cores (say, 50% cluster cores), but a polygon around all the locations is being used to standardise the envelope.

Kaplan-Meier Survival

The Kaplan-Meier approach (Kaplan & Meier 1958), as described for radio-tracking by Pollock et al. (1989), is provided as a first survival estimation technique. Its interval-based estimation procedure adapts well to the asynchronous (staggered) entries and departures for unknown reasons that are typical for groups of radio-tagged animals.

Example data are in the folder goshawk, from 205 first-year goshawks that were tagged in or near their natal nests (Juv_Male.srv and Juv_Female.srv).

analysis options

Choice of one set will run an analysis on one survival file, with a plot that includes error bars for 95% confidence limits based on Cox-Oakes (1984) variance estimation. The statistics include, for each time interval in the analysis, the number of animals with active tags at the beginning and end of the interval, and the numbers that died, had lost signals for unexplained reasons, were known to have lived through expiry of the tag (e.g. due to battery exhaustion) or were added through tag attachment. There is then an estimate of the survival with two types of 95% confidence limits, and the survival decrease since the last period. The numbers in each category are summed at the bottom of the table, with a count of the total number of active tag-days. If the statistics file is saved, it can be opened in Excel or another spreadsheet program that reads .csv files. The .kms graphics file can be opened at a later date in the graphics window.
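
The interval-based estimate can be sketched as follows. This is a minimal illustration, not the Ranges code: the function name and input layout are hypothetical, and the variance formula S^2 (1 - S) / r is the simple approximation that Pollock et al. (1989) attribute to Cox and Oakes (1984), assumed here to be the one meant above. Staggered entry and censoring are represented only through the number at risk in each interval, which rises when tags are added and falls when animals die, are lost or outlive their tags.

```python
import math

def kaplan_meier(intervals):
    """Hypothetical helper: Kaplan-Meier survival by time interval.

    intervals is a list of (at_risk, died) pairs, one per interval, where
    at_risk is the number of animals with active tags in that interval.
    Returns a list of (survival, lower, upper) with approximate 95% limits.
    """
    survival = 1.0
    rows = []
    for at_risk, died in intervals:
        if at_risk <= 0:
            raise ValueError("each interval needs at least one animal at risk")
        survival *= 1.0 - died / at_risk
        # Simple variance approximation: S^2 * (1 - S) / r
        se = math.sqrt(survival ** 2 * (1.0 - survival) / at_risk)
        lower = max(0.0, survival - 1.96 * se)
        upper = min(1.0, survival + 1.96 * se)
        rows.append((survival, lower, upper))
    return rows

# Invented data: 20 animals at first, 3 added before the second interval.
rows = kaplan_meier([(20, 2), (22, 1), (18, 0)])
for survival, lower, upper in rows:
    print(round(survival, 3), round(lower, 3), round(upper, 3))
```

Because the standard error is divided by the number at risk, intervals with few tagged animals produce wide confidence limits.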

Choice of two sets for comparison will run two plots as above, but also estimate statistics for the comparison between the survival rates. These are log-rank chi-square statistics with one degree of freedom, estimated in progressively more conservative ways, on the penultimate row of the table, and a comparison (see Pollock et al. (1989) and Kenward (2001) for further details). The two-sets option enables re-entry of the same file as for the first set, using a second Make Selections button and box to choose a different category of animal (e.g. adult rather than juvenile) within the file.
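
For orientation, the textbook form of a log-rank chi-square with one degree of freedom is sketched below. The function name and input layout are hypothetical, and this is only the standard hypergeometric-variance version; it does not reproduce the progressively more conservative variants reported by Ranges.

```python
def log_rank_chi_square(set1, set2):
    """Hypothetical helper: standard log-rank chi-square (1 d.f.).

    set1 and set2 are lists of (at_risk, died) pairs for the same
    sequence of time intervals.
    """
    observed = expected = variance = 0.0
    for (r1, d1), (r2, d2) in zip(set1, set2):
        r = r1 + r2          # total at risk in the interval
        d = d1 + d2          # total deaths in the interval
        if r < 2 or d == 0:
            continue         # interval carries no information
        observed += d1
        expected += d * r1 / r
        variance += r1 * r2 * d * (r - d) / (r * r * (r - 1))
    if variance == 0.0:
        return 0.0
    return (observed - expected) ** 2 / variance

# Invented data: 5 of 10 die in the first set, none of 10 in the second.
chi_square = log_rank_chi_square([(10, 5)], [(10, 0)])
print(chi_square)
```

A value above about 3.84 corresponds to P<0.05 for a chi-square with one degree of freedom.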

time interval

The length of time intervals for analysis should be great enough to provide opportunity for a number of deaths, but not so long that seasonal differences in the timing of mortality go undetected. A choice of days rather than one month will bring up a box in which the number of days for each interval can be entered. Typically, monthly intervals are selected unless the period to be analysed is less than about 3 months.

set 1 start date

Although the default is the first animal start date, this often starts the analysis with too few animals in the first time interval; there should ideally be at least 20, because otherwise the confidence limits will be very large, with a tendency for differences between categories to lack significance. Even when many animals are marked within a short time, there may be a need to delay the start of analysis to exclude animals that may be suffering adverse effects of capture or that are considered more vulnerable while adjusting to tags. Selecting specified date will bring up a calendar to assist the choice of date.

set 1 end date

The default of last animal end date will often result in very few individuals in the last sample interval, and hence undesirably large confidence limits. It is therefore possible either to set a specified date with a calendar, or to give a duration in days for the analysis.

set 2 start date and end date

For a comparison run, two further option boxes appear. For cases where the time period for comparison is the same in both files, or categories within the same file, it is convenient to be able to choose set 1 start date and set 1 end date, as well as having other options similar to those for the first set of data.

treat lost as dead

When carcases are found, the category of died is not hard to assign in survival files. Likewise, when tracking is stopped at a particular date, or the tag cell is due to expire, the fate category lived can be assigned. However, a problem arises when tracking animals for which deaths are frequently associated with destruction of the radio (e.g. through trauma), severe loss of signal range, or transport of the carcase away from a monitored area. In this case, survival is overestimated by the default of treating the lost signals as tag failure. For conservative estimates of survival during population modelling, it may be most appropriate to treat signals lost before the likely end of tag cell life as if they represent deaths, by ticking this box. The difference from survival estimated by merely censoring the radios will not be large if radios are highly reliable. Correction for "lost" animals that are subsequently retrapped or resighted after the study period can involve reclassifying their fate as lived; more sophisticated correction from such data (Kenward 2001) will be added in due course.
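
The effect of this option can be seen in a small sketch with invented numbers. The function name and input layout are hypothetical and this is not the Ranges code: lost animals are either censored, so that they only reduce the number at risk in later intervals, or counted as deaths in the interval in which the signal was lost.

```python
def interval_survival(intervals, treat_lost_as_dead=False):
    """Hypothetical helper: cumulative survival over a set of intervals.

    intervals is a list of (at_risk, died, lost) triples, one per interval.
    """
    survival = 1.0
    for at_risk, died, lost in intervals:
        deaths = died + lost if treat_lost_as_dead else died
        survival *= 1.0 - deaths / at_risk
    return survival

# Invented data: 20 tagged animals; 2 die and 1 is lost in the first
# interval, leaving 17 at risk; then 1 dies and 2 are lost.
data = [(20, 2, 1), (17, 1, 2)]
censored = interval_survival(data)
conservative = interval_survival(data, treat_lost_as_dead=True)
print(round(censored, 3), round(conservative, 3))
```

With these numbers, censoring the lost signals gives a survival estimate of about 0.85, whereas treating them as deaths gives 0.70.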