\documentclass[twocolumn,letterpaper,dvips]{article} \pagestyle{myheadings} %\input{seteps}
\usepackage{mcfns} %using mcfns.sty version 9 21jan09 -- NOTE the MCFNS.STY variables that have to be updated below
\usepackage{amsmath,bm,url} \usepackage[dvips]{graphicx} \usepackage{setspace,lineno} \usepackage[sort]{natbib}
\usepackage[breaklinks,pdfstartview={FitH -32768},pdfborder={0 0 0},bookmarksopen,bookmarksnumbered]{hyperref} %\usepackage{bibtexlogo}
%______________________________________________________________________
% Define MCFNS variables:
\setcounter{page}{47}
\def\editors {\href{mailto://c@mcfns.com} {editor:~Chris~J.~Cieszewski}}
\def\submit {Feb.~24,~2009} %Submission date can be different than the issue year \issueyear
\def\accept {Aug.~12,~2009} %The works should be Accepted & Published in the year of the Current_Issue \issueyear
\def\lasterrata {Aug.~28,~2009} %Last Errata date can be different than the Issue-Year \issueyear
\def\citename {Iles} %"Author" or "FirstAuthor et al."
\def\citeemail {kiles@island.net} % Use later: {\href{mailto://\citeemail}{\citename}}
\def\citeetal {} % or {} %for a single author; or
\author{ {\href{mailto://\citeemail}{Kim \citename}}
}\affiliation{ \small\it{{\href{http://www.island.net/~kiles/}{Kim Iles {\&}
Associates Ltd., 412 Valley Place, Nanaimo, BC, Canada. Ph.\&FAX:\,250.753.8095}}}
}\def\yourtitle{{\Large\uppercase{\bf ``Nearest-tree'' estimations}} \\
{\normalsize{A discussion of their geometry}}} %need double {{ for \\ e.g.: {{Title \\ Subtitle}}
\def\yourkwords {Unbiased\:methods, total-balancing, data\:adjustment, forest\:inventory, sampling\:methods}
\def\yourabstract{
The use of ``nearest-neighbor'' sampling has a long history. It involves measuring the distance from a random point in an area to the nearest object. That history involves never quite solving the problem, many examinations of special cases that never occur, adjustments that were ad-hoc, and a great deal of uninformative algebra. In forestry we have attempted to use the ``nearest-tree'' method for estimating numbers of trees on a landscape but the method is general, and can be used for any objects being sampled.
I believe that the literature has never shown the logic and geometry in a form that is useful to both understand and solve the problem. This paper discusses the method from the geometric point of view, making no assumptions about tree distribution, and shows why extending the processes to the ``n$^{th}$ closest tree'' much reduces the bias and variability, as well as specifying what is needed to solve the problem in an unbiased way.
}%----------------------------------------------------------------------
% put any of your personal LaTeX definitions etc here.
\newcommand{\der}[2]{\frac{{\mathrm d}#1}{{\mathrm d}#2}}
\newcommand{\dr}[2]{{\mathrm d}#1/{\mathrm d}#2}
\newcommand{\dd}{\,{\mathrm d}}
\bibliographystyle{SAF}
\bibpunct{(}{)}{,}{a}{}{,}
% THE REST SHOULD BE AUTOMATIC ... Go To the first Section ...
\title{\Large\bf\uppercase\yourtitle}
\begin{document} \markright{\hfil{{{\href{mailto://\citeemail}{\citename}\citeetal}}~(\issueyear)/\mcfnshead}} \twocolumn[ \begin{@twocolumnfalse}
\maketitle \hypersetup{pdftitle={\mcfnshead}, pdfauthor={\citename~(\issueyear)}, pdfsubject={\yourtitle}, pdfkeywords={\yourkwords}} \hrule
\begin{abstract}\yourabstract\\\\{\bf Keywords:}~\yourkwords \end{abstract}\hrule\vspace{.3truein}\end{@twocolumnfalse}]
%\numberwithin{figure}{section}
% Continue with the first Section:
\section{Background}
For more than half a century, the idea of measuring distance from a random
point to the nearest object has been developed. It has often been reviewed
in the sampling literature, for instance in books by Pielou (1969) and
Bonham (1989). Most of the history of the subject seems to have been
developed by ecologists or the mathematicians to whom they brought the
problem.
My own interpretation of the method is that it developed roughly as follows:
1) We can see that the average distance to objects, trees for instance,
clearly decreases when more objects are added to a fixed tract area --
especially if the trees are not extremely clustered. Therefore, distances
between random points and objects could be used to estimate the density
(meaning objects per unit of land area -- tree stems in this~case).
2) As with many sampling systems, researchers looked at estimators based on a
random distribution, even though this was clearly wrong. Generally, the area
around each tree was computed from the distance to the nearest tree
($r_{i}$) by an equation known to be unbiased with a random distribution,
then averaged to give area~$A_{t}$. This area around the tree was then used
to compute the number of trees in an area as follows:
\[N = \left( {\frac{{\text{Tract area}}}{{ A}_{t} }} \right)\]
This was highly satisfying for random distributions, although the
mathematical proof of such a thing was not easy to follow or explain. Having
the equation was enough.
3) A feeling of guilt developed in ecological circles, since everyone
knew that trees and other objects were not randomly distributed. No
theoretical approach suggested itself, so a period of simulation followed,
which examined quite a variety of estimators using the distance ($r_{i}$),
such as those detailed in Engeman (1994). As with all simulations, it was never
``done in our own backyard'', so any correction constants could not be
trusted, no matter how interesting they might be.
Even with no bias, the method will typically give an answer that is too low.
This is because of a high variability when some distances to the tree are
very short and therefore give very large individual estimates of N. Although
these few very large estimates make the system unbiased, they happen rarely
enough that the median answer is typically too low. In this case it is
arguably wise to use a biased estimate, which gives a smaller actual error
in most cases, and just live with the bias.
4) The problem was extended, in hopes that the variability and any perceived
bias would go away. Samplers looked at the 2$^{nd}$-closest tree, the
3$^{rd}$, and generally the ``n$^{th}$-closest tree'', hoping that the
bias would asymptotically go away, and indeed that seemed to be the case.
5) At various times people realized that this was really a problem of
deducing the area of the average Voronoi polygon around individual trees.
Once you have that area, of course, you are in the well-known realm
of Horvitz-Thompson estimators, which simplifies everything. A Voronoi polygon
is the area around a tree where it is the ``closest'' tree to any point in
the polygon. In fact, the situation could be examined with any shape of
polygon around trees, provided that the polygons tessellated the area and
you could tell which polygon you fell into with a sample point. Voronoi
polygons are simply a very convenient situation to consider.
I have never been able to find a simple procedure for calculating the
Voronoi polygon area around a single tree while in the field. Solving the
problem for thousands of trees with XY coordinates is easily done and quite
efficient by computer algorithms, and you would think that perhaps a simple
Excel program must be available to do this in the field using angles and
distances to trees. I have not been able to find such a program.
I would suggest that perhaps this is one of those times when we could look
at the geometry of the situation and perhaps gain some insight. Before
samplers found out that calculus was so impressive to journal editors, they
would reason out the geometry of various situations and sometimes come up
with some inspired results. Consider Walter Bitterlich's development in the
1940's of Angle-Count Sampling (typically called Variable Plot Sampling) as
one example (Bitterlich, 1984). He developed this as a geometry exercise,
and it changed forest sampling worldwide. Perhaps this is another example
that might benefit from such an approach. Geometric proofs are, after all, a
valid type of proof. They are every bit as mathematical as an algebraic or
calculus approach, and can be much more illuminating.
\section{The Geometry}
Consider, first, the geometry of selecting a tree linearly ``closest'' to a
random point. Clearly this is a question of falling within a Voronoi polygon
in which that tree is ``nearest''. Where other definitions of ``closest''
are considered, the geometry remains very similar and the solutions here are
basically unchanged.
The average area of such polygons provides the key to estimating the number
of trees per hectare. A random point in the area is always located in one
and only one of these polygons, and falls within those polygons with
probability proportional to their area. Figure~\ref{fig1} illustrates this
situation.
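
As a minimal sketch of why selection proportional to area is so useful (the symbols $T$ for tract area, $N$ for the number of trees and $A_{i}$ for the polygon area of tree $i$ are introduced here only for illustration), a random point selects tree $i$ with probability $A_{i}/T$, so the reciprocal of the selected polygon's area has expectation
\[E\left[\frac{1}{A_{i}}\right] = \sum_{i=1}^{N} \frac{A_{i}}{T}\cdot\frac{1}{A_{i}} = \frac{N}{T}.\]
This is the usual Horvitz-Thompson reasoning: if the polygon areas could be measured, each $T/A_{i}$ would be an unbiased estimate of $N$. The sketch ignores ties and any complications at the tract boundary.
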
\begin{figure}[htbp]\vspace{.8in}
\leftline{\includegraphics[width=3.25in,height=2.5in]{nn-fig1.eps}}\vspace{-.8in}
\caption{The ``nearest-tree'' Voronoi polygon, which is sampled, proportional to its size, by a random point.}
\label{fig1}
\end{figure}
\section{The Problem}
The question is: how can we estimate the polygon area by only using a linear
distance? If we could detect the distance from the tree to the \textit{edge} of this
polygon, a solution becomes fairly simple, and the variability of the
estimator is much reduced. Consider the distance $R_{i}$, which is from the
tree to the \textit{edge} of the polygon. The edge is recognized because
it is the point at which one or more other trees are exactly as far away as
tree $i$. A shorter distance, $r_{i}$, the distance from a random point to the
tree, has traditionally been used. The larger distance $R_{i}$ is the
distance from a tree (or more generally any fixed point) to the \textit{edge} of the
polygon. This distance has some very fortunate characteristics.
One of the examples in some calculus courses is to establish that the
quadratic average ($R_{a}$) of the distances $R_{i}$, taken in directions chosen
with equal probability from any fixed point (for instance the tree in the
polygon), is the radius of a circle with exactly the same area as that
irregular polygon. This was discussed by Matern (1956), and more recently by
Gregoire and Valentine (1995). The polygon does not need to have straight
edges for this; but it does in our case, because the edges are formed from
bisectors of adjacent trees. For nearest-tree situations it is a very simple
polygon with a reference point (the tree) which is easy to identify.
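
A brief sketch of the calculus behind this statement may help. The notation $R(\theta)$ for the tree-to-edge distance in direction $\theta$, and $A$ for the polygon area, is introduced here only for illustration. In polar coordinates centered on the tree,
\[A = \frac{1}{2}\int_{0}^{2\pi} R(\theta)^{2}\dd\theta = \pi\left[\frac{1}{2\pi}\int_{0}^{2\pi} R(\theta)^{2}\dd\theta\right].\]
The bracketed term is the mean of $R^{2}$ over equally likely directions, that is $R_{a}^{2}$, so $A = \pi R_{a}^{2}$. This assumes that each ray from the tree crosses the edge only once, which holds for a Voronoi polygon because it is convex.
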
\begin{figure}[h]\vspace{1in}
\centerline{\includegraphics[width=3.5in,height=2.75in]{nn-fig2.eps}}\vspace{-1in}
\caption{A circle having a radius (R$_{a})$ equivalent to the quadratic
average of all possible distances $\sqrt {\frac{\sum {R_i ^2} }{n}} $, has
an area equal to the area of that irregular polygon.}
\label{fig2}
\end{figure}
This also leads to the estimate:
\[\left(R_{a}^{2}\times\pi\right) = \text{polygon area}.\]
If we simply use $\left(R_{i}^{2}\times\pi\right)$ in each case, and then
average the areas of these circles, we get an unbiased estimate of
$\left(R_{a}^{2}\times\pi\right)$ for polygon area. In other words, we simply
treat the distances (R$_{i})$ as circle radii, and average those circle
areas. We can use this simple arithmetic average because the angular
direction from the tree to the polygon edge was randomly chosen with equal
probability.
If the ray outward from the tree was not randomly chosen (such as when it
was chosen by going through a random point) we would have to weight the
individual distances to compute the same expected value. Here again, we have
only to refer to previous work. Walter Bitterlich taught foresters how to
select circles proportional to their area and how to use the results. This
simple geometry problem was solved by using an angle gauge to choose trees
at a random point. A random point chooses the larger circles (radii) by the
square of the radius involved.
If we wanted the \underline {arithmetic} average of the squared radii
\underline {as if} the radii had been chosen with equal probability, the first
suggestion for this seems to have come from Hirata (1956). We simply take the
harmonic mean of the squared radii, because the weighting of their selection
was made with probability proportional to the squared distance. It is easy to
imagine the weight being proportional to a small wedge extending from the tree
outwards, so the probability of a point falling into this wedge is
proportional to the square of the distance:
\[R_{a}=\sqrt {\;\frac{1}{\;\left( {\frac{\sum {\left[
{\frac{1}{{R}_{i} ^{2}}} \right]} }{n}} \right)\;}\;}. \]
This estimates the quadratic average ($R_{a}$) of \underline {equally}
chosen radii without selection bias, even though the radii used in this
computation were chosen proportional to their squared length by going from
the tree through a randomly chosen point in the polygon.
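
To sketch why the harmonic mean is the appropriate correction, let $A$ denote the polygon area and $R = R(\theta)$ the tree-to-edge distance in direction $\theta$ (both symbols are introduced only for this illustration). A random point in the polygon falls in the thin wedge at direction $\theta$ with probability density $R^{2}/(2A)$, so
\[E\left[\frac{1}{R_{i}^{2}}\right] = \int_{0}^{2\pi} \frac{1}{R^{2}}\cdot\frac{R^{2}}{2A}\dd\theta = \frac{\pi}{A}.\]
The average of $1/R_{i}^{2}$ therefore estimates $\pi/A$, and its reciprocal estimates $R_{a}^{2} = A/\pi$.
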
How could we do this in the field? Our problem is simply to sample for the
average circle area $(R_{a}^{2}\times\pi )$ using distances from the
tree to the polygon edge. One way to do this~is:
\begin{itemize}
\item[1)] Select a random point and go to the nearest tree.
\item[2)] From the tree, select a random angle, and go in that direction until the
edge of the polygon is encountered. This is the first point where another
tree would be equally far away (R$_{i})$.
This ``random direction'' step can be skipped if you use the harmonic mean
just described, in which case the distance R$_{i}$ is from the tree through
the sample point to the edge of the polygon. This simplifies field work.
\item[3)] Measure $R_{i}$ as an estimate of the radius of a circle equal in
area to the polygon. The average of these squared radii (weighted harmonically, if
necessary) is $R_{a}^{2}$.
$(R_{a}^{2}\times\pi )$ then estimates the average polygon area around
individual trees.
\item[4)] From this, the number of trees/ha can be calculated.
\end{itemize}
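
As a worked illustration of steps 3 and 4, suppose three hypothetical distances (invented for this example, not field data) of $R_{i} = 2$, 3 and 4~m were measured from the tree through the sample point to the polygon edge. The harmonic weighting gives
\[R_{a}^{2} = \frac{3}{\frac{1}{4}+\frac{1}{9}+\frac{1}{16}} \approx 7.08~\text{m}^{2},\]
so the average polygon area is about $\pi \times 7.08 \approx 22.2~\text{m}^{2}$, and
\[\text{trees/ha} = \frac{10\,000~\text{m}^{2}}{\pi R_{a}^{2}} \approx 449.\]
With radii measured in a random direction from the tree instead, the simple average of the $R_{i}^{2}$ would replace the harmonic mean in the first line.
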
Other estimates of volumes, values and other characteristics are similarly
best imagined geometrically, but will be more fully described in future
papers. To those who are familiar with Variable Plot sampling, these are
easily imagined as Volume to Basal Area Ratios (VBARs). When averaged, these
can simply be multiplied by stand area in order to produce totals for the
tract.
It is relatively easy to do such calculations. The main deviation from
previous work comes from viewing the problem as a sample of various size
circles, rather than using any assumption at all about tree distribution.
Note that there is absolutely no restriction at all on the distribution of
trees.
At this point, we have the same form of equation as has always existed for
point-to-plant areas and numbers per hectare. The only difference is that
the circle area derived from point-to-plant distances ($r_{i}$) was
doubled. The doubling was used because when we choose a random point in a circle,
the average area of $(r_{i}^{2}\times\pi )$ is $1/2$ of the area
$(R_{i}^{2}\times\pi )$.
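
This factor of one half can be checked directly. As a sketch, assuming the point is uniformly distributed over a circle of radius $R_{i}$ centered on the tree, the distance $r$ from the tree has density $2r/R_{i}^{2}$ on $0 \le r \le R_{i}$, so
\[E\left[r^{2}\right] = \int_{0}^{R_{i}} r^{2}\,\frac{2r}{R_{i}^{2}}\dd r = \frac{R_{i}^{2}}{2},\]
and therefore $E\left[\pi r^{2}\right] = \frac{1}{2}\,\pi R_{i}^{2}$.
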
The problem with this historical estimator is that it is quite variable. The
distance to objects can obviously be very small, and when dealing with
reciprocal squares this causes high variability. There is a way to solve
this problem, and it involves going beyond the nearest tree, which might be
quite close, to the 2$^{nd}$, 3$^{rd}$ or more generally the
``n$^{th}$ nearest-tree''.
This problem is also best viewed as a geometry problem. Although some have
apparently viewed this distance to the n$^{th}$ tree as a kind of ``plot
radius'' (Lynch, 2003), I believe that this does not provide insight into
the actual geometry. Lately, several authors, for instance Magnussen (2008)
and Kleinn (2006), deduced that this involved ``order-k'' Voronoi polygons
(Okabe et al., 1999, chapter 3, page 152) around trees, but admitted that
calculating these in the field was even more impractical. Indeed, measuring
the polygon is too awkward to contemplate, but sampling for it is not. Using
many trees in a larger polygon, rather than just one n$^{th}$-nearest
tree, certainly complicates imagining the geometry.
If you consider the Voronoi polygons around the nearest tree, and ask what
is the polygon where this tree is the ``2$^{nd}$-nearest-tree'', the
geometry begins to clear up. In this illustration, I have taken the polygons
for trees bordering an example tree and calculated the parts of these where
the example tree is the 2$^{nd}$-nearest-tree. This can be done by hand,
and I am sure that it could be done quickly and more accurately by a GIS
system, which would be especially necessary for larger
``n$^{th}$-nearest'' situations. The analysis depends upon individually
eliminating trees, then dividing that tree's polygon among other trees. This
process adds what I will call ``slivers'' along each edge of the
original Voronoi polygon. It is within these slivers that the tree is the
2$^{nd}$-nearest-tree. You do not need to know this area or the
boundaries, because you know when you fall into that polygon (because that
example tree is the second closest), but visualizing the geometry reveals a
lot about why it works well.
The graph from this process produces a ``halo'' (Fig.~\ref{fig3}) of slivers which surrounds each tree. Here they are illustrated around just one of the trees.
The consequence of this is that the starting point for the distance to the
``n-th nearest-tree'' must lie within these slivers. The \underline {outer}
border of the halo forms a new polygon consisting of the inner original
polygon plus the added slivers. On the average, these larger polygons are
exactly twice the size of the inner polygons describing the nearest trees.
The smaller slivers add up to exactly the tract area, and each is allocated
to one and only one tree. The interior parts of the original
``nearest-tree'' polygon would be divided into slivers which would select
some other tree as the 2$^{nd}$-closest. Therefore, the original polygons
\underline {plus} the sliver areas that border them amount to twice the area
of the tract, and with the same number of trees those polygons have an
average exactly twice as large.
We therefore have the same solution as before. If we measure the radius to
the edge of this larger polygon, then calculate the average area, we will
estimate twice the area of an average nearest-tree polygon. The same process
is used, but the area is just divided by two before you calculate numbers of
trees. The same reasoning, of course, applies to the 3$^{rd}$, 4$^{th}$ or
n$^{th}$ closest tree. The halos of slivers get thinner, and occur at
greater distances from the tree. The slivers tessellate the area as if they
were a large stained-glass window with interlacing halos of different
colors, each assigned to different trees.
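
The area accounting behind this can be written compactly. In this sketch the symbols are introduced only for illustration: $T$ is the tract area, $N$ the number of trees, and $A_{i}^{(k)}$ the area within which tree $i$ is the $k^{th}$-nearest tree. Ties and tract-boundary effects aside, every point in the tract has exactly one $k^{th}$-nearest tree, so $\sum_{i} A_{i}^{(k)} = T$ for each $k$, and the polygon bounded by the outer edge of the $n^{th}$ halo has average area
\[\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{n} A_{i}^{(k)} = \frac{nT}{N}.\]
Dividing the estimated average area by $n$ (by two for the 2$^{nd}$-nearest tree) therefore returns the average nearest-tree polygon area $T/N$.
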
What we would prefer is the distance from the tree to the edge of this
larger polygon (R$_{i})$, but the simple distance from the random point to
the tree (r$_{i})$ is at least restricted by the width of the slivers along
the border of the polygon. This shorter distance, if used directly, would
lead to an estimate of an average polygon which is too small, and therefore
would estimate too large a number of trees. Although any bias from using
r$_{i}$ rather than R$_{i}$ may be smaller, and although it reduces as we go
to the 4$^{th}$, 5$^{th}$, 6$^{th}$ tree and so on, we would prefer the
distance R$_{i}$ because it is unbiased. To find the actual polygon edge of
the larger polygon we should back away from the tree until it ties with
another tree as the ``n$^{th}$ closest tree''.
\begin{figure}[t]\vspace{1in}
\leftline{\includegraphics[width=3.75in,height=3in]{nn-fig3.eps}}\vspace{-1in}
\caption{A ``halo'' of slivers forms along the border of the original
polygon to indicate where it would be chosen as the ``second-closest'' tree.}
\label{fig3}
\end{figure}
\addtolength{\textheight}{-1truein}
The bias caused by using a shorter distance (r$_{i})$ has caused some to
suggest that an additional distance be added to each measurement, which can
reduce the bias. This was usually visualized as using a slightly larger
``fixed plot'' with the n trees inside it, since the distance barely
includes the n$^{th}$ tree. I do not think that this view is useful for
understanding the process, but some adjustment would clearly help to reduce
the bias.
When using the n$^{th}$ closest tree approach the variability has been
reduced, and at some point the bias becomes negligible because these slivers
are too slim to create a great deal of difference in the distance to the
sample point versus the correct distance to the polygon edge. It is a
classic trade-off: an unbiased method that is more awkward in the field
versus a biased estimate that is relatively stable and has simple field
measurements.
I must admit to being one who would use the biased method. On the other
hand, what would happen if we had a simple instrument or method that would
tell us when we crossed that invisible boundary where the tree went from the
n$^{th}$ nearest to the (n+1)$^{th}$ nearest? We would then have an unbiased
system with desirable variability characteristics. All we need, to become aware
of this possibility, is to view the geometry in a way that shows the
actual situation. Bitterlich found a way to tell when he was inside an
invisible circle that was a multiple of the stem area without distance
measurements or calculations, by simply using an angle to view the tree.
When we look at the nearest-tree process as a geometry exercise, perhaps
someone else will show similar ingenuity. There are obvious extensions of
this geometric view to other items besides simple tree numbers. I think that
this view is general, useful, and puts the mathematics into context in a way
that pure mathematical approaches do not.
It was a large breakthrough when the scientific community discovered the
concept of analytic geometry. Have we forgotten the geometry part of that
insight? I think that perhaps we have. The reason that this problem has
essentially gone unsolved for so very long is that it does not yield readily
to a purely mathematical solution without the geometrical insight. Variable
Plot sampling was an enormous breakthrough in forest sampling. I believe
that this was because it was essentially a geometrical problem solved by a
geometrical insight. I think the nearest-neighbor problem is the same, and
that there are still many problems like these.
\section*{Acknowledgements}
I want to acknowledge the encouragement of the late Dr. Al Stage, who made
me promise to eventually publish this talk, first presented at a conference
in 2003 (``\textit{A General solution to the `nearest neighbor' sampling problem}'', Western Mensurationist Meeting). I would also like to thank
several anonymous reviewers who detected typos in the draft manuscript.
\section*{References}
\begin{description}
\item Bitterlich, W. 1984. {The Relascope Idea}, Commonwealth Agricultural Bureaux, 242 pages, ISBN
0-85198-539-4, see pages 2-6.
\item Bonham, C.D. 1989. {Measurements for Terrestrial Vegetation}, John Wiley and Sons, ISBN 0-471-04880-1, 338 pages (see pages 148-154).
\item Engeman, R.M., R.T. Sugihara, L.F. Pank, and W.E. Dusenberry. 1994. {A Comparison of Plotless Density Estimators Using Monte Carlo Simulation}, Ecology 75(6):1769-1779.
\item Gregoire, T. G. and H. T. Valentine. 1995. {A sampling strategy to estimate the area and perimeter of irregularly-shaped planar regions}. Forest Science 41:470-476.
\item Hirata, T. 1956. {Harmonic means in Bitterlich's sampling}, University of Tokyo, For. Misc. Inf. {\#}11, 9-14 (not directly examined by author, citation via Bitterlich, see Bitterlich pages
191 and 233).
\item Kleinn, C., and F. Vil\v{c}ko. 2006. {Design-unbiased estimation for point-to-tree distance sampling}, Canadian Journal of Forest Research 36(6):1407-1414.
\item Lynch, T.B., and R.F. Wittwer. 2003. {n-Tree distance sampling for per-tree estimates with application to unequal-sized cluster sampling of increment core data}, Canadian Journal of
Forest Research 33(7):1189-1195.
\item Magnussen, S., C. Kleinn, and N. Picard. 2008. {Two new density estimators for distance sampling}, European Journal of Forest Research 127(3):213-224.
\item Matern, B. 1956. {On the geometry of the cross-section of a stem}, Meddelanden Fr{\aa}n Statens
Skogsforskningsinstitut, Stockholm, 46.
\item Okabe, A., B. Boots, K. Sugihara, and S.N. Chiu. 1999. {Spatial tessellations: concepts and applications of Voronoi diagrams}, 2$^{nd}$ Edition, John Wiley {\&} Sons, New York.
\item Pielou, E.C. 1969. {An introduction to Mathematical Ecology}, John Wiley and Sons, 286 pages, SBN 471 68918 1 (see pages 111-123).
\end{description}
\label{docend}
\end{document}