
Re: [sdpd] Re: UPPW-5 solution - UPPW-6 problem



Hi Alan (and sdpd-ers)

I'd like to respond to some of the points raised in your email:

1)

> it is my opinion that a combined ITO and DICVOL probably solves more
> than 90% of everything thrown at it. Thus new algorithms are only
> filling a small gap

Although I respect and use both ITO and DICVOL (and had a hand in the
development of the former), in my experience this statement claims too
much for them.

It was more nearly true 10 years ago when sdpd addressed rather simpler
problems, but programs like those were never designed to tackle the more
difficult and complex (but generally better measured) datasets that 
modern lab diffractometers and synchrotrons are producing.

Thus, for example, ITO is reasonably tolerant of a modest number of 
impurity lines, but not of dominant zones, while DICVOL is liable to 
struggle with today's complex high-volume datasets and can be completely 
blocked by the presence of a single unidentified impurity line.

That's one reason why the Crysfire suite contains many more than the
classic trio of ITO, DICVOL and TREOR among its repertoire of 10+
supported indexing programs - so that, for example, I would advise
anyone using it to routinely run at least KOHL and LZON as well.

These new and more ambitious problems have much to do with why there is
a demand for new global optimisation and joint probability methods, such
as SVD-index, McMaille, AUTOX and Hmap (with more in the pipeline).

2)

> I have heard the term "exhaustive" used so much in indexing that I am
> beginning to believe that I got something wrong. So please enlighten me
> if you can.

An exhaustive search is one that definitively reports all the possible
solutions within its search domain, so that one never has to search that
particular domain again.

Here are some examples of methods that are exhaustive (though this is 
not necessarily exactly true of all their implementations in programs):

a) Binary search = successive dichotomy.

Binary search requires an existence test for solutions, usually provided
by specifying hard +- limits in 2Theta or d-spacing for each observed
line (peak) position.  A binary search with a particular dataset and
specified +- line-position limits, and specified cell parameter and
volume limits, should be completely definitive and should never require
repeating.
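
To make this concrete, here is a minimal Python sketch of the dichotomy
principle, restricted for clarity to the cubic case.  The hkl list,
tolerances and test data are my own illustrative assumptions - this is
the existence-test-and-bisect logic only, not a transcription of any
particular program's internals:

    def q_cubic(hkl_sum, a):
        """Q = 1/d^2 for a cubic cell: Q = (h^2 + k^2 + l^2) / a^2."""
        return hkl_sum / (a * a)

    def interval_indexes_all(q_obs, dq, a_lo, a_hi, hkl_sums):
        """Existence test: can every observed line be indexed by SOME hkl
        for SOME cell parameter a within [a_lo, a_hi]?"""
        for q in q_obs:
            ok = False
            for s in hkl_sums:
                # Over the interval, Q(s, a) ranges from s/a_hi^2 to s/a_lo^2.
                q_min, q_max = q_cubic(s, a_hi), q_cubic(s, a_lo)
                if q_min - dq <= q <= q_max + dq:
                    ok = True
                    break
            if not ok:
                return False      # one unindexable line excludes the interval
        return True

    def dichotomy(q_obs, dq, a_lo, a_hi, hkl_sums, a_tol=1e-3):
        """Recursively bisect [a_lo, a_hi]; report surviving sub-intervals."""
        if not interval_indexes_all(q_obs, dq, a_lo, a_hi, hkl_sums):
            return []             # whole interval excluded: never revisit it
        if a_hi - a_lo < a_tol:
            return [(a_lo, a_hi)]   # candidate solution range
        mid = 0.5 * (a_lo + a_hi)
        return (dichotomy(q_obs, dq, a_lo, mid, hkl_sums, a_tol)
                + dichotomy(q_obs, dq, mid, a_hi, hkl_sums, a_tol))

    # Test data: lines generated from a = 5.0 A, first few cubic hkl sums.
    hkl_sums = [1, 2, 3, 4, 5, 6, 8]
    q_obs = [s / 25.0 for s in hkl_sums]
    print(dichotomy(q_obs, dq=1e-4, a_lo=3.0, a_hi=8.0, hkl_sums=hkl_sums))

The essential property is that an interval is discarded only when no
cell within it can index every observed line, which is why a strictly
implemented dichotomy never needs to revisit a rejected region.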

In my experience, DICVOL and LOSH (and hence LZON) approach these
requirements but do not completely achieve them, since from time to time
I have found them to produce small (one hopes) logical inconsistencies.

For example, solutions which are known from other methods to be present
within their declared search space are sometimes not reported by these
programs.  Similarly, solutions that are reported by one run are
sometimes not reported by another run where the conditions are slightly
different, though apparently not different enough to account for the
changed behaviour.

I haven't investigated the reasons for these occasional anomalies, but
assume that they arise as subtle side effects of internal optimisations
made by the author(s) to improve execution speed. 

I can't comment on the behaviour of X-Cell, which is also reported to
use dichotomy (binary search), since it is a commercial program that I
haven't personally used.

b) Index permutation

Full index permutation in index space should also be exhaustive, within 
its declared bounds.

Taupin's program (TAUP [=Powder]) makes quite a close approach to this, 
though with a limited set of base lines.  A consequence is that it can 
become incredibly slow in low symmetry.
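
By way of illustration, here is a minimal Python sketch of the
index-permutation idea for the tetragonal case: trial Miller indices
are assigned to two base lines, the resulting 2x2 linear system is
solved for A = 1/a^2 and C = 1/c^2, and each candidate cell is scored
against the remaining lines.  The index range, tolerance and scoring
are illustrative assumptions of mine, not TAUP's:

    from itertools import product

    def q_tetragonal(h, k, l, A, C):
        """Q = 1/d^2 = (h^2 + k^2)*A + l^2*C for a tetragonal cell."""
        return (h * h + k * k) * A + l * l * C

    def permute_and_score(q_obs, dq, hkl_max=3):
        """Try every index assignment for the first two lines; return the
        candidate (A, C) pairs ranked by how many lines they index."""
        q1, q2 = q_obs[0], q_obs[1]
        hkls = [hkl for hkl in product(range(hkl_max + 1), repeat=3)
                if hkl != (0, 0, 0)]
        candidates = []
        for (h1, k1, l1), (h2, k2, l2) in product(hkls, repeat=2):
            # Solve  m1*A + n1*C = q1  and  m2*A + n2*C = q2  for A and C.
            m1, n1 = h1 * h1 + k1 * k1, l1 * l1
            m2, n2 = h2 * h2 + k2 * k2, l2 * l2
            det = m1 * n2 - m2 * n1
            if det == 0:
                continue
            A = (q1 * n2 - q2 * n1) / det
            C = (m1 * q2 - m2 * q1) / det
            if A <= 0 or C <= 0:
                continue
            indexed = sum(any(abs(q - q_tetragonal(h, k, l, A, C)) < dq
                              for h, k, l in hkls) for q in q_obs)
            candidates.append((indexed, A, C))
        return sorted(candidates, reverse=True)[:5]

    # Test data: lines simulated from a = 4.0 A, c = 6.0 A.
    true_hkls = [(1, 0, 0), (0, 0, 1), (1, 1, 0),
                 (1, 0, 1), (1, 1, 1), (2, 0, 0)]
    q_obs = sorted(q_tetragonal(h, k, l, 1 / 16.0, 1 / 36.0)
                   for h, k, l in true_hkls)
    print(permute_and_score(q_obs, dq=1e-4))

Even this toy version shows why such methods slow down so badly in low
symmetry: the number of index assignments to try grows combinatorially
with the number of free cell parameters and base lines.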

c) Grid search

Grid search methods are formally exhaustive for the particular
grid-point array used, and become fully exhaustive (within their various
other bounds, such as limits on cell volume, Miller indices, etc.) if
the step size is made sufficiently small.

Whether they will succeed in exhaustively flushing out all the solutions 
present within their search space depends also on the power of the merit
criteria used.

My Mmap and Hmap programs achieve this reasonably reliably for the
2-dimensional (most-dominant-zone-based) SIW sections of solution space
for which they are designed.  Their successor PEURIST, when it eventually 
appears, will incorporate an extension of these methods.

Another grid-search program, which operates directly in up to three
dimensions (i.e. up to orthorhombic) is SCANIX by Wojtek Paszkowicz.

Grid search is inherently a relatively inefficient method (though its
calculations sometimes incorporate sophisticated optimisations), but it
can be very robust.
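
As a simple illustration of the principle, here is a minimal Python
sketch of a grid search over a two-parameter zone
(Q = h^2*A + k^2*B, with A = 1/a^2 and B = 1/b^2), scoring each grid
point by the number of observed lines it indexes.  The grid ranges,
step and merit criterion are illustrative assumptions only - real
programs use far more discriminating figures of merit:

    import numpy as np

    def zone_q(A, B, hk_max=4):
        """All Q = 1/d^2 values generated by the zone, indices up to hk_max."""
        return [h * h * A + k * k * B
                for h in range(hk_max + 1) for k in range(hk_max + 1)
                if (h, k) != (0, 0)]

    def grid_search(q_obs, dq, a_range, b_range, step):
        """Exhaustive scan of the (A, B) grid; return points ranked by merit."""
        hits = []
        for A in np.arange(a_range[0], a_range[1], step):
            for B in np.arange(b_range[0], b_range[1], step):
                q_calc = zone_q(A, B)
                merit = sum(any(abs(q - qc) < dq for qc in q_calc)
                            for q in q_obs)
                hits.append((merit, A, B))
        return sorted(hits, reverse=True)[:5]

    # Test data: zone lines simulated from a = 5.0 A, b = 7.0 A.
    A0, B0 = 1 / 25.0, 1 / 49.0
    q_obs = sorted(zone_q(A0, B0, hk_max=2))
    print(grid_search(q_obs, dq=5e-4,
                      a_range=(0.01, 0.06), b_range=(0.01, 0.06), step=5e-4))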

To summarise:

Truly exhaustive methods do exist, and there are already a number of
programs which come close to implementing them.  However, indexing's
solution space is simply too large in low symmetry for us to rely on
them exclusively.  There are reasons to suspect that the optimisations in
some of the existing programs can also occasionally produce logical
inconsistencies that make them fall short of complete exhaustiveness.

Perhaps, now that processor power is far more accessible than when those
programs were written, re-implementations will be developed ab initio
which, though perhaps not as fast, do not contain such compromises.

3)

> in my view an iterative least squares estimate between the observed
> and calculated d-spacings is the best solution choice within a
> particular range.

I'd prefer to substitute "2Thetas" for "d-spacings" in that sentence, at
least for data obtained with angular dispersion instruments.

Least-squares refinement vs 2Theta (as with Celref within Chekcell) may
not produce as high figures of merit as LS refinement vs d (or, more
usually, Q = 1/d^2), but it is likely to produce a cell that more nearly
approaches physical reality.

Refinement against 2Theta in expanding shells of 2Theta (and hence d*) is
also particularly powerful at releasing a trial solution that is stuck in 
a local minimum reasonably close to the physical solution.
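
By way of a much simplified illustration, here is a Python sketch of
least-squares cell refinement against 2Theta for a cubic cell with
known indices, including the expanding-shells idea.  The wavelength,
index list and starting value are assumptions of mine, and the real
thing would of course refine all the cell parameters (plus, ideally, a
zero-point correction):

    import numpy as np
    from scipy.optimize import least_squares

    LAMBDA = 1.5406  # assumed Cu K-alpha1 wavelength, in Angstrom

    def two_theta_calc(a, hkl_sums):
        """2Theta (degrees) for a cubic cell: 1/d^2 = (h^2 + k^2 + l^2)/a^2."""
        d = a / np.sqrt(np.asarray(hkl_sums, dtype=float))
        return 2.0 * np.degrees(np.arcsin(LAMBDA / (2.0 * d)))

    def refine_cell(a_start, tth_obs, hkl_sums):
        """Refine a against 2Theta in expanding shells (lowest lines first)."""
        a = a_start
        for n in range(3, len(tth_obs) + 1):   # widen the shell line by line
            resid = lambda p: two_theta_calc(p[0], hkl_sums[:n]) - tth_obs[:n]
            a = least_squares(resid, x0=[a]).x[0]
        return a

    # Test data: lines simulated from a = 5.4310 A, with a small constant
    # 2Theta offset added to mimic a zero-point / displacement-like error.
    hkl_sums = [3, 4, 8, 11, 12, 16, 19]
    tth_obs = two_theta_calc(5.4310, hkl_sums) + 0.01
    print(refine_cell(a_start=5.5, tth_obs=tth_obs, hkl_sums=hkl_sums))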

I'd add that there are circumstances in which least-squares itself can
become unstable.  In such situations one can still fall back on parabolic
refinement against a merit surface (the 3-point fitting of a parabola
cyclically to each variable parameter in turn, until convergence is
reached).  This is a slower but incredibly robust and general method,
which can often succeed when others fail.
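
A minimal Python sketch of that cyclic 3-point parabolic refinement,
with a simple assumed merit function standing in for the real merit
surface, might look like this:

    def parabolic_step(f, params, i, step):
        """One 3-point parabola fit along parameter i; return its new value."""
        x0 = params[i]
        xs = [x0 - step, x0, x0 + step]
        ys = []
        for x in xs:
            trial = list(params)
            trial[i] = x
            ys.append(f(trial))
        denom = ys[0] - 2.0 * ys[1] + ys[2]
        if denom <= 0:                 # not locally convex: keep the best probe
            return xs[ys.index(min(ys))]
        # Vertex of the parabola through the three equally spaced points.
        return x0 + 0.5 * step * (ys[0] - ys[2]) / denom

    def refine(f, params, step=0.01, shrink=0.5, n_cycles=20):
        """Cycle over all parameters, shrinking the probe step each cycle."""
        params = list(params)
        for _ in range(n_cycles):
            for i in range(len(params)):
                params[i] = parabolic_step(f, params, i, step)
            step *= shrink
        return params

    # Test surface: a well-behaved two-parameter merit with minimum at (3, -2).
    f = lambda p: (p[0] - 3.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2 + 5.0
    print(refine(f, [0.0, 0.0]))

Slow, certainly, since it needs three merit evaluations per parameter
per cycle, but it makes no assumptions about derivatives and so keeps
working where least-squares breaks down.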

4)

> it's a lot of fun in any case.

You bet!  At least if one enjoys exploring really gnarly search spaces 
that can contain tens of thousands of possible solutions!

With best wishes to all UPPW indexers and spectators

Robin Shirley

-----------------------------------------

To:            sdpd...@yahoogroups.com
From:          Alan Coelho <alan.coelho...@attglobal.net>
Date:          Tue, 25 Nov 2003 16:43:44 +0100
Subject:       [sdpd] Re: UPPW-5 solution - UPPW-6 problem
Reply-to:      sdpd...@yahoogroups.com

hello to all

First I would like to praise Armel for bringing indexing into the 
spotlight. Now I would like to comment on Armel's statement "Have the 
most recent indexing software outmatched the old established ones ? 
Perhaps, hard to say".

I don't think that these examples are going to demonstrate progress on 
whether new algorithms succeed; they do, however, show when they fail. I 
do enjoy the challenge, so don't stop, Armel.  The reason for being 
pessimistic is that if a new method does find a correct solution to a 
difficult unknown, then who is to know whether the correct solution was 
indeed found. As much as I think that "real" data is necessary for 
testing methods, I do think that there is no substitute for simulated 
test data where the solutions are known. There is also no substitute for 
understanding the methods rather than trusting their implementations.

UPPW-5 is a case where powder data does not yield a unique solution. 
When multiple solutions yield similar "perfect" Pawley/Le Bail fits with 
similar de Wolff values, then it is not a matter of failure of the 
programs/methods but rather a failure of the data to yield a unique 
solution. In my view it is therefore not possible for any indexing 
method to resolve the ambiguity. This is not to say, however, that the 
door should be closed to new methods. The way forward is to go 
backwards. Back-tracking could mean recollecting the data on a higher 
resolution instrument (i.e. Peter Stephens), annealing the sample or 
trying some SEM/TEM analysis. If all this fails then it is really a 
matter of trying structure solution for each of the possible lattice 
parameters.

Excuse the long mail but while I am at it I would like to correct a 
misconception regarding the idea of an "exhaustive" search and in the 
process state the reason why I developed an indexing algorithm. I have 
heard the term "exhaustive" used so much in indexing that I am beginning 
to believe that I got something wrong. So please enlighten me if you can.

On data with small 2Th errors a method can claim to be exhaustive. 
However, on data with large errors due to, say, peak overlap on a 
dominant zone problem, the term "exhaustive" loses meaning. The 
successive dichotomy method, a stroke of genius by Daniel Louër to use 
it, is often regarded as being exhaustive. For data with large errors 
the delta-2Th values would need to be set large for the dichotomy method 
to proceed to the correct solution. If the delta-2Th were indeed set 
large enough then there would be many solution ranges returned (note I 
am defining a solution range as a solution with +- delta-2Th). Thus sure 
enough the solution range would be there, but the correct range would be 
impossible to identify. If the correct range could somehow be identified 
then in my view an iterative least squares estimate between the observed 
and calculated d-spacings is the best solution choice within a 
particular range. Note that multiple Pawley/Le Bail fits would not be 
feasible if the delta-2Th were large; this brings me to my own algorithm 
(dare I say it's Topas) which returns iterative least squares solutions. 
Now, having said that no method is going to resolve ambiguity, it is my 
opinion that a combined ITO and DICVOL probably solves more than 90% of 
everything thrown at it. Thus new algorithms are only filling a small 
gap, and to find this gap is presumably what UPPW is all about - or is 
it? If not then it's a lot of fun in any case.

cheers

alan

