D. Molecule location and other methods

- Internet tools -

 

Part41a.html

When the standard methods (Patterson and direct methods) fail, the alternative is to build a model and locate it in the cell. This can be either a simple or a formidable task, depending on whether prior information exists. Locating fragments of known geometry has long been used in single-crystal studies of organic compounds. Prior information may consist of the whole connectivity scheme of a molecule, obtained from NMR data. Inorganic phases also allow some guessing when the basic structural units are known or obvious (polyhedra such as tetrahedra, octahedra...), and all the more so if the connectivity of these units is evident (for instance, exclusive corner sharing of [SiO4] tetrahedra in compounds with a basically SiO2 formula, as in zeolites).

The last ten years of SDPD were particularly innovative in this field. No fewer than 20 new methods, or transpositions of existing (single-crystal or molecular modelling) methods, recently emerged for solving structures from powder data. Some methods have evolved and improved over the years. This sub-topic is quite hot because pharmaceutical and other economically important compounds are potentially involved. Is model building/locating specific to powder diffraction? Absolutely not: one could of course apply these methods to single-crystal data sets as well. A total of 51 cases in this category were found in the SDPD-Database since 1988. A tendency towards expansion is obvious, with 12 cases in 1997. The basic difficulty in these approaches is in fact to define the starting model; most of these model-location methods have a role to play only once this difficulty has been solved. We are somewhere between prior chemical knowledge and full model prediction.

1- In pioneering works, models were guessed (more or less) from converging evidence. Once models are built, they can be optimized by various means, such as the distance least squares program (DLS-76) or some energy minimization tools. In some works classified in this Model Building category, it is not always easy to understand how the models were built before being optimized. Packing considerations on the heavy atoms were used to solve the structure of two lanthanum palladium oxides. Many of the subsequent approaches of this kind cannot be easily summarized. There is, for example, a metal-substituted aluminum phosphate catalyst which was "solved on the basis of evidence garnered from high-resolution electron microscopy, electron diffraction, and other methods including energy minimization"..."A consideration of possible frameworks of appropriate dimensions and with the observed size and spatial distribution of unidimensional, large-pore channels suggests a trial structure with idealized symmetry Cmcm". In this case, the DLS and METAPOCS programs were used for the model optimization. Numerous other SDPDs succeeded through this kind of model building, with the starting models optimized either by the above programs or by MNDO calculations, CERIUS, THEO, or INSIGHT-II.

2- Model location without energy minimization. Once an initial model has been selected, which should be large enough to yield meaningful calculated structure factors, Patterson maps or phase estimates, the only remaining problem is to locate this model in the cell. The PATSEE program is an old SHELX companion. It has been applied successfully to powder diffraction data at least twice. The program requires extracted structure factors and attempts to combine the merits of both Patterson and direct methods in order to position a fragment of known geometry in the unit cell. Random rotations may concern one fragment, and the program allows one torsional degree of freedom. The random translation search may locate up to two independent fragments of any size. Early works in this model-location category relied on brute force: searching for a molecular position and orientation by a systematic grid search, as in the cases of solid CF3I, CFCl3, or RS-camphor. Sometimes, the decision to retain a proposal before trying a Rietveld refinement was based on interatomic distance criteria derived from packing considerations. The ROTSEARCH program allows rotation and translation Patterson searches, checked against extracted intensities. It was applied to the solution of zeolites and organic compounds. A recent version (ROTS96) was able to locate 3 independent molecules in C16H22N6. P-RISCON is a real-space scavenger program capable of setting an initial rigid model by refining the fractional coordinates of its center of mass and its angular orientation, using extracted intensities for checking. A search for the model position may also be found in the structure determination of C60Br24(Br2)2. More sophisticated approaches to optimizing the location of a known fragment include simulated annealing or the related Monte Carlo algorithm.
The efficiency of the Monte Carlo approach was demonstrated by the determination of known as well as unknown structures, using programs that assess the suitability of the model location on the basis of the agreement with the experimental diffraction data (no need to extract structure factors). Variants have incorporated restrained relaxation of the molecular geometry or a high degree of molecular flexibility, or were said to be generalized (OCTOPUS96 and OCTOPUS97 programs). Do not forget that the model has to be known when dealing with these methods. Even more sophisticated, perhaps, are methods applying simulated annealing, possibly through a genetic algorithm, in which a set of internal coordinates defines the geometry of the molecule (the conformation), allowing torsion angles to vary (in addition to the usual external degrees of freedom defining position and orientation). The use of genetic algorithms was developed independently by two teams and applied to both already known and unknown compounds. The agreement between the postulated structure and the experimental diffraction data was tested either on extracted structure factors (using the Pawley method) or on the full powder pattern.
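
To make the idea of locating a known rigid fragment more concrete, here is a minimal sketch in Python. It is not the algorithm of PATSEE, OCTOPUS or any other program named above; it only illustrates a simulated annealing search over the six external degrees of freedom (three for position, three for orientation), scored by an R factor against extracted structure factor amplitudes. Everything in it is an assumption made for illustration: a cubic cell of edge `cell_edge`, a single inversion centre at the origin as the only symmetry, point atoms with constant scattering weights, and the function names `make_hkl`, `amplitudes`, `r_factor` and `anneal_fragment`.

```python
import math
import random


def make_hkl(max_index):
    """All reflections with |h|, |k|, |l| <= max_index, except (0, 0, 0)."""
    indices = range(-max_index, max_index + 1)
    return [(h, k, l) for h in indices for k in indices for l in indices
            if (h, k, l) != (0, 0, 0)]


def rotate(v, angles):
    """Rotate a Cartesian vector about x (angle c), then y (b), then z (a)."""
    a, b, c = angles
    x, y, z = v
    y, z = y * math.cos(c) - z * math.sin(c), y * math.sin(c) + z * math.cos(c)
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    return (x, y, z)


def amplitudes(fragment, pose, hkl_list, cell_edge):
    """|F(hkl)| of a rigid fragment in a toy cubic cell with an inversion centre.

    fragment: list of (weight, (x, y, z)) with Cartesian coordinates in angstroms.
    pose: [tx, ty, tz, a, b, c], fractional translation and three rotation angles.
    """
    tx, ty, tz = pose[:3]
    placed = []
    for weight, xyz in fragment:
        rx, ry, rz = rotate(xyz, pose[3:])
        placed.append((weight, tx + rx / cell_edge, ty + ry / cell_edge, tz + rz / cell_edge))
    result = []
    for h, k, l in hkl_list:
        # An atom at x and its inverse at -x contribute 2 f cos(2 pi h.x).
        total = sum(2.0 * w * math.cos(2.0 * math.pi * (h * x + k * y + l * z))
                    for w, x, y, z in placed)
        result.append(abs(total))
    return result


def r_factor(f_obs, f_calc):
    """Sum of |Fobs - Fcalc| divided by the sum of Fobs (same scale assumed)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def anneal_fragment(fragment, f_obs, hkl_list, cell_edge, steps=2000, t_start=0.2, seed=0):
    """Simulated annealing over position and orientation; returns the best pose seen."""
    rng = random.Random(seed)
    pose = [rng.random() for _ in range(3)] + [rng.uniform(0.0, 2.0 * math.pi) for _ in range(3)]
    cost = r_factor(f_obs, amplitudes(fragment, pose, hkl_list, cell_edge))
    start_cost = cost
    best_pose, best_cost = list(pose), cost
    for step in range(steps):
        temperature = t_start * (1.0 - step / steps) + 1e-6  # linear cooling
        trial = list(pose)
        i = rng.randrange(6)
        if i < 3:
            trial[i] = (trial[i] + rng.gauss(0.0, 0.05)) % 1.0  # shift the fragment
        else:
            trial[i] += rng.gauss(0.0, 0.2)  # turn the fragment
        trial_cost = r_factor(f_obs, amplitudes(fragment, trial, hkl_list, cell_edge))
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if trial_cost < cost or rng.random() < math.exp((cost - trial_cost) / temperature):
            pose, cost = trial, trial_cost
            if cost < best_cost:
                best_pose, best_cost = list(pose), cost
    return best_pose, best_cost, start_cost
```

In a test, one can compute `f_obs` from a fragment placed at a known pose and check that the returned R factor is no worse than that of the random starting pose. A real program would add space group symmetry, proper scattering factors, a scale factor and, for flexible molecules, the torsion angles as additional variables.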

3- Model location from crystal packing considerations. Computational methods which predict possible crystal structures on the basis of the molecular structure were applied to powder data. Moreover, the structure of the A modification of 4-amidinoindanone guanylhydrazone could be determined without knowing the lattice parameters and crystal system. For a packing-based structure determination, indexing is not essential; however, it is very useful for reducing the amount of computation by restricting the packing to a limited number of space groups. Nevertheless, the powder pattern of course remains essential for confirming the model. A drawback of this method is that the whole molecular structure must be known, otherwise packing considerations cannot be applied. Also in this category, but using prior knowledge of the cell dimensions and space group, one may classify various methods for computer prediction of the molecule location (still without the need for intensities): packing energy calculation; computational chemistry techniques; and minimization of the crystal-lattice potential energy calculated with semi-empirical atom-atom potentials using the PMC program. Finally, the systematic ranking of all potential packing arrangements on the basis of lattice energies was shown to be efficient on already known samples.

4- Ab initio prediction. Most experts believe that crystal structure prediction from a given combination of elements is still a faraway goal. The easier task of prediction by simulated annealing, starting from the unit cell dimensions and content, has progressed for simple systems. More complex structures can be predicted when prior knowledge of the symmetry is added.

Two further developments are best classified as hybrid approaches or new concepts: the combination of chemical information and powder diffraction data in an automated structure determination procedure for zeolites (Fourier recycling with a specialized topology search), and the use of a periodic nodal surface calculated from a few strong, low-index reflections to facilitate structure solution. A summary of these 51 applications of model building and location methods to unknown structures is shown in this figure.

The dominant trend is given by Monte Carlo and Patterson-search methods. These programs for molecule location are seldom in the public domain. However, they are quite recent and could be methods for the future. Most of them are specifically designed for organic compounds, and they cannot be applied unless at least a large part of the structure is already known. The trend is towards adding flexibility to the starting models. The use of genetic algorithms currently appears to be the most sophisticated approach, giving freedom to non-rigid parts of the molecule. In any case, the impact of these methods on routine analysis is small. Some methods have been applied only to the solution of already known structures, and not yet to any unknown one. Each of the above methods is claimed to possess advantages that are considered drawbacks by others. For instance: avoiding structure factor extraction is sometimes considered an advantage, because using the full pattern overcomes overlapping problems; working on extracted structure factors is considered an advantage for speed; those predicting the position of molecules without any need for intensities find this an advantage (although few people will trust such results alone); those working on molecule packing underline that they do not even need the cell parameters (in any case, they always perform a final Rietveld refinement). Finally, some current sophisticated model location approaches are no more than the good old "trial and error" process, modernized to make large numbers of trials in a short time.
 
 

As for the complexity of the problems solved by model location, shown here is the number of independent molecules or fragments located simultaneously. Only 26 unknown structures meeting this criterion have been determined in the last 5 years, so that statistics are hardly possible. Other possible complexity criteria for model building could be the number of internal degrees of freedom (the current maximum is 24), the number of external degrees of freedom (the current maximum is 15), and the specific number of torsion angles, the maximum being 10 up to now, to my knowledge.

IUCr is reluctant to publish single-crystal data when the ratio of the number of reflections to the number of refined parameters is less than 10. A kind of limit has been reached recently for some organic compounds whose structures were determined by molecule location methods, with ratios lower than 2 or 3. The trend is to use geometrical restraints in the final refinement. In any case, one may doubt some details of the structure when this ratio is not much larger than 1.
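
The arithmetic behind this ratio can be sketched in a few lines of Python. The function names and the numbers used below (300 reflections, 30 atoms) are hypothetical and chosen only for illustration. The counts also ignore scale, cell and profile parameters, so they are a rough guide and not a refinement recipe.

```python
def free_atom_parameters(n_atoms, isotropic=True):
    """3 coordinates per atom, plus 1 isotropic or 6 anisotropic displacement terms."""
    return n_atoms * (3 + (1 if isotropic else 6))


def rigid_body_parameters(n_fragments, n_torsions=0, n_single_atoms=0):
    """6 external degrees of freedom per rigid fragment, 3 per isolated atom,
    plus one parameter per free torsion angle."""
    return 6 * n_fragments + 3 * n_single_atoms + n_torsions


def reflections_per_parameter(n_reflections, n_parameters):
    """Ratio of the number of reflections to the number of refined parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_reflections / n_parameters
```

With these hypothetical numbers, refining 30 atoms freely with isotropic displacement parameters gives 120 parameters and a ratio of 2.5 for 300 reflections, whereas describing the same content as one rigid fragment with two torsion angles plus one isolated atom gives 11 parameters. This is why model location can succeed with data sets that could not support an unconstrained refinement.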

A divide is now apparent between the unknown structures for which it will be possible to guess a molecule or a sufficiently large fragment (or several of them) and those for which no sufficient prior information will be available. In the latter case, the ab initio methods will continue to be applied (that is to say, the Patterson and direct methods). Some time ago, it was stated that we were unable to determine structures as large as those we could refine by the Rietveld method. The new paradox is that we can now locate molecules in much bigger cells than we could refine without constraints. A few programs, free for academic research, dominate each step of the whole SDPD process. Not all programs are in the public domain, and some methods remain exclusive to their developers. SDPD will not expand faster until this new software is more widely distributed. Only PATSEE, DIRDIF and PROMET seem to be easily available.

One participant in the Structure Determination by Powder Diffractometry Round Robin solved the structure of the pharmaceutical compound by the Global Optimization Method. A model for the molecule was taken from tetracycline hydrate in the Cambridge Structural Database, with the water removed. The tetracycline fragment and the Cl atom were positioned at random in the cell, and an optimum position was searched for by simulated annealing using the DRUID program, against the first 100 structure factors extracted by the Pawley method from the synchrotron data. The final Rietveld refinement plot is shown here, and the position of the fragment before refinement is compared to its position after refinement.

Now, a few words about the accuracy of the coordinates from participants 2 and 4. A very small single crystal, 40 × 30 × 20 micrometers in size, was taken from the powder, and data were recorded on station 9.8 at the Daresbury synchrotron facility. The structure could be refined without any constraint, including the hydrogen atoms. Participant 2 provided the most accurate results, with mean displacements relative to the single-crystal data lower by a factor of 2 than those from participant 4 and from the organisers. We don't know exactly why. Even the hydrogen atom positions are well located, with a mean error of 0.2 angstroms. Here is an ORTEP view of the tetracycline hydrochloride structure. Only one hydrogen atom was not located by participant 2: this one.

The software of SDPD Round Robin participant 2 should be distributed to the participants of the Glasgow workshop on structure determination from powder data. Some other new programs in this category of molecule location have recently been made available. This is the case for POWDER SOLVE, a commercial program associated with CERIUS.

But when no fragment is known beforehand and the classical approach (Patterson and direct methods) fails, the crystallographer does not have many tools at hand. Independent translations of the dominant X-ray scatterers through the unit cell were attempted, for instance by Monte Carlo (with up to 2 different atoms) as well as by systematic grid search. Recent efforts to build larger models from scratch have been implemented in several available programs: FULLPROF, able for instance to locate Pb in PbSO4 by Monte Carlo; ENDEAVOUR, working by simulated annealing and/or using potential functions; and ESPOIR, using a reverse Monte Carlo approach applied to extracted "|Fobs|". This last possibility was claimed to be impossible by the main Reverse Monte Carlo author, Robert McGreevy, who declared: "RMC modelling will certainly not enable ab initio crystal structure models to be obtained starting from random initial structure". On the contrary, ESPOIR shows that this is possible with the right strategy. Indeed, structures with up to 15-30 independent atoms can be solved from scratch. The structure of the SDPD Round Robin sample I, a cobalt amine for which no participant proposed a model, can be solved by ESPOIR (15 independent non-hydrogen atoms in space group P21). Essentially, the (not so clever) strategy consists in trying again and again, jumping quickly to a new starting configuration if a model is frozen (false minimum). It is therefore understandable that the main problem of these programs, when dealing with the more complex cases, is computer time. Direct methods find 30-100 independent atoms (though 50 have never yet been reached from powder diffractometry data) in a matter of minutes on a PC (100-500 MHz), and fewer than 30 atoms in a matter of seconds. The millions of moves and atom permutations necessary for finding 30 atoms with ESPOIR from a set of 500 hkl reflections require one night, at least, if you are lucky.
As a result, larger configurations have not yet been tested, for lack of easy access to a faster computer. Fortunately, Moore's law is still expected to apply for many years, so that some hope may be placed in ESPOIR and the other software mentioned above. As a matter of fact, "espoir" (in French) means "hope" in English, suggesting that you should not lose it. Moreover, the source code (Fortran) is delivered with the package, allowing you to add your own stones to the building.
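
The "try again and again" strategy can be sketched as follows in Python. This is not the ESPOIR source code (which is in Fortran) and it does not claim to reproduce its acceptance rule. It is a toy version resting on loudly stated assumptions: atoms start at random positions, a random move of one atom is kept only if it lowers the R factor on the amplitudes, and the run jumps to a new random start after `patience` consecutive rejected moves. The cell has only an inversion centre at the origin, atoms are points with constant weights, and all function and parameter names are hypothetical.

```python
import math
import random


def amplitudes(atoms, hkl_list):
    """|F(hkl)| for atoms given as (weight, (x, y, z)) in fractional coordinates,
    in a toy cell whose only symmetry is an inversion centre at the origin."""
    result = []
    for h, k, l in hkl_list:
        total = sum(2.0 * w * math.cos(2.0 * math.pi * (h * x + k * y + l * z))
                    for w, (x, y, z) in atoms)
        result.append(abs(total))
    return result


def r_factor(f_obs, f_calc):
    """Sum of |Fobs - Fcalc| divided by the sum of Fobs (same scale assumed)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def solve_with_restarts(weights, f_obs, hkl_list, n_restarts=5, max_moves=2000,
                        patience=200, target=0.05, seed=0):
    """Random-start search; a frozen model is abandoned and a new start is tried."""
    rng = random.Random(seed)
    best_atoms, best_r = None, float("inf")
    for _ in range(n_restarts):
        # New starting configuration: every atom at a random position.
        atoms = [(w, (rng.random(), rng.random(), rng.random())) for w in weights]
        r = r_factor(f_obs, amplitudes(atoms, hkl_list))
        stalled = 0
        for _ in range(max_moves):
            if stalled >= patience or r <= target:
                break  # frozen in a false minimum, or good enough
            i = rng.randrange(len(atoms))
            w, xyz = atoms[i]
            moved = tuple((c + rng.gauss(0.0, 0.05)) % 1.0 for c in xyz)
            trial = atoms[:i] + [(w, moved)] + atoms[i + 1:]
            r_trial = r_factor(f_obs, amplitudes(trial, hkl_list))
            if r_trial < r:
                atoms, r, stalled = trial, r_trial, 0
            else:
                stalled += 1
        if r < best_r:
            best_atoms, best_r = atoms, r
        if best_r <= target:
            break
    return best_atoms, best_r
```

Each trial move recomputes every amplitude, so the cost grows with the number of atoms times the number of reflections times the number of moves. That is the computer-time problem mentioned above, and it is why a real program would update the structure factors incrementally instead of recomputing them.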

Now, a small demonstration of what ESPOIR can do: first the location of an octahedron in a cell, then some small structures.
 
 

Part42a.html

I don't know exactly how many of you have access to the Internet in China. I am sure that some do, since they have been able to make their own software available by this modern and efficient means. I will give you a list of the main entry points for crystallography, and discuss newsgroups and mailing lists.

Part421.html

In my opinion, the main entry point is that of the International Union of Crystallography, which gives access to a huge quantity of information, either on the site itself or through hypertext links to other Web pages. First of all, when visiting this Web site, have a look at the What's new page, updated almost daily by Howard Flack. Everything may be found there concerning databases, teaching, and Sincris, which is more dedicated to software; and here is the full software list with hypertext links. You will also find everything about crystallographic servers around the world, dates of forthcoming conferences, radiation resources, proceedings of past meetings, journals with the possibility of keyword searches in IUCr Journals, and sources of news.

More specifically dedicated to powder diffraction, the Collaborative Computational Project CCP14 has recently increased its field of activity enormously. It is not only a privileged center of information about software for single-crystal and powder diffraction; you may also find many tutorials and goodies at this site. The secretary is currently Lachlan Cranswick. You may obtain the whole CCP14 content and more on a CD-ROM made available especially for those without Web access. I have 10 of these CDs at your disposal. I will just ask those taking a CD to put their name and address, including e-mail, on the registry page; it does not matter if it is in Chinese.

More specifically on structure determination from powder diffraction data, you may find the SDPD-Database, containing a compendium of discussed references and many goodies.

Search engines are an excellent entry point when you are searching the Web by keyword. Here is the result of an Altavista search with the keyword "SDPD", returning more than 400 Web pages.
 
 

Part422a.html

Newsgroups are not used much by crystallographers, though they should be, in my opinion. Here is the sci.techniques.xtallography newsgroup, on which you will generally find less than one message per day.

Part423a.html

Mailing lists have more users. Some, like the Rietveld mailing list, have more than 300 subscribers, and the number of mails averages nearly 2 per day. Archives are available, gathering almost 2000 mails over the past 5 years. The SDPD mailing list is very new, currently having 160 subscribers and less than one mail per day.
 
 

I hope that this very short overview of the Internet possibilities will encourage you to contribute to a world-class discussion in crystallography and particularly in powder diffraction topics.

Thank you for your attention.