

5.2.2 Parallel Initialization

When all data have been delivered to their target processors, the parallel data structures can be created and initialized (cf. Fig. 5.6). The parallel PETSc vectors for the local magnetic fields, the magnetic scalar potentials, etc. are created, and the parallel solver for the Poisson problem is initialized.

Figure 5.6: Flow chart of the parallel initialization section.
\includegraphics[scale=0.5]{fig/talk/fig/fluss3.fig.eps}
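
The following sketch illustrates such a setup with the C interface of a current PETSc release; it is not the code of the actual program, and all names are placeholders.

\begin{verbatim}
#include <petscksp.h>

/* Sketch: create distributed vectors for the magnetic scalar
   potential and the right hand side and set up a Krylov solver
   for the Poisson problem.  "nlocal" is the number of mesh nodes
   owned by this process, "A" the (already created) stiffness
   matrix.                                                        */
PetscErrorCode setup_poisson(MPI_Comm comm, PetscInt nlocal, Mat A,
                             Vec *u, Vec *rhs, KSP *ksp)
{
  VecCreateMPI(comm, nlocal, PETSC_DETERMINE, u);  /* potential       */
  VecDuplicate(*u, rhs);                           /* right hand side */

  KSPCreate(comm, ksp);
  KSPSetOperators(*ksp, A, A);
  KSPSetFromOptions(*ksp);  /* solver and preconditioner can be
                               chosen at run time via options     */
  return 0;
}
\end{verbatim}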

Then the computationally intensive part starts: First, the finite element mesh is analyzed thoroughly. The volumes of all elements and vertices (using the box scheme [40]), the element quality factors, and the corresponding extrema and averages are calculated. Next, the stiffness matrix of the Poisson problem (Eq. (2.15)) and the matrices for the contributions to the local field (Eqs. (3.21), (3.28)) are calculated on an element-by-element basis. However, the results are inserted directly into the global matrices, where the contributions from different elements (even from different processors) are added up automatically by PETSc. This behavior makes the handling of the nodes of the finite element mesh very convenient, because no duplicate nodes (``ghost points'') have to be managed.
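
As an illustration of this mechanism, the following sketch (with placeholder names, not the actual implementation) adds a $4\times 4$ element matrix into the global stiffness matrix; entries belonging to rows owned by another processor are cached by PETSc and exchanged during the final assembly.

\begin{verbatim}
PetscInt    e, nelements;  /* elements assigned to this process  */
PetscInt    idx[4];        /* global node numbers of one element */
PetscScalar Ke[4][4];      /* element matrix, computed locally   */

for (e = 0; e < nelements; e++) {
  /* ... compute the node numbers idx[] and the element
         stiffness matrix Ke[][] of element e ...          */
  MatSetValues(A, 4, idx, 4, idx, &Ke[0][0], ADD_VALUES);
}

/* off-processor contributions are communicated here */
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
\end{verbatim}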

Fig. 5.7 shows how PETSc distributes the data of matrices and vectors over the processors. For example, a simple $8\times 8$ matrix is split into four partitions for four processors: The first two rows are stored on the first processor, the next two rows on the second, another two rows on the third, and the last two rows on the fourth. Vectors are distributed in a similar way: Each processor holds those elements which correspond to the nodes of the finite element mesh that have been assigned to it.

Figure 5.7: Matrix-vector multiplication with matrix and vector elements distributed over four processors.
\includegraphics[scale=0.8]{fig/dots0200651/p2/petscmat.eps}
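
The ownership ranges of such distributed objects can be queried directly, as the following short sketch (placeholder names) shows; the parallel matrix-vector product of Fig. 5.7 is then a single call.

\begin{verbatim}
/* Sketch: this process owns rows rstart..rend-1 of the matrix A
   and entries vstart..vend-1 of the vector x.  MatMult computes
   the distributed product y = A*x.                              */
PetscInt rstart, rend, vstart, vend;

MatGetOwnershipRange(A, &rstart, &rend);
VecGetOwnershipRange(x, &vstart, &vend);

MatMult(A, x, y);  /* required communication is handled by PETSc */
\end{verbatim}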

The sparsity pattern of the stiffness matrix of the nanodot model on a single processor is shown on the left in Fig. 5.8. The band structure is achieved by a suitable numbering of the nodes, which has in this case been optimized by the mesh generator Patran using the Gibbs-Poole-Stockmeyer algorithm [69]. This can already be considered a kind of ``mesh partitioning'', since it renumbers the nodes in such a way that nodes sharing an edge are close to each other in the numbering scheme. Thus, the band structure is only slightly disturbed after partitioning, as shown on the right in Fig. 5.8, which gives the sparsity pattern of the whole stiffness matrix distributed over two processors. The dashed line separates the parts of the first and the second processor.

Figure 5.8: Sparsity pattern of the stiffness matrix of the nanodot model for a single processor (left) and distributed over two processors (right).
\includegraphics[scale=0.4]{fig/dots0200651/p1/m12.eps}      \includegraphics[scale=0.561]{fig/dots0200651/p2/matsplit.eps}
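
The quality of the numbering can be measured by the half bandwidth of the stiffness matrix. The following sketch (assuming an assembled AIJ matrix; the local maxima would still have to be reduced over all processes) estimates it from the locally owned rows.

\begin{verbatim}
/* Sketch: half bandwidth max|i-j| over all nonzero entries a_ij
   of the locally owned rows of A.  A suitable node numbering
   (e.g. Gibbs-Poole-Stockmeyer) keeps this value small.         */
PetscInt        i, j, ncols, rstart, rend, bw = 0;
const PetscInt *cols;

MatGetOwnershipRange(A, &rstart, &rend);
for (i = rstart; i < rend; i++) {
  MatGetRow(A, i, &ncols, &cols, NULL);
  for (j = 0; j < ncols; j++) {
    PetscInt d = (cols[j] > i) ? cols[j] - i : i - cols[j];
    if (d > bw) bw = d;
  }
  MatRestoreRow(A, i, &ncols, &cols, NULL);
}
\end{verbatim}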

Then the boundary solid angles $S(\boldsymbol{x})$ are calculated, which are subsequently used in the calculation of the boundary matrix (cf. Eq. (3.47)). In each element the solid angle subtended at a node $v_i$ by the opposite face (Fig. 5.9) is calculated as

\begin{displaymath}
\omega(v_i)=\alpha+\beta+\gamma-\pi \quad,
\end{displaymath} (5.1)

where $\alpha$, $\beta$, and $\gamma$ denote the dihedral angles between each pair of faces that share the node $v_i$. These dihedral angles are calculated from the unit face normals (e.g. $\boldsymbol{n}_a$ and $\boldsymbol{n}_b$) as
\begin{displaymath}
\alpha=\pi-\arccos(\boldsymbol{n}_a \cdot \boldsymbol{n}_b) \quad .
\end{displaymath} (5.2)

The contributions from all elements sharing a common node are summed up and finally give the required boundary solid angle $S(\boldsymbol{x})$. If this calculation is done for all nodes (including interior ones), the result can be checked against the boundary indicators obtained during the serial initialization, because the solid angle at any interior node must be $4\pi$.

Figure 5.9: Solid angle of a trihedral angle made up by three faces of a tetrahedron.
\includegraphics[scale=0.7]{fig/fem/solidangle.eps}
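
A small self-contained sketch of Eqs. (5.1) and (5.2) in C is given below. It is not the actual implementation; the chosen cross products guarantee that the three face normals are oriented consistently, which Eq. (5.2) requires.

\begin{verbatim}
#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static void cross(const double u[3], const double v[3], double w[3]) {
  w[0] = u[1]*v[2] - u[2]*v[1];
  w[1] = u[2]*v[0] - u[0]*v[2];
  w[2] = u[0]*v[1] - u[1]*v[0];
}

static double dot(const double u[3], const double v[3]) {
  return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
}

static void normalize(double u[3]) {
  double n = sqrt(dot(u, u));
  u[0] /= n;  u[1] /= n;  u[2] /= n;
}

/* dihedral angle between two faces with consistently oriented
   unit normals na, nb, Eq. (5.2)                               */
static double dihedral(const double na[3], const double nb[3]) {
  double c = dot(na, nb);
  if (c >  1.0) c =  1.0;  /* guard against round-off */
  if (c < -1.0) c = -1.0;
  return M_PI - acos(c);
}

/* solid angle at vertex p of the tetrahedron (p,a,b,c), Eq. (5.1) */
double solid_angle(const double p[3], const double a[3],
                   const double b[3], const double c[3]) {
  double ea[3], eb[3], ec[3], nab[3], nbc[3], nca[3];
  int i;
  for (i = 0; i < 3; i++) {
    ea[i] = a[i] - p[i];
    eb[i] = b[i] - p[i];
    ec[i] = c[i] - p[i];
  }
  /* normals of the three faces meeting at p; with this ordering
     they all point into the element                             */
  cross(ea, eb, nab);  cross(eb, ec, nbc);  cross(ec, ea, nca);
  normalize(nab);  normalize(nbc);  normalize(nca);

  return dihedral(nab, nca)   /* angle along edge p-a */
       + dihedral(nab, nbc)   /* angle along edge p-b */
       + dihedral(nbc, nca)   /* angle along edge p-c */
       - M_PI;
}
\end{verbatim}

For the corner of a cube (three mutually orthogonal edges) this function returns $\pi/2$, i.e. one eighth of the full solid angle $4\pi$, as expected.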

For the data output along a sampling line through the model and for the graphics output, the interpolation matrices are calculated (cf. Sec. 6.2.2). Then the requested solver (PVODE for the time integration of the Landau-Lifshitz-Gilbert equation, TAO for energy minimization, or the solver for the nudged elastic band method) is initialized, and all additionally required data structures are created.
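
One possible realization of such an interpolation matrix (a sketch only, under the assumption that row $k$ holds the shape function values of the element containing sampling point $k$; all names are illustrative) reduces the interpolation to a parallel matrix-vector product:

\begin{verbatim}
/* Sketch: interpolate nodal data u onto ns sampling points.
   "nlocal" is the local size of u (so the column layout of P
   matches u), "nnodes" the total number of mesh nodes; each row
   of P has at most 4 nonzero entries.                           */
Mat P;
Vec samples;

MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, nlocal,
             ns, nnodes, 4, NULL, 4, NULL, &P);
/* ... for every sampling point k:
       MatSetValues(P, 1, &k, 4, nodeidx, shape, INSERT_VALUES) ... */
MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);

VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, ns, &samples);
MatMult(P, u, samples);   /* values along the sampling line */
\end{verbatim}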

The initialization phase is completed with the calculation of the local fields and energies of the initial magnetization distribution, the first entries in the log files, and the removal of data structures that are no longer needed.

