

5.2.2 Parallel Initialization

When all data have been delivered to their target processors, the parallel data structures can be created and initialized (cf. Fig. 5.6). The parallel PETSc vectors for the local magnetic fields, the magnetic scalar potentials, etc. are created, and the parallel solver for the Poisson problem is initialized.

Figure 5.6: Flow chart of the parallel initialization section.
\includegraphics[scale=0.5]{fig/talk/fig/fluss3.fig.eps}
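
The following sketch illustrates such a setup with the C interface of a current PETSc release; it is not the code of the actual program, and all names are placeholders.

\begin{verbatim}
#include <petscksp.h>

/* Sketch: create distributed vectors for the magnetic scalar
   potential and the right hand side and set up a Krylov solver
   for the Poisson problem.  "nlocal" is the number of mesh nodes
   owned by this process, "A" the (already created) stiffness
   matrix.                                                        */
PetscErrorCode setup_poisson(MPI_Comm comm, PetscInt nlocal, Mat A,
                             Vec *u, Vec *rhs, KSP *ksp)
{
  VecCreateMPI(comm, nlocal, PETSC_DETERMINE, u);  /* potential       */
  VecDuplicate(*u, rhs);                           /* right hand side */

  KSPCreate(comm, ksp);
  KSPSetOperators(*ksp, A, A);
  KSPSetFromOptions(*ksp);  /* solver and preconditioner can be
                               chosen at run time via options     */
  return 0;
}
\end{verbatim}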

Then the computationally intensive part starts: First, the finite element mesh is analyzed thoroughly. The volumes of all elements and vertices (using the box scheme [40]), the element quality factors, and the corresponding extrema and averages are calculated. Next, the stiffness matrix of the Poisson problem (Eq. (2.15)) and the matrices for the contributions to the local field (Eqs. (3.21), (3.28)) are calculated on an element-by-element basis. However, the results are inserted directly into the global matrices, where the contributions from different elements (even from different processors) are added up automatically by PETSc. This behavior makes the handling of the nodes of the finite element mesh very convenient, because no duplicate nodes (``ghost points'') have to be managed.
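
As an illustration of this mechanism, the following sketch (with placeholder names, not the actual implementation) adds a $4\times 4$ element matrix into the global stiffness matrix; entries belonging to rows owned by another processor are cached by PETSc and exchanged during the final assembly.

\begin{verbatim}
PetscInt    e, nelements;  /* elements assigned to this process  */
PetscInt    idx[4];        /* global node numbers of one element */
PetscScalar Ke[4][4];      /* element matrix, computed locally   */

for (e = 0; e < nelements; e++) {
  /* ... compute the node numbers idx[] and the element
         stiffness matrix Ke[][] of element e ...          */
  MatSetValues(A, 4, idx, 4, idx, &Ke[0][0], ADD_VALUES);
}

/* off-processor contributions are communicated here */
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
\end{verbatim}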

Fig. 5.7 shows how PETSc distributes the data of matrices and vectors over the processors. For example, a simple $8\times 8$ matrix is split into four partitions for four processors: The first two rows are stored on the first processor, the next two rows on the second, another two rows on the third, and the last two rows on the fourth. Vectors are distributed in a similar way: Each processor holds those elements which correspond to the nodes of the finite element mesh that have been assigned to it.

Figure 5.7: Matrix-vector multiplication with matrix and vector elements distributed over four processors.
\includegraphics[scale=0.8]{fig/dots0200651/p2/petscmat.eps}
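
The ownership ranges of such distributed objects can be queried directly, as the following short sketch (placeholder names) shows; the parallel matrix-vector product of Fig. 5.7 is then a single call.

\begin{verbatim}
/* Sketch: this process owns rows rstart..rend-1 of the matrix A
   and entries vstart..vend-1 of the vector x.  MatMult computes
   the distributed product y = A*x.                              */
PetscInt rstart, rend, vstart, vend;

MatGetOwnershipRange(A, &rstart, &rend);
VecGetOwnershipRange(x, &vstart, &vend);

MatMult(A, x, y);  /* required communication is handled by PETSc */
\end{verbatim}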

The sparsity pattern of the stiffness matrix of the nanodot model on a single processor is shown on the left in Fig. 5.8. The band structure is achieved by a suitable numbering of the nodes, which has in this case been optimized by the mesh generator Patran using the Gibbs-Poole-Stockmeyer algorithm [69]. This can already be considered a kind of ``mesh partitioning'', since it renumbers the nodes in such a way that nodes sharing an edge are close to each other in the numbering scheme. Thus, the band structure is only slightly disturbed after partitioning, as shown on the right in Fig. 5.8, which gives the sparsity pattern of the whole stiffness matrix distributed over two processors. The dashed line separates the parts of the first and the second processor.

Figure 5.8: Sparsity pattern of the stiffness matrix of the nanodot model for a single processor (left) and distributed over two processors (right).
\includegraphics[scale=0.4]{fig/dots0200651/p1/m12.eps}      \includegraphics[scale=0.561]{fig/dots0200651/p2/matsplit.eps}
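
The quality of the numbering can be measured by the half bandwidth of the stiffness matrix. The following sketch (assuming an assembled AIJ matrix; the local maxima would still have to be reduced over all processes) estimates it from the locally owned rows.

\begin{verbatim}
/* Sketch: half bandwidth max|i-j| over all nonzero entries a_ij
   of the locally owned rows of A.  A suitable node numbering
   (e.g. Gibbs-Poole-Stockmeyer) keeps this value small.         */
PetscInt        i, j, ncols, rstart, rend, bw = 0;
const PetscInt *cols;

MatGetOwnershipRange(A, &rstart, &rend);
for (i = rstart; i < rend; i++) {
  MatGetRow(A, i, &ncols, &cols, NULL);
  for (j = 0; j < ncols; j++) {
    PetscInt d = (cols[j] > i) ? cols[j] - i : i - cols[j];
    if (d > bw) bw = d;
  }
  MatRestoreRow(A, i, &ncols, &cols, NULL);
}
\end{verbatim}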

Then the boundary solid angles $S(\boldsymbol{x})$ are calculated, which are subsequently used in the calculation of the boundary matrix (cf. Eq. (3.47)). In each element the solid angle subtended at a node $v_i$ by the opposite face (Fig. 5.9) is calculated as

\begin{displaymath}
\omega(v_i)=\alpha+\beta+\gamma-\pi \quad,
\end{displaymath} (5.1)

where $\alpha$, $\beta$, and $\gamma$ denote the dihedral angles between each pair of faces that share the node $v_i$. These dihedral angles are calculated from the unit face normals (e.g. $\boldsymbol{n}_a$ and $\boldsymbol{n}_b$) as
\begin{displaymath}
\alpha=\pi-\arccos(\boldsymbol{n}_a \cdot \boldsymbol{n}_b) \quad .
\end{displaymath} (5.2)

The contributions from all elements sharing a common node are summed up and finally give the required boundary solid angle $S(\boldsymbol{x})$. If this calculation is done for all nodes (including interior ones), the result can be checked against the boundary indicators obtained during the serial initialization, because the solid angle at any interior node must be $4\pi$.

Figure 5.9: Solid angle of a trihedral angle made up by three faces of a tetrahedron.
\includegraphics[scale=0.7]{fig/fem/solidangle.eps}
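
A small self-contained sketch of Eqs. (5.1) and (5.2) in C is given below. It is not the actual implementation; the chosen cross products guarantee that the three face normals are oriented consistently, which Eq. (5.2) requires.

\begin{verbatim}
#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static void cross(const double u[3], const double v[3], double w[3]) {
  w[0] = u[1]*v[2] - u[2]*v[1];
  w[1] = u[2]*v[0] - u[0]*v[2];
  w[2] = u[0]*v[1] - u[1]*v[0];
}

static double dot(const double u[3], const double v[3]) {
  return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
}

static void normalize(double u[3]) {
  double n = sqrt(dot(u, u));
  u[0] /= n;  u[1] /= n;  u[2] /= n;
}

/* dihedral angle between two faces with consistently oriented
   unit normals na, nb, Eq. (5.2)                               */
static double dihedral(const double na[3], const double nb[3]) {
  double c = dot(na, nb);
  if (c >  1.0) c =  1.0;  /* guard against round-off */
  if (c < -1.0) c = -1.0;
  return M_PI - acos(c);
}

/* solid angle at vertex p of the tetrahedron (p,a,b,c), Eq. (5.1) */
double solid_angle(const double p[3], const double a[3],
                   const double b[3], const double c[3]) {
  double ea[3], eb[3], ec[3], nab[3], nbc[3], nca[3];
  int i;
  for (i = 0; i < 3; i++) {
    ea[i] = a[i] - p[i];
    eb[i] = b[i] - p[i];
    ec[i] = c[i] - p[i];
  }
  /* normals of the three faces meeting at p; with this ordering
     they all point into the element                             */
  cross(ea, eb, nab);  cross(eb, ec, nbc);  cross(ec, ea, nca);
  normalize(nab);  normalize(nbc);  normalize(nca);

  return dihedral(nab, nca)   /* angle along edge p-a */
       + dihedral(nab, nbc)   /* angle along edge p-b */
       + dihedral(nbc, nca)   /* angle along edge p-c */
       - M_PI;
}
\end{verbatim}

For the corner of a cube (three mutually orthogonal edges) this function returns $\pi/2$, i.e. one eighth of the full solid angle $4\pi$, as expected.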

For the data output along a sampling line through the model and for the graphics output, the interpolation matrices are calculated (cf. Sec. 6.2.2). Then the requested solver (PVODE for the time integration of the Landau-Lifshitz-Gilbert equation, TAO for energy minimization, or the solver for the nudged elastic band method) is initialized, and all additionally required data structures are created.
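
One possible realization of such an interpolation matrix (a sketch only, under the assumption that row $k$ holds the shape function values of the element containing sampling point $k$; all names are illustrative) reduces the interpolation to a parallel matrix-vector product:

\begin{verbatim}
/* Sketch: interpolate nodal data u onto ns sampling points.
   "nlocal" is the local size of u (so the column layout of P
   matches u), "nnodes" the total number of mesh nodes; each row
   of P has at most 4 nonzero entries.                           */
Mat P;
Vec samples;

MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, nlocal,
             ns, nnodes, 4, NULL, 4, NULL, &P);
/* ... for every sampling point k:
       MatSetValues(P, 1, &k, 4, nodeidx, shape, INSERT_VALUES) ... */
MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);

VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, ns, &samples);
MatMult(P, u, samples);   /* values along the sampling line */
\end{verbatim}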

The initialization phase is completed with the calculation of the local fields and energies of the initial magnetization distribution, the first entries in the log files, and the removal of data structures that are no longer needed.

