\documentclass[reqno]{amsart}
\usepackage{hyperref}
\AtBeginDocument{{\noindent\small
\emph{Electronic Journal of Differential Equations}, Vol. 2014 (2014), No. 79, pp. 1--7.\newline
ISSN: 1072-6691. URL: http://ejde.math.txstate.edu or http://ejde.math.unt.edu \newline
ftp ejde.math.txstate.edu}
\thanks{\copyright 2014 Texas State University - San Marcos.}
\vspace{9mm}}
\begin{document}

\title[\hfilneg EJDE-2014/79\hfil Explicit expressions]
{Explicit expressions for the matrix exponential function obtained by means of an algebraic convolution formula}

\author[J. R. Mandujano, L. Verde-Star \hfil EJDE-2014/79\hfilneg]
{Jos\'e Roberto Mandujano, Luis Verde-Star} % in alphabetical order

\address{Jos\'e Roberto Mandujano \newline
Escuela Superior de C\'omputo, Instituto Polit\'ecnico Nacional, U. Zacatenco, M\'exico D.F., M\'exico}
\email{jrmandujano@yahoo.com.mx}

\address{Luis Verde-Star \newline
Department of Mathematics, Universidad Aut\'onoma Metropolitana, Iztapalapa, \newline
Apartado 55-534, M\'exico D.F. 09340, M\'exico}
\email{verde@xanum.uam.mx}

\thanks{Submitted June 20, 2013. Published March 21, 2014.}
\subjclass[2000]{34A30, 65F60, 15A16}
\keywords{Matrix exponential; dynamic solutions; explicit formula; \hfill\break\indent systems of linear differential equations}

\begin{abstract}
We present a direct algebraic method for obtaining the matrix exponential function $\exp(tA)$, expressed as an explicit function of $t$, for any square matrix $A$ whose eigenvalues are known. The explicit form can be used to determine how a given eigenvalue affects the behavior of $\exp(tA)$. We use an algebraic convolution formula for basic exponential polynomials to obtain the dynamic solution $g(t)$ associated with the characteristic (or minimal) polynomial $w(x)$ of $A$. Then $\exp(tA)$ is expressed as $\sum_k g_k(t) w_k(A)$, where the $g_k(t)$ are successive derivatives of $g$ and the $w_k$ are the Horner polynomials associated with $w(x)$. Our method also gives a number $\delta$ that measures how well the computed approximation satisfies the matrix differential equation $F'(tA)=A F(tA)$ at some given point $t=\beta$. Some numerical experiments with random matrices indicate that the proposed method can be applied to matrices of order up to 40.
\end{abstract}

\maketitle
\numberwithin{equation}{section}
\newtheorem{theorem}{Theorem}[section]
\allowdisplaybreaks

\section{Introduction}

The importance of the function ${\mathrm e}^{tA}$, where $A$ is a square matrix and $t$ is a real or complex variable, is well known. For example, many problems in areas such as Control and Systems Theory, Physics, Biomathematics, and the Nuclear Sciences require the solution of a linear system of first-order differential equations with constant coefficients and given initial conditions. The solution has the form ${\mathrm e}^{tA}C$, where $C$ is a constant vector. The behavior of ${\mathrm e}^{tA}$ as a function of $t$ depends in a complex way on the eigenvalues and the entries of $A$.

In this article we present a method that gives an explicit formula for ${\mathrm e}^{tA}$ in terms of linear combinations of functions of the form $t^k \exp(\lambda t)$. Consequently, we can easily compute ${\mathrm e}^{tA}$ (and also ${\mathrm e}^{tA}C$) for a large number of values of $t$. The explicit expressions can also be used to determine the influence of each eigenvalue on the behavior of the matrix exponential function, and the effect of small perturbations of the eigenvalues.
We can also study how the distribution of the eigenvalues in the complex plane determines how accurately ${\mathrm e}^{tA}$ can be computed in some given region.

Our method is a direct algebraic construction of ${\mathrm e}^{tA}$ using the entries of $A$ and its eigenvalues. It does not use numerical integration, integral transforms, Pad\'e approximations, orthogonal polynomial series, or differential equation solvers. We use an algebraic convolution formula for exponential polynomials which is based on the multiplication of basic rational functions. Our approach is similar to the one used by Luther and Rost \cite{LR}, but we do not need the inversion of confluent Vandermonde matrices. A straightforward application of our method yields all the explicit formulas obtained by Wu in \cite{Wu} and many other formulas for matrices with a small number of distinct eigenvalues.

As a consequence of the Cayley-Hamilton theorem, all the powers $A^m$ of a square matrix $A$ can be expressed in terms of the initial powers $I, A,\ldots, A^n$, for some fixed $n$. This fact can be used to show that functions of the form $f(tA)$ may be expressed as polynomials in $A$ with coefficients that depend on $t$. See \cite{FofM}, where this approach is used to study matrix functions $f(tA)$ for $f$ given by a power series. In particular, for the exponential function we obtained in \cite{FofM} the formula
\begin{equation}
{\mathrm e}^{tA}= \sum_{k=0}^n g_k(t) w_k(A), \label{e1.1}
\end{equation}
where $A$ is a square matrix, the $w_k$ are the Horner polynomials associated with a polynomial $w(x)$ of degree $n+1$ such that $w(A)=0$, and the $g_k(t)$ are exponential polynomials that depend only on the eigenvalues of $A$. Note that $w(x)$ may be the minimal or the characteristic polynomial of $A$. Note also that formula \eqref{e1.1} holds when $A$ is replaced by any matrix $B$ that is similar to $A$, with the same coefficient functions $g_k(t)$.

The set $\{ w_k(x) : 0 \le k \le n \}$ is often called the {\it control basis} associated with $w$. The function $g_n(t)$, called the {\it dynamic solution}, is the solution of the scalar differential equation $w(D)y(t)=0$ with initial conditions $D^k g_n(0)=0$ for $k=0,1,2,\ldots, n-1$ and $D^n g_n(0)=1$. The other coefficient functions $g_k(t)$ are obtained by repeated differentiation of $g_n(t)$, that is, $g_{n-k}(t)=D^k g_n(t)$ for $0 \le k \le n$.

In the present paper we compute the dynamic solution directly from the roots of $w$ using an algebraic formula for the convolution of basic exponential polynomials. If the number of distinct roots and of nonzero coefficients of the polynomial $w$ is small, then the numerical errors in the computation are negligible. This is the case for the polynomials considered by Wu in \cite{Wu} to obtain explicit formulas. The use of the dynamic solution to solve matrix differential equations was introduced in \cite{ruiz2}. For an algebraic approach to convolutions see \cite{aactm}.

If the number of distinct eigenvalues is relatively large then several sources of numerical error appear. The quality of the computed function $f(tA)$ as an approximation of ${\mathrm e}^{tA}$ can be determined by measuring how well it satisfies the differential equation $f'(tA)= A f(tA)$. This is easily done and requires only one additional derivative of $g_n$ and the computation of $f'(tA)$, which is analogous to \eqref{e1.1}. It is well known that the computation of the matrix exponential is a difficult numerical problem. See \cite{moler} and \cite{Hi}.
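As a simple illustration of \eqref{e1.1}, take $w(x)=(x-\lambda)^2$, so that $n=1$, $w_0(A)=I$, and $w_1(A)=A-2\lambda I$. The dynamic solution is $g_1(t)=t{\mathrm e}^{\lambda t}$, its derivative is $g_0(t)=(1+\lambda t){\mathrm e}^{\lambda t}$, and \eqref{e1.1} gives
$$ {\mathrm e}^{tA}=(1+\lambda t){\mathrm e}^{\lambda t} I + t{\mathrm e}^{\lambda t}\,(A-2\lambda I) ={\mathrm e}^{\lambda t}\bigl( I + t(A-\lambda I)\bigr), $$
the familiar expression for a matrix whose minimal polynomial is $(x-\lambda)^2$.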
Our numerical experiments show that our method works well for matrices $A$ of moderate size, whose eigenvalues are known with sufficient accuracy, regardless of the norm of $A$ and the multiplicities of the eigenvalues. In order to determine the accuracy of the computations with the explicit expressions produced by our algorithm we performed some numerical experiments with random matrices of order up to $40$, using a Maple program. Since our method is essentially equivalent to Hermite interpolation of the exponential function at the spectrum of $A$, the accuracy of the results depends in a complex way on the distribution of the eigenvalues in the complex plane.

It is important to notice that the main objective of our method is to obtain an explicit expression for ${\mathrm e}^{tA}$, where $t$ is a variable. Our method cannot be directly compared to algorithms of a different nature that compute ${\mathrm e}^{A}$ for a given $A$, such as those based on scaling and squaring or on Pad\'e approximations, which are designed to obtain high accuracy or to handle large matrices from certain restricted classes of matrices.

\section{Notation and basic results}

In this section we introduce notation that we use in the rest of the paper and present some basic results taken from \cite{DDCI} and \cite{FofM}, where the proofs and a more detailed discussion can be found.

Let $n$ be a nonnegative integer and let $w(z)=z^{n+1}+b_1 z^n+ \cdots +b_{n+1}$ be a monic polynomial of degree $n+1$ with complex coefficients. The sequence $\{w_k\}$ of {\it Horner polynomials} associated with $w$ is defined by
\begin{equation}
w_k(z)=z^k+b_1 z^{k-1}+ b_2 z^{k-2}+\cdots +b_k, \quad k\ge 0, \label{e2.1}
\end{equation}
where $b_0=1$ and $b_j=0$ for $j>n+1$. Note that $\{w_k: k \ge 0\}$ is a basis for the vector space of all polynomials and $\{w_0,w_1,\ldots ,w_n \}$ is a basis for the subspace $\mathcal{P}_n$ of all polynomials of degree at most $n$. This basis is often called the control basis. Note also that $w_{n+1}=w$, $w_{n+1+k}(z)=z^kw(z)$ for $k\geq 0$, and
\begin{equation}
w_{k+1}(z)=zw_k(z)+b_{k+1}, \quad k\ge 0. \label{e2.2}
\end{equation}

In \cite{DDCI} we proved the following result, which we call the general interpolation formula.

\begin{theorem} \label{thm2.1}
Let $w$ be as previously defined, let $f$ be a function defined on the multiset of roots of $w$, and let $p(x)$ be the polynomial in $\mathcal{P}_n$ that interpolates $f$ at the roots of $w$. Then
\begin{equation}
p(x) = \Delta_w \left( f(z) w[z,x] \right), \label{e2.3}
\end{equation}
where $\Delta_w$ is the divided difference functional with respect to the roots of $w$, acting with respect to $z$, and $w[z,x]$ is the difference quotient
$$ w[z,x] = \frac{w(z) - w(x)}{z-x} . $$
\end{theorem}

A simple computation yields
\begin{equation}
w[z,x]= \sum_{k=0}^n w_k(x) z^{n-k}. \label{e2.4}
\end{equation}
For $f$ as in the previous theorem and a parameter $t$, define
\begin{equation}
g_k(t) = \Delta_{w(z)} \left( z^{n-k} f(tz) \right), \quad 0 \le k \le n. \label{e2.5}
\end{equation}
Now let $A$ be any square matrix such that $w(A)=0$. Then we have
\begin{equation}
f(t A ) = \Delta_{w(z)} \big( f(tz) w[z,A] \big)= \sum_{k=0}^n g_k(t) w_k(A) . \label{e2.6}
\end{equation}
In the simple case in which $w$ has $n+1$ distinct roots $\lambda_0, \lambda_1, \ldots,\lambda_n$ we have
\begin{equation}
g_k(t) = \sum_{j=0}^n \frac{\lambda_j^{n-k} f(t \lambda_j)}{w'(\lambda_j)}, \quad 0 \le k \le n.
\label{e2.7}
\end{equation}
If $f(x)=\exp(x)$ then each $g_k(t)$ is an exponential polynomial. In the general case we have
\begin{equation}
w(z)= \prod_{j=0}^r ( z- \lambda_j)^{m_j +1}, \label{e2.8}
\end{equation}
where the $\lambda_j$ are distinct complex numbers, the $m_j$ are nonnegative integers, and $\sum_j (m_j +1) = n+1$. Let us define the basic exponential polynomials
\begin{equation}
\tilde{f}_{x,k}(t)=\frac{t^k}{k!}{\mathrm e}^{xt}, \quad x \in \mathbb{C}, \; k \in \mathbb{N}. \label{e2.9}
\end{equation}
In \cite{FofM} we proved that the dynamic solution $g_n$ associated with $w$ is given by
\begin{equation}
g_n= \tilde{f}_{\lambda_0,m_0} * \tilde{f}_{\lambda_1,m_1 } * \cdots * \tilde{f}_{\lambda_r,m_r}, \label{e2.10}
\end{equation}
that $g_{k-1}=g_k'$ for $1 \le k \le n$, and that the convolution of two basic exponential polynomials is given by
\begin{equation}
\tilde{f}_{x,k} * \tilde{f}_{x,m}=\tilde{f}_{x,k+m+1},\quad x \in \mathbb{C}, \; k,m \in \mathbb{N},
\end{equation}
and
\begin{equation}
\begin{aligned}
\tilde{f}_{y,k} * \tilde{f}_{x,m} &= (-1)^{k+1}\sum_{i=0}^m \binom{k+i}{i} \frac{{\tilde f}_{x,m-i}}{(y-x)^{k+i+1}} \\
&\quad +(-1)^{m+1}\sum_{j=0}^k \binom{m+j}{j} \frac{{\tilde f}_{y,k-j}}{(x-y)^{m+j+1}}, \quad x \ne y.
\end{aligned}\label{e2.11}
\end{equation}
This convolution product coincides with the Duhamel convolution, usually defined by means of integrals, when it is applied to exponential polynomials.

\section{Proposed algorithm}

Let $A$ be a given square matrix, let $w$ be a monic polynomial such that $w(A)=0$, let $\lambda_0,\lambda_1, \lambda_2, \ldots, \lambda_r$ be the distinct roots of $w$, and let $m_j + 1$ be the multiplicity of $\lambda_j$ for $0 \le j \le r$.

\noindent\textbf{Step 1.} Compute the dynamic solution $g_n$, defined by \eqref{e2.10}, by repeated application of the convolution formula \eqref{e2.11}. In this way we obtain $g_n$ in the form
$$ g_n = \sum_{j=0}^r \sum_{k=0}^{m_j} \alpha_{j,k} \ \tilde{f}_{\lambda_j,k }, $$
where the $\alpha_{j,k}$ are complex numbers determined by the roots of $w$. Using \eqref{e2.9} we get $g_n(t)$ as a linear combination of the basic exponential polynomials associated with the roots of $w$.

\noindent\textbf{Step 2.} Obtain the functions $g_k$ by repeated differentiation of $g_n$; that is, $g_{k-1}= g_k'$ for $k = n, n-1, n-2, \ldots, 1$.

\noindent\textbf{Step 3.} Find the matrices $w_k(A)$ using Horner's recurrence relation \eqref{e2.2}.

\noindent\textbf{Step 4.} Substitute the scalar functions $g_k(t)$ and the matrices $w_k(A)$ in formula \eqref{e2.6} to obtain $\exp(tA)$.

Note that computing the $w_k(A)$ is the step with the largest computational cost, since it requires $n-1$ multiplications by the matrix $A$. Note also that this step is independent of Steps 1 and 2. If we want to compute $\exp(tA) C$, where $C$ is a vector, then we can compute the vectors $w_k(A)C$ instead, which requires $n$ matrix-vector multiplications.

Once we have the explicit expression for $\exp(tA)$, to find $\exp(\beta A)$ for a given $\beta$ it is sufficient to compute the $n+1$ scalar values $g_0(\beta), g_1(\beta), \ldots, g_n(\beta)$, and then use \eqref{e2.6}, which reduces to the computation of a linear combination of the already known matrices $w_k(A)$. Notice that there are no matrix multiplications involved here. It is clear that we can get any entry of $\exp(tA)$ as an explicit exponential polynomial.
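To make the four steps concrete, we include a minimal sketch in Python with NumPy. The function names are our own, and the sketch works in standard double precision; it is an illustration of the algorithm, not the extended-precision Maple program used for the experiments of the next section. It implements the convolution formula \eqref{e2.11}, the differentiation rule $D\tilde{f}_{x,k}=x\tilde{f}_{x,k}+\tilde{f}_{x,k-1}$, the Horner recurrence \eqref{e2.2}, and the assembly \eqref{e2.6}, together with the derivative of the computed function, which is needed for the error estimate of the next section.
\begin{verbatim}
import numpy as np
from math import comb, factorial

# An exponential polynomial sum a_{x,k} t^k/k! e^{xt} is stored as a
# dict mapping pairs (root x, power k) to scalar coefficients a_{x,k}.

def conv_basic(y, k, x, m):
    # Convolution of two basic exponential polynomials, formula (2.11).
    if y == x:
        return {(x, k + m + 1): 1.0}
    out = {}
    for i in range(m + 1):
        out[(x, m - i)] = (-1)**(k+1) * comb(k+i, i) / (y - x)**(k+i+1)
    for j in range(k + 1):
        out[(y, k - j)] = (-1)**(m+1) * comb(m+j, j) / (x - y)**(m+j+1)
    return out

def conv(g, h):
    # Bilinear extension of conv_basic to exponential polynomials.
    out = {}
    for (y, k), a in g.items():
        for (x, m), b in h.items():
            for key, c in conv_basic(y, k, x, m).items():
                out[key] = out.get(key, 0) + a * b * c
    return out

def diff(g):
    # Differentiation: D f~_{x,k} = x f~_{x,k} + f~_{x,k-1}.
    out = {}
    for (x, k), a in g.items():
        out[(x, k)] = out.get((x, k), 0) + x * a
        if k > 0:
            out[(x, k - 1)] = out.get((x, k - 1), 0) + a
    return out

def ev(g, t):
    # Evaluate an exponential polynomial at the point t.
    return sum(a * t**k / factorial(k) * np.exp(x * t)
               for (x, k), a in g.items())

def expm_explicit(A, roots, mults):
    # roots = distinct eigenvalues, mults[j] = m_j (multiplicity m_j + 1).
    # Returns F and F' as functions of t, per (2.6) and (3.2).
    n = sum(m + 1 for m in mults) - 1
    # Step 1: dynamic solution g_n as the iterated convolution (2.10).
    g = {(roots[0], mults[0]): 1.0}
    for lam, m in zip(roots[1:], mults[1:]):
        g = conv(g, {(lam, m): 1.0})
    # Step 2: g_{k-1} = g_k' by repeated differentiation.
    gs = [g]
    for _ in range(n):
        gs.append(diff(gs[-1]))
    gs = gs[::-1]                     # gs[k] = g_k for 0 <= k <= n
    g0p = diff(gs[0])                 # g_0', needed for F'
    # Step 3: Horner matrices w_k(A) via the recurrence (2.2).
    b = np.poly([lam for lam, m in zip(roots, mults)
                 for _ in range(m + 1)])    # b[0] = 1, b[j] = b_j
    I = np.eye(A.shape[0], dtype=complex)
    W = [I]
    for k in range(n):
        W.append(A @ W[k] + b[k + 1] * I)
    # Step 4: F(t) = sum_k g_k(t) w_k(A), and its derivative.
    F = lambda t: sum(ev(gs[k], t) * W[k] for k in range(n + 1))
    Fp = lambda t: ev(g0p, t) * I + sum(ev(gs[k - 1], t) * W[k]
                                        for k in range(1, n + 1))
    return F, Fp
\end{verbatim}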
An important property of our algorithm is that it also provides a simple way to estimate the relative error in the computation of $\exp(tA)$ at a point $t=\beta$. Note first that $D_t \exp(tA)= A \exp(tA)$, where $D_t$ denotes differentiation with respect to $t$. Therefore $\exp(tA)$ satisfies the matrix equation
\begin{equation}
A = \exp(-tA) D_t \exp(tA), \quad t \in {\mathbb R}. \label{e3.1}
\end{equation}
Let $F(t)$ denote the computed right-hand side of \eqref{e2.6}. We measure how well $F(t)$ satisfies \eqref{e3.1} in the following way. By differentiation of \eqref{e2.6} with respect to $t$ we get
\begin{equation}
F'(t) = g_0'(t) I + \sum_{k=1}^n g_{k-1}(t) w_k(A) . \label{e3.2}
\end{equation}
For a given number $\beta$ let $B= F(-\beta) F'(\beta)$ and define
\begin{equation}
\delta= \frac{\| B -A \|}{ \| A \| }, \label{e3.3}
\end{equation}
where $\| A \|$ denotes a suitable matrix norm, such as the infinity norm. Then the number $\delta$ measures how well $F(t)$ satisfies \eqref{e3.1} at $t=\beta$. The numerical experiments show that $\delta$ is a good approximation of the ``true'' relative error
\begin{equation}
\mu= \frac{\| F(\beta) - E(\beta A) \| }{\| E(\beta A) \| }, \label{e3.4}
\end{equation}
where $E(\beta A)$ is produced by the Maple command ``MatrixExponential$(\beta A)$'', using a precision of 100 digits. In most cases we obtain $\delta \ge \mu$.

It is important to detect multiple roots correctly and not deal with them as if they were distinct roots very close to each other, since that would introduce relatively large errors, due to the denominators in \eqref{e2.11}, which are powers of differences of eigenvalues.

\section{Numerical results}

The computational cost of the proposed algorithm increases with the size of the matrix $A$, since, in general, the computation of the matrices $w_k(A)$ for $0 \le k \le n$ requires $n-1$ multiplications by $A$, and $w$ is typically the characteristic polynomial of $A$. Therefore for large matrices $A$ we cannot expect to get high accuracy in the matrices computed with the explicit formula produced by our algorithm, unless $A$ is a sparse matrix and $w$ has a small number of suitably distributed distinct roots.

We performed some numerical experiments to determine the range of values of the order of $A$ for which the algorithm produces acceptable results. We tested our algorithm taking $A$ as a random matrix of order $n$ for a few values of $n$ between 20 and 40, and using different parameters for the random matrix generator, which yield different distributions of the eigenvalues.
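In terms of the sketch of Section 3, the residual check \eqref{e3.3} at $t=\beta$ takes the following form. This is a double-precision illustration only; the values of $\delta$ reported in Table \ref{tab1} were obtained in Maple with 50 to 100 digits, which is out of reach of standard floating point.
\begin{verbatim}
import numpy as np

# Uses expm_explicit from the sketch of Section 3 (our own names).
rng = np.random.default_rng(0)
A = 0.25 * rng.uniform(-4, 2, size=(20, 20))  # scaled random matrix, cf. Table 1
lams = np.linalg.eigvals(A)                   # assumed to be simple eigenvalues
F, Fp = expm_explicit(A, list(lams), [0] * len(lams))
beta = 1.0
B = F(-beta) @ Fp(beta)                       # should be close to A
delta = np.linalg.norm(B - A, np.inf) / np.linalg.norm(A, np.inf)
print(delta)
\end{verbatim}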
\begin{table}[ht]
\caption{Some numerical results}
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|} \hline
$n$ & $D$ & $a$ & $b$ & $\|A\|$ & $\rho$ & $\|\exp(A)\|$ & $\delta$ & $\mu$ \\ \hline
20 & 50 & -4 & 2 & 10.7875 & 4.99504 & 13.6899 & 2.54043e-45 & 2.48411e-45\\ \hline
20 & 50 & -2 & 4 & 9.58886 & 4.70923 & 173.594 & 1.80540e-39 & 1.17495e-39\\ \hline
25 & 50 & -4 & 2 & 13.6732 & 6.43704 & 29.7808 & 7.33657e-44 & 5.09239e-44\\ \hline
25 & 50 & -2 & 4 & 13.7619 & 6.40041 & 982.736 & 1.31585e-34 & 8.66711e-35\\ \hline
30 & 60 & -4 & 2 & 15.5971 & 8.29544 & 46.79 & 2.51331e-52 & 2.05524e-52\\ \hline
30 & 60 & -2 & 4 & 14.7011 & 7.10494 & 1894.19 & 4.09793e-40 & 2.72607e-40\\ \hline
35 & 64 & -4 & 2 & 17.0037 & 9.35093 & 48.1121 & 9.91921e-55 & 6.16559e-55\\ \hline
35 & 64 & -2 & 4 & 17.7606 & 8.70811 & 8599.8 & 8.54165e-39 & 6.12971e-39\\ \hline
40 & 70 & -4 & 2 & 21.3238 & 10.4744 & 46.52 & 2.23268e-60 & 2.04208e-60\\ \hline
40 & 70 & -2 & 4 & 19.4641 & 9.73745 & 28070.4 & 8.35698e-40 & 5.04061e-40\\ \hline
40 & 70 & -1 & 4 & 20.4786 & 14.8176 & 3.65e+06 & 4.83707e-30 & 2.49511e-30 \\ \hline
\end{tabular}
\end{center}
\label{tab1}
\end{table}

In Table \ref{tab1} the columns correspond to the following parameters: $n$ is the order of the matrix $A$, $D$ is the number of digits used in the computations, $a$ and $b$ are the end points of the interval where the random entries are generated using the uniform distribution, $\| A \|$ is the infinity norm of $A$, $\rho$ is the spectral radius of $A$, $\| \exp(A) \|$ is the infinity norm of the computed $\exp(A)$, and $\delta$ and $\mu$ are the relative errors defined by \eqref{e3.3} and \eqref{e3.4}. In all these computations the matrix function was evaluated at $t=1$. The matrix $A$ is the generated random matrix multiplied by a scaling factor of $0.25$. This was done in order to deal with matrices whose norms have a reasonable size.

Note that the least accurate results are obtained when the norm of $\exp(A)$ is large. Note also that as $n$ increases it is necessary to increase the number of digits used in the computations in order to obtain acceptable error sizes.

The entries of the computed matrix are expressed as explicit linear combinations of (complex) exponentials. For example, an entry obtained in a case with $n=20$ looks like
\begin{align*}
& 0.2745 \exp(1.6103 t) \cos(0.8201 t) -0.0166 \exp(1.6103 t) \sin(0.8201 t) \\
&-0.2018 \exp(0.5083 t) \cos(0.4520 t) -0.1724 \exp(0.5083 t) \sin(0.4520 t) \\
&+0.0063 \exp(0.8337 t) \cos(1.1256 t) +0.1363 \exp(0.8337 t) \sin(1.1256 t) \\
&-0.9704 \exp(-2.1568 t) \cos(0.4983 t) +0.5197 \exp(-2.1568 t) \sin(0.4983 t) \\
&+0.8632 \exp(-1.8023 t) \cos(0.2841 t) -1.4200 \exp(-1.8023 t) \sin(0.2841 t) \\
&-0.0190 \exp(0.6122 t) \cos(2.4512 t) -0.0273 \exp(0.6122 t) \sin(2.4512 t) \\
&-0.1654 \exp(-1.4406 t) \sin(1.4041 t) -0.0217 \exp(-1.4406 t) \cos(1.4041 t) \\
&+0.0345 \exp(-0.4696 t) \cos(2.0036 t) +0.0760 \exp(-0.4696 t) \sin(2.0036 t) \\
&-0.0075 \exp(-0.2615 t) +0.0426 \exp(0.9229 t) -0.0193 \exp(1.6650 t) \\
&+0.0186 \exp(-0.7649 t),
\end{align*}
where the number of digits was truncated to abbreviate the expression. Therefore, using this method for matrices of large order $n$ is not very convenient. In addition, in such cases computing the eigenvalues with the required accuracy is not easy. In \cite{JRMVS} we use another algorithm, where the functions $g_k$ are expressed as truncated Taylor series and the eigenvalues of $A$ are not used.
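The real form of the entries displayed above comes from pairing complex conjugate eigenvalues: since $A$ is real, its nonreal eigenvalues occur in conjugate pairs $\alpha \pm i\omega$ with conjugate coefficients $c$ and $\bar{c}$, and
$$ c\,{\mathrm e}^{(\alpha+i\omega)t}+\bar{c}\,{\mathrm e}^{(\alpha-i\omega)t} = {\mathrm e}^{\alpha t}\bigl(2\operatorname{Re}(c)\cos(\omega t)-2\operatorname{Im}(c)\sin(\omega t)\bigr). $$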
Since the computation of $\exp(tA)$ is essentially a matrix-valued Hermite interpolation at the multiset of eigenvalues, the accuracy of the computed results depends on the distribution of the eigenvalues in the complex plane. This is why the error estimate $\delta$ is very useful. The repeated differentiation used to compute the functions $g_k(t)$ and the computation of the matrices $w_k(A)$ are steps that may reduce the accuracy of the computation if there are either eigenvalues or entries of $A$ with large absolute values. In addition, if there are sets of eigenvalues very close to each other then the repeated application of formula \eqref{e2.11} is another source of errors.

Relatively large errors are obtained when an eigenvalue of multiplicity greater than one is not properly detected and is treated as several distinct eigenvalues that are very close to each other. In that case some of the denominators in \eqref{e2.11} become very small. Therefore the multiple eigenvalues and their multiplicities should be properly identified, for example, using a root-refinement algorithm such as Newton's method.

\subsection*{Acknowledgments}
This research was partially supported by COFAA and EDD grants from IPN, M\'exico.

\begin{thebibliography}{0}

\bibitem{Hi} N. J. Higham; \emph{Functions of Matrices, Theory and Computation}, SIAM Publications, Philadelphia, PA, 2008.

\bibitem{LR} U. Luther, K. Rost; \emph{Matrix exponentials and inversion of confluent Vandermonde matrices}, Electron. Trans. Numer. Anal., 18 (2004) 91--100.

\bibitem{JRMVS} J. R. Mandujano, L. Verde-Star; \emph{Computation of the matrix exponential using the dynamic solution}, submitted.

\bibitem{moler} C. B. Moler, C. F. Van Loan; \emph{Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later}, SIAM Rev., 45 (2003) 3--49.

\bibitem{ruiz2} J. C. Ruiz Claeyssen, T. Tsukazan; \emph{Dynamic solutions of linear matrix differential equations}, Quart. Appl. Math., 48 (1990) 169--179.

\bibitem{DDCI} L. Verde-Star; \emph{Divided differences and combinatorial identities}, Stud. Appl. Math., 85 (1991) 215--242.

\bibitem{FofM} L. Verde-Star; \emph{Functions of matrices}, Linear Algebra Appl., 406 (2005) 285--300.

\bibitem{aactm} L. Verde-Star; \emph{An algebraic approach to convolutions and transform methods}, Adv. in Appl. Math., 19 (1997) 117--143.

\bibitem{Wu} B. Wu; \emph{Explicit formulas for the exponentials of some special matrices}, Appl. Math. Lett., 24 (2011) 642--647.

\end{thebibliography}

\end{document}