While working on the application of nonlinear mixed-effects models to chemical degradation data, I found it desirable to further speed up the solution of at least the most common degradation models used in the mkin package: simple first-order (SFO) decline of the parent compound coupled to SFO decline of a metabolite, and biphasic (Double First-Order in Parallel, DFOP) decline of the parent coupled to SFO decline of a metabolite (DFOP-SFO).
Looking back into some highly appreciated (Schwarzenbach et al., 2003) and useful (Schnoor, 1996) textbooks, I found confirmation that symbolic solutions can be obtained for many of the kinetic models used in mkin. These should outperform the solution methods employed so far, which are based on numerical eigenvalue computations or on iterative ODE solvers, optionally using compiled C code for the differential equations.
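To illustrate what such a symbolic solution looks like, here is the textbook result for SFO decline of a parent coupled to SFO decline of one metabolite. The symbols are my own choice for this sketch and not necessarily the parameterisation used inside mkin: P and M are the parent and metabolite residues, P_0 and M_0 their initial values, k_P the overall decline rate constant of the parent, k_M that of the metabolite, and f the fraction of the parent decline that forms the metabolite. The closed form assumes k_P is not equal to k_M.

```latex
\begin{align*}
  \frac{dP}{dt} &= -k_P P, &
  \frac{dM}{dt} &= f k_P P - k_M M \\[4pt]
  P(t) &= P_0 e^{-k_P t} \\
  M(t) &= M_0 e^{-k_M t}
          + \frac{f k_P P_0}{k_M - k_P}
            \left( e^{-k_P t} - e^{-k_M t} \right)
\end{align*}
```

Evaluating these expressions needs only a few exponentials per time point, with no eigenvalue decomposition and no stepping of an ODE solver, which is where the speed gain is expected to come from.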
After some attempts with the computer algebra systems yacas and SymPy, I settled on Maxima for the symbolic computations. It had the most reliable ODE solver (desolve) and supports symbolic computation of eigenvectors and eigenvalues. The solutions were prepared in a Jupyter notebook, which can run Maxima kernels thanks to the excellent Maxima-Jupyter project. The notebook that was prepared for this purpose can be viewed here.
While refactoring the mkinfit function to allow the use of analytical solutions for coupled systems, I ventured into profiling R code. With the help of several internet sources, most notably the chapter on profiling in Hadley Wickham's Advanced R book, I found that I had some rather wasteful code in my cost function, which is executed in every iteration.
The most important speed gain came from avoiding the merge function for data frames. mkinpredict now returns a matrix object, from which the relevant entries can be extracted by matrix subsetting, i.e. using an index in the form of a two-column matrix.
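The idea of indexing with a two-column matrix can be sketched in a few lines of base R. This is only an illustration under assumed names and made-up numbers (pred, obs and idx are hypothetical), not the actual code of the mkin cost function.

```r
# Hypothetical predictions: one row per time point, one column per variable
times <- c(0, 1, 3)
pred <- matrix(
  c(100, 80, 60,   # column "parent"
      0, 15, 25),  # column "m1"
  nrow = 3,
  dimnames = list(NULL, c("parent", "m1"))
)

# Hypothetical observations in long format (made-up values)
obs <- data.frame(
  time  = c(0, 1, 3, 1, 3),
  name  = c("parent", "parent", "parent", "m1", "m1"),
  value = c(99, 82, 58, 14, 27),
  stringsAsFactors = FALSE
)

# Two-column index matrix: row number in the first column,
# column number in the second, one row per observation
idx <- cbind(match(obs$time, times), match(obs$name, colnames(pred)))

# A single subsetting step picks the prediction for every observation,
# with no need to merge two data frames
predicted <- pred[idx]
residuals <- obs$value - predicted
```

If the index matrix does not change between iterations, it can be built once before the optimisation starts, so that each evaluation of the cost function only performs the cheap subsetting step.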
With these performance improvements, running the test suite with the freshly released mkin version 0.9.50.1 now takes only about 40 seconds, as opposed to the 130 seconds it took at the time of the last release!
Some slightly more detailed benchmarks can be found in the benchmark vignette. Enjoy!
Bibliography
Jerald L. Schnoor. Environmental Modelling - Fate and transport of pollutants in water, air and soil. Wiley-Interscience, 1996.
R. Schwarzenbach, P. Gschwend, and D. Imboden. Environmental Organic Chemistry. Wiley, Hoboken, 2nd edition, 2003.