{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# HowTo: Pumping test analysis with `timflow`\n",
    " \n",
    "This HowTo describes how to use `timflow` for pumping test analysis (also called\n",
    "aquifer test analysis). A simple pumping test is analyzed in this Notebook. A whole\n",
    "series of notebooks with examples of pumping test analyses are listed under the\n",
    "'pumping test benchmark' section of the ReadTheDocs.\n",
    "\n",
    "We start by importing the required packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "import timflow.transient as tft\n",
    "\n",
    "#\n",
    "plt.rcParams[\"figure.figsize\"] = (4, 3)  # set default figure size\n",
    "plt.rcParams[\"font.size\"] = 8  # set default font size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We analyze pumping test data from a pumping test carried out near the *Oude Korendijk* in The Netherlands, as described in the famous book on pumping tests by Kruseman and de Ridder (downloadable [here](https://gw-project.org/download/analysis-and-evaluation-of-pumping-test-data/)). The pumping test was carried out in a 7 m thick, confined aquifer with the top at $z_t=-18$ m and the bottom at $z_b=-25$ m. The objective of the pumping test was to determine the hydraulic conductivity and specific storage coefficient of the aquifer.  The discharge of the well during the pumping test was 788 m$^3$/day. The observed drawdown (in meters) and the observation time (in minutes) after pumping started were measured in two observations wells, one located 30 m from the pumping well and the other located 90 m from the pumping well. Kruseman and de Ridder (1990) also report drawdown in an additional well, but that one is not considered here. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The time (in minutes) and drawdown (in meters) of the two wells are stored in the files `data/oudekorendijk_h30.dat` and `data/oudekorendijk_h90.dat`, respectively. The time and head are read from the data files. The time is converted to days and the drawdown is converted to meters, after which the data is plotted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.loadtxt(\"data/oudekorendijk_h30.dat\")\n",
    "to1 = data[:, 0] / 60 / 24\n",
    "ho1 = -data[:, 1]\n",
    "ro1 = 30\n",
    "print(\n",
    "    f\"first measurement time: {np.min(to1):.2e}, \"\n",
    "    f\"last measurement time: {np.max(to1):.2e} d\"\n",
    ")\n",
    "\n",
    "drawdown = np.loadtxt(\"data/oudekorendijk_h90.dat\")\n",
    "to2 = drawdown[:, 0] / 60 / 24\n",
    "ho2 = -drawdown[:, 1]\n",
    "ro2 = 90\n",
    "print(\n",
    "    f\"first measurement time: {np.min(to2):.2e}, \"\n",
    "    f\"last measurement time: {np.max(to2):.2e} d\"\n",
    ")\n",
    "\n",
    "plt.plot(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.plot(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A pumping test analysis with `timflow` consists of the following steps:\n",
    "1. Create a transient `timflow` model using your best guess of the parameters.\n",
    "2. Create a `Calibrate object`\n",
    "3. Specify which parameters you want to estimate.\n",
    "4. Specify which time serie(s) you want to use to estimate the parameters.\n",
    "5. Estimate the parameters\n",
    "\n",
    "The parameters are fitted using the least squares method. An iterative approach is used to find the optimal parameters. The iterative approach is pretty fancy, but no iterative method is foolproof. The better your guess of the parameters, the higher the likelihood that the iterative method will find the optimal parameters (no pun intended). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We creat a model for a single confined aquifer. The initial guess is that the hydraulic conductivity is $k=20$ m/d and the specific storage coefficient is $S_s=10^{-4} \\,\\text{m}^{-1}$. `tmin` and `tmax` are set to cover all observation times. One well is added with a discharge of 788 m$^3$/d. No recovery was measured, so the well does not have to be shut off. Wellbore storage is not simulated. The model is solved. As a check, the head is plotted vs. the observations. This confirms that the model runs. As can be seen, the initial guess is not bad, but obviously not optimal. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ml = tft.ModelMaq(kaq=60, z=(-18, -25), Saq=1e-4, tmin=1e-5, tmax=1)\n",
    "w = tft.Well(model=ml, xw=0, yw=0, rw=0.1, tsandQ=[(0, 788)], layers=0)\n",
    "ml.solve()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hm1 = ml.head(ro1, 0, to1)\n",
    "hm2 = ml.head(ro2, 0, to2)\n",
    "plt.plot(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.plot(to1, hm1[0], \"C0\")\n",
    "plt.plot(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.plot(to2, hm2[0], \"C1\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Steps 2-5 are conducted in the next code block. At first, only the observations in the first well (at 30 m from the pumping well) are used for calibration.\n",
    "\n",
    "2. The `Calibration` object is created for the model `ml` (which was created above). The `Calibration` object is stored in the variable `cal`. \n",
    "3. It is specified that the `Calibration` object must fit for the hydraulic conductivity of layer 0 by using the `set_parameter` function. The name of the parameters is `kaq` (the same name as used to create the `timflow` model) and the `0` is added to indicate that it is the hydraulic conductivity of layer 0. Furthermore, the initial value is specified. Again, use your best guess. Next, it is specified to fit for the specific discharge of layer 0 in the same fashio. The name of the parameter is `Saq` (again, the same name as used to create the `timflow` model) and the `0` is added to indicate layer 0. \n",
    "4. It is specified to use the times and heads of the observation well at $r=30$ m using the `series` function. The `series` function takes a name (use whatever name you like to identify this observation well), the `x` and `y` location of the well, the layer number, an array of the observation times and an array of the observed heads.\n",
    "5. Find the optimal parameters using the `fit` function. A dot is printed to the screen every time the model is solved with a new set of parameters during the iterative optimization approach. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cal = tft.Calibrate(model=ml)  # create Calibration object\n",
    "cal.set_parameter(name=\"kaq\", initial=20, layers=0)  #\n",
    "cal.set_parameter(name=\"Saq\", initial=1e-4, layers=0)\n",
    "cal.series(\n",
    "    name=\"obs1\",  # user specified name\n",
    "    x=ro1,  # x=location of observation well\n",
    "    y=0,  # y-location of obsevation well\n",
    "    layer=0,  # layer number\n",
    "    t=to1,  # observation times\n",
    "    h=ho1,  # observed heads\n",
    ")\n",
    "cal.fit(report=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A good approach to print the fitted parameters to the screen is to display the `cal.parameters` `DataFrame`. Note that the estimated standard error (the column `std`) is **not** a very good estimate of the standard error, as the remaining errors rarely statisfy the required statistical conditions for such an estimate of the standard error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(cal.parameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The optimal parameters are also automatically stored in the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"k of model: \", ml.aq.kaq)\n",
    "print(\"Ss of model: \", ml.aq.Saq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hence, a graph of the model fit can be created in the same fashion as before. Note that the fit for the observations at $r=30$ m is indeed a lot better than with our initial guess."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hm1 = ml.head(ro1, 0, to1)\n",
    "plt.plot(to1, ho1, \"k.\", label=\"r=30 m\")\n",
    "plt.plot(to1, hm1[0], \"C0\", label=\"fit\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model with the optimal parameters deviates significantly from the obervations at larger times. Maybe the confining top layer can not be simulated as impereable. Let's modify the model and replace the confined aquifer by a semi-confined aquifer. The new model is called `ml2`. The thickness of the semi-confining layer is set to 1 (i.e., `z` goes from -17 to -18) and the initial guess for the resistance of the semi-confining layer is 1000 d. Storage in the semi-confining layer is negelected. A new calibration object is created, which is used now to fit three parameters. The additional parameter is called `'c0'`, which stands for the resistance of leaky layer 0, which is on top of aquifer layer 0. Note that the optimal hydraulic conductivity is now more than 10% lower than for the confined aquifer. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ml2 = tft.ModelMaq(\n",
    "    kaq=60,\n",
    "    z=(-17, -18, -25),\n",
    "    Saq=1e-4,\n",
    "    c=[1000],\n",
    "    Sll=0,\n",
    "    topboundary=\"semi\",\n",
    "    tmin=1e-5,\n",
    "    tmax=1,\n",
    ")\n",
    "w = tft.Well(model=ml2, xw=0, yw=0, rw=0.1, tsandQ=[(0, 788)], layers=0)\n",
    "ml2.solve()\n",
    "#\n",
    "cal2 = tft.Calibrate(model=ml2)  # create Calibration object\n",
    "cal2.set_parameter(name=\"kaq\", initial=20, layers=0)  #\n",
    "cal2.set_parameter(name=\"Saq\", initial=1e-4, layers=0)\n",
    "cal2.set_parameter(name=\"c\", initial=1000, layers=0)\n",
    "cal2.series(\n",
    "    name=\"obs1\",  # user specified name\n",
    "    x=ro1,  # x=location of observation well\n",
    "    y=0,  # y-location of obsevation well\n",
    "    layer=0,  # layer number\n",
    "    t=to1,  # observation times\n",
    "    h=ho1,  # observed heads\n",
    ")\n",
    "cal2.fit(report=False)\n",
    "display(cal2.parameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fit of the two models can be compared visually:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hm1 = ml.head(ro1, 0, to1)\n",
    "hm2 = ml2.head(ro1, 0, to1)\n",
    "plt.plot(to1, ho1, \"k.\", label=\"r=30 m\")\n",
    "plt.plot(to1, hm1[0], label=\"fit 2 params\")\n",
    "plt.plot(to1, hm2[0], label=\"fit 3 params\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or the fit of the two models can be compared by computing the root mean squared error for the two models. Note that addition of the semi-confining layer almost halved the root mean squared error. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"rmse 2 params: {cal.rmse():.3f} m\")\n",
    "print(f\"rmse 3 params: {cal2.rmse():.3f} m\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The root mean squared error can be computed with the `rmse` function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the two models are fitted to the observations in both observation wells. This is done by adding a second series to both `cal` and `cal2` and re-fitting the model. The root mean squared error are printed to the screen and graphs of the model fit are produced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cal.series(\n",
    "    name=\"obs2\",  # user specified name\n",
    "    x=ro2,  # x=location of observation well\n",
    "    y=0,  # y-location of obsevation well\n",
    "    layer=0,  # layer number\n",
    "    t=to2,  # observation times\n",
    "    h=ho2,  # observed heads\n",
    ")\n",
    "cal.fit(report=False)\n",
    "display(cal.parameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cal2.series(\n",
    "    name=\"obs2\",  # user specified name\n",
    "    x=ro2,  # x=location of observation well\n",
    "    y=0,  # y-location of obsevation well\n",
    "    layer=0,  # layer number\n",
    "    t=to2,  # observation times\n",
    "    h=ho2,  # observed heads\n",
    ")\n",
    "cal2.fit(report=False)\n",
    "display(cal2.parameters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"rmse 2 params: {cal.rmse():.3f} m\")\n",
    "print(f\"rmse 3 params: {cal2.rmse():.3f} m\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8, 3))\n",
    "plt.subplot(121)\n",
    "hm1 = ml.head(ro1, 0, to1)\n",
    "hm2 = ml.head(ro2, 0, to2)\n",
    "plt.plot(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.plot(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.plot(to1, hm1[0], \"C0\", label=\"fit 2 params\")\n",
    "plt.plot(to2, hm2[0], \"C1\", label=\"fit 2 params\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()\n",
    "#\n",
    "plt.subplot(122)\n",
    "hm1 = ml2.head(ro1, 0, to1)\n",
    "hm2 = ml2.head(ro2, 0, to2)\n",
    "plt.plot(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.plot(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.plot(to1, hm1[0], \"C0\", label=\"fit 3 params\")\n",
    "plt.plot(to2, hm2[0], \"C1\", label=\"fit 3 params\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Insight of the model fit can also be gained by plotting the results on semi-log graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8, 3))\n",
    "plt.subplot(121)\n",
    "hm1 = ml.head(ro1, 0, to1)\n",
    "hm2 = ml.head(ro2, 0, to2)\n",
    "plt.semilogx(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.semilogx(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.semilogx(to1, hm1[0], \"C0\", label=\"fit 2 params\")\n",
    "plt.semilogx(to2, hm2[0], \"C1\", label=\"fit 2 params\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()\n",
    "#\n",
    "plt.subplot(122)\n",
    "hm1 = ml2.head(ro1, 0, to1)\n",
    "hm2 = ml2.head(ro2, 0, to2)\n",
    "plt.semilogx(to1, ho1, \"C0.\", label=\"r=30 m\")\n",
    "plt.semilogx(to2, ho2, \"C1.\", label=\"r=90 m\")\n",
    "plt.semilogx(to1, hm1[0], \"C0\", label=\"fit 3 params\")\n",
    "plt.semilogx(to2, hm2[0], \"C1\", label=\"fit 3 params\")\n",
    "plt.xlabel(\"time (d)\")\n",
    "plt.ylabel(\"head (m)\")\n",
    "plt.legend()\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is now up to the modeler to decide which model is better and whether a semi-confined aquifer is more realistic than a confined aquifer. The model with 3 parameters gives a better fit in terms of the root mean squared error, but the behavior for large time deviates from the observations: the observed heads seem to go down with time in the semi-log graphs, but the modeled heads seem to level off for large time."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}