Dashboards and Apps: Building the Tool, Not Just the Charts

15 min read · 3 exercises

At 7 a.m. a production foreman opens the field dashboard and decides where to send a truck. Sixty wells, one screen, fifteen minutes. He is not looking for pretty gauges; he is looking for the three wells that changed overnight: the one whose feed went quiet, the one whose rate fell off a cliff, the one that has been cycling on and off all week. Get those three right and the workover crew fixes the real problem today. Get them wrong (a stale reading mistaken for a healthy well, a sensor glitch mistaken for a decline) and the truck drives to the wrong location while the actual problem keeps losing barrels.

That is the uncomfortable truth about dashboards: the part everyone photographs, the charts, the layout, the company colours, is the easy part. The engineering is the data contract underneath: the KPIs computed correctly, the bad data caught before it is plotted, the millions of raw points reduced to a chart that is both fast and honest. This chapter builds that layer, and every line of it runs. Then we wrap it in the thinnest possible Streamlit app, shown, not executed, because a live server cannot live inside a book, but every number that app would display came from the engine we built and verified here.

What You'll Learn

Compute the surveillance KPIs a monitoring screen actually exists to show, and why each one is a decision, not a decoration
Reduce a million data points to a few hundred without averaging away the outage that matters
Wrap a verified engine in a Streamlit app, and see how little UI code a good data layer needs
Turn a passive dashboard into an alarm: a ranked triage list, not a wall of gauges

Dataset Used in This Chapter

A synthetic-but-realistic field surveillance feed: six wells, two years of daily oil rates with sensor noise, real outages, and three planted problems (a stale feed, a steep decline, and a chronically intermittent well). Generated in the first cell, so every cell runs offline.

The KPIs Are the Product

A dashboard shows numbers. The numbers have to be right, and they have to be the numbers a decision hangs on. For production surveillance, four do most of the work: the latest producing rate in barrels of oil per day, bopd (is the well alive?), the 30-day decline (is it falling faster than expected?), the uptime (is it actually on?), and the data age (do I even believe today's reading?). Compute those for every well and you have replaced "stare at sixty line charts" with "read one scorecard." The feed itself is built from the decline-curve parameters of the earlier chapters (qi the initial rate, Di the annual decline, b the hyperbolic exponent), so it falls and stutters like a real well, not like random noise.
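For reference, the rate expression that make_field evaluates below can be written out as follows. This is the standard Arps hyperbolic form; the subscript "annual" is notation introduced here, on the assumption that it matches the earlier chapters. Time t is in days, so the annual decline is first converted to a daily one.

```latex
q(t) = \frac{q_i}{\left(1 + b\,D_i\,t\right)^{1/b}},
\qquad D_i = \frac{D_{i,\text{annual}}}{365}
```

The synthetic feed multiplies this smooth curve by sensor noise and then zeroes out the outage days.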

main.py

import numpy as np
import pandas as pd

# Per-well profiles: (id, qi, annual Di, b, problem). The field is mostly healthy;
# three wells carry the problems a morning surveillance check exists to catch.
WELLS = [
    ("OD-001", 1500, 0.22, 0.6, "stale"),      # feed stopped reporting 3 days ago
    ("OD-002", 2100, 0.30, 0.7, None),
    ("OD-003", 2600, 0.28, 0.8, "decline"),    # a pressure hit -> steep 30-day decline
    ("OD-004", 1800, 0.20, 0.5, None),         # the steady earner
    ("OD-005", 1200, 0.35, 0.7, "downtime"),   # cycles on and off -> low uptime
    ("OD-006", 2000, 0.26, 0.6, None),
]
DAYS = 730


def make_field(seed=11):
    """A field's DAILY surveillance feed (long format): hyperbolic decline with
    sensor noise, real outages, and three planted problems -- the raw stream a
    monitoring dashboard sits on top of."""
    rng = np.random.default_rng(seed)
    rows = []
    for wid, qi, Di_yr, b, problem in WELLS:
        Di = Di_yr / 365.0
        t = np.arange(DAYS)
        q = qi / np.power(1 + b * Di * t, 1.0 / b) * rng.normal(1.0, 0.03, DAYS)
        if problem == "decline":                       # a pressure / liquid-loading hit
            q[-40:] *= np.linspace(1.0, 0.55, 40)      # -45% over the last 40 days
        if problem == "downtime":                      # a chronically intermittent producer
            q[rng.random(DAYS) < 0.20] = 0.0           # ~20% of days down -> ~80% uptime
        else:
            for _ in range(rng.integers(2, 5)):        # occasional multi-day outage
                s = rng.integers(0, DAYS - 6)
                q[s:s + rng.integers(1, 6)] = 0.0
        last = DAYS - (3 if problem == "stale" else rng.integers(0, 2))   # some feeds lag
        q = np.maximum(q[:last], 0.0)
        rows.append(pd.DataFrame({"well": wid, "day": np.arange(len(q)), "oil_bopd": q}))
    return pd.concat(rows, ignore_index=True)


def well_kpis(field):
    """The surveillance scorecard: one row per well, the four numbers a foreman reads."""
    asof = field.day.max()                             # the field's most recent reporting day
    out = []
    for w, g in field.groupby("well"):
        g = g.sort_values("day")
        rate = g.oil_bopd.values
        prod = rate > 0                                # producing days (zeros are downtime)
        last_rate = float(rate[prod][-1]) if prod.any() else 0.0   # rate[prod] keeps producing days; [-1] = latest
        recent, prior = rate[-30:], rate[-60:-30]      # this month vs last, on producing days
        a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
        a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
        decl = (a_prior - a_recent) / a_prior * 100 if a_prior and not np.isnan(a_prior) else 0.0
        out.append(dict(well=w, last_rate=round(last_rate, 1), decline_30d_pct=round(float(decl), 1),
                        uptime_pct=round(float(prod.mean() * 100), 1),
                        cum_mbbl=round(rate.sum() / 1000, 1), days_stale=int(asof - g.day.max())))
    return pd.DataFrame(out)


field = make_field()
kpis = well_kpis(field)
print(kpis.to_string(index=False))

Read it as the foreman does. OD-004 is the steady earner: shallow decline, high uptime, fresh data; leave it alone. Three wells are not fine, each for a different reason: OD-001 has not reported in three days (is it dead, or is the telemetry down?), OD-003 has shed 27% in a month (a pressure or liquid-loading problem, liquid choking the flow, not normal decline), and OD-005 is producing only 84% of the time (it is cycling on and off, a candidate for artificial lift, a pump to keep it flowing). The cumulative column (cum_mbbl, thousands of barrels) tells you which wells are worth the attention in the first place. None of that is visible in a wall of line charts; all of it is obvious in one table. The scorecard is the dashboard; the rest is presentation.
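The decline figure only means something because it is computed on producing days. A minimal sketch of that arithmetic, using a hypothetical 60-day series rather than the chapter's feed (the function name decline_30d_pct is introduced here for illustration and mirrors the logic inside well_kpis):

```python
import numpy as np


def decline_30d_pct(rate):
    """Percent drop in mean producing-day rate: last 30 days vs the 30 before."""
    rate = np.asarray(rate, dtype=float)
    recent, prior = rate[-30:], rate[-60:-30]
    a_recent = recent[recent > 0].mean() if (recent > 0).any() else 0.0
    a_prior = prior[prior > 0].mean() if (prior > 0).any() else np.nan
    if np.isnan(a_prior) or a_prior == 0:
        return 0.0                       # no prior production to compare against
    return float((a_prior - a_recent) / a_prior * 100)


# Hypothetical well: 1000 bopd last month, 730 bopd this month, 6 days shut in.
rate = np.concatenate([np.full(30, 1000.0), np.full(30, 730.0)])
rate[[35, 41, 47, 50, 55, 58]] = 0.0

# The naive version averages the shut-in zeros into the recent month.
naive = (rate[-60:-30].mean() - rate[-30:].mean()) / rate[-60:-30].mean() * 100

print(f"producing-day decline: {decline_30d_pct(rate):.1f}%")
print(f"naive decline (zeros included): {naive:.1f}%")
```

Excluding the zeros keeps the two questions separate: downtime shows up in the uptime column, and the decline column reports only how the well performs when it is actually flowing.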

A Million Points, a Thousand Pixels

Two years of daily data for sixty wells is about 44,000 points: not large by database standards, but a browser asked to draw tens of thousands of points on a chart will crawl, and, worse, it will lie. The usual fix, averaging the data into buckets, is exactly the wrong one for surveillance: average a one-day outage in with its neighbours and the zero disappears, so the chart shows a well that never stopped. The honest reduction keeps the extremes of each bucket, the lowest and the highest sample, so a spike or an outage always survives.

main.py

import matplotlib.pyplot as plt


def downsample(day, rate, n_buckets=120):
    """min/max decimation: per bucket keep BOTH the lowest and highest sample, so a
    spike or a zero (outage) is never averaged away. Returns reduced (day, rate)."""
    if len(day) <= 2 * n_buckets:
        return day, rate
    keep = []
    for chunk in np.array_split(np.arange(len(day)), n_buckets):
        r = rate[chunk]
        keep.append(chunk[r.argmin()])                 # the bucket's lowest (an outage shows here)
        keep.append(chunk[r.argmax()])                 # and its highest (a spike shows here)
    keep = np.unique(keep)
    return day[keep], rate[keep]


g = field[field.well == "OD-004"].sort_values("day")
day, rate = g.day.values, g.oil_bopd.values
buckets = np.array_split(np.arange(len(day)), 120)
mean_day = np.array([day[b].mean() for b in buckets])
mean_rate = np.array([rate[b].mean() for b in buckets])     # the lossy, lying reduction
dd, dr = downsample(day, rate, 120)                          # the honest one

print(f"raw points {len(day)} -> downsampled {len(dd)}; outage days in raw: {(rate == 0).sum()}")
print(f"lowest value shown -- mean-resample: {mean_rate.min():.0f} bopd   min/max: {dr.min():.0f} bopd")

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(day, rate, color="0.8", lw=0.6, label=f"raw daily ({len(day)} pts)")   # count varies with feed lag
ax.plot(mean_day, mean_rate, color="#E8743B", lw=1.4, label="mean-resample (hides outages)")
ax.plot(dd, dr, color="#1f77b4", lw=0.9, label="min/max decimation (keeps outages)")
ax.set_xlabel("day")
ax.set_ylabel("oil rate (bopd)")
ax.set_title("OD-004 - honest vs lossy chart reduction")
ax.legend(fontsize=8)
fig.tight_layout()
plt.show()

The two reductions draw almost the same smooth decline, and then disagree completely in the one place that matters: the mean curve never touches zero, while the min/max curve drops to the floor at every outage. A foreman scanning the orange chart would never send anyone to check why OD-004 keeps stopping. Reducing data for speed is unavoidable; reducing it in a way that deletes the events you are watching for is malpractice. The rule is simple: decimate, don't average, anything you are surveilling for.
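On a timestamp-indexed, higher-frequency feed, the same min/max idea can be sketched with pandas resampling. The hourly series, the daily bucket, and the variable names below are illustrative assumptions, not part of the chapter's engine. Unlike downsample, this variant returns each bucket's minimum and maximum as two columns rather than the original sample positions.

```python
import pandas as pd

# Hypothetical hourly feed: 10 days at a steady 1000 bopd, with a 2-hour trip on day 4.
idx = pd.date_range("2024-01-01", periods=240, freq=pd.Timedelta(hours=1))
rate = pd.Series(1000.0, index=idx)
rate.iloc[80:82] = 0.0                                  # the short shutdown

# One row per day, keeping both extremes of each bucket.
envelope = rate.resample("D").agg(["min", "max"])

print(len(rate), "->", len(envelope), "rows")
print(envelope[envelope["min"] == 0.0])                 # the day the trip landed on
```

Plotting the two columns as a band (or as two lines) gives a chart with a tenth of the points that still shows the trip.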

The App Is a Thin Shell

Here is the part the data layer earns: once the engine exists, the application is almost nothing. A complete Streamlit surveillance app is a sidebar to pick a well, a scorecard, and a chart, every value of which is produced by the functions above. The block below is the entire app. It is marked not to run (a web server cannot start inside a rendered book), but read it and notice how little of it is new: it is well_kpis, downsample, and a plot, with three lines of layout around them.

app.py

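# app.py  --  run locally with:  streamlit run app.py
# Assumes make_field, well_kpis and downsample from the cells above are defined
# in (or imported into) this file.
import pandas as pd
import streamlit as st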

st.title("Field Production Surveillance")

@st.cache_data                       # recompute ONLY when the data changes, not on every click
def load():
    field = make_field()             # in production: read the historian / SQL feed instead
    return field, well_kpis(field)

field, kpis = load()

# The triage table up top -- the screen a foreman reads first.
st.subheader("Scorecard")
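def flag_row(r):
    """Shade the whole row when any alert threshold is crossed."""
    bad = r.days_stale > 2 or r.decline_30d_pct > 10 or r.uptime_pct < 85
    return ["background-color: #ffd0d0" if bad else ""] * len(r)

st.dataframe(kpis.style.apply(flag_row, axis=1))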

# One well's detail, downsampled so the chart stays responsive and honest.
well = st.sidebar.selectbox("Well", kpis.well)
g = field[field.well == well].sort_values("day")
dd, dr = downsample(g.day.values, g.oil_bopd.values, 200)
st.line_chart(pd.DataFrame({"day": dd, "oil_bopd": dr}).set_index("day"))

The one new line doing real work here is @st.cache_data. Without it, Streamlit re-runs the whole script on every click, every slider drag, every page refresh, reloading the feed and recomputing every KPI each time. At sixty wells and two years of daily data that is the difference between an app that responds instantly and one a foreman gives up on by 7:05. Caching is not an optimisation you add later; on a real feed it is the line between a tool and a toy. The lesson generalises past Streamlit: Dash, Panel, and Voilà all wrap the same engine in a different syntax, so pick whichever your team will maintain.
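The effect of caching can be sketched without a running server. The block below uses the standard library's functools.lru_cache as a stand-in for @st.cache_data; that substitution is an assumption made for illustration, and load_feed is a hypothetical placeholder for the real read.

```python
import functools
import time

calls = {"n": 0}                          # counts how often the expensive load really runs


@functools.lru_cache(maxsize=1)
def load_feed():
    """Stand-in for an expensive historian / SQL read (hypothetical)."""
    calls["n"] += 1
    time.sleep(0.05)                      # pretend this is the slow part
    return tuple(range(1000))


for _ in range(20):                       # twenty "clicks", each re-running the script
    feed = load_feed()

print(f"script runs: 20, actual loads: {calls['n']}")
```

Twenty reruns trigger one load. On a live feed the opposite failure matters too: a cache that never expires serves stale data, so in Streamlit it is worth considering the decorator's ttl argument, for example @st.cache_data(ttl=600), to force a re-read after a set interval.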

From Dashboard to Alarm

A dashboard nobody acts on is wallpaper. The last step turns the scorecard into an instruction: which wells need a human today, and why. The thresholds encode the operation's tolerance (a feed silent for more than two days, a 30-day decline past 10%, uptime under 85%) and the output is a short triage list with a reason for each well, not sixty gauges to interpret.

main.py

def alerts(kpis):
    """Turn the scorecard into a triage list: which wells need attention today, and why."""
    flags = {}
    for _, r in kpis.iterrows():
        reasons = []
        if r.days_stale > 2:
            reasons.append(f"STALE ({r.days_stale}d)")          # telemetry or well may be down
        if r.decline_30d_pct > 10:
            reasons.append(f"STEEP_DECLINE ({r.decline_30d_pct:.0f}%/mo)")
        if r.uptime_pct < 85:
            reasons.append(f"LOW_UPTIME ({r.uptime_pct:.0f}%)")
        if reasons:
            flags[r.well] = reasons
    return flags


triage = alerts(kpis)
print(f"{len(triage)} of {len(kpis)} wells need attention this morning:\n")
for well, reasons in triage.items():
    print(f"  {well}: {', '.join(reasons)}")

Three wells, three different problems, one line each: that is the deliverable, and it is what separates a surveillance tool from a screensaver. The dashboard's job was never to show all the data; it was to find the three wells that changed and say so plainly. Everything upstream (the KPIs, the honest downsampling, the cache that keeps it fast) exists to make this list trustworthy. Build that, and the choice of Streamlit versus Dash versus a notebook is a detail. Skip it, and the prettiest dashboard in the company is still just decoration.
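The alerts function returns flagged wells in scorecard order. If the list should also be ranked, one possible ordering is sketched below: most flags first, ties broken by the larger producing rate. Both the ordering rule and the scorecard values are assumptions for illustration (the well names W-A to W-D and the function rank_alerts are hypothetical); an operation might rank by deferred barrels or crew travel time instead.

```python
import pandas as pd

# Illustrative scorecard values (hypothetical, not the chapter's actual output).
kpis = pd.DataFrame({
    "well": ["W-A", "W-B", "W-C", "W-D"],
    "last_rate": [900.0, 1400.0, 600.0, 1100.0],
    "decline_30d_pct": [2.0, 25.0, 12.0, 1.5],
    "uptime_pct": [97.0, 96.0, 80.0, 98.0],
    "days_stale": [3, 0, 0, 0],
})


def rank_alerts(kpis, stale_days=2, decline_pct=10, uptime_pct=85):
    """Flag wells with the same thresholds as alerts(), then order them:
    most flags first, ties broken by the larger producing rate."""
    flagged = kpis.assign(
        n_flags=(kpis.days_stale > stale_days).astype(int)
        + (kpis.decline_30d_pct > decline_pct).astype(int)
        + (kpis.uptime_pct < uptime_pct).astype(int)
    )
    flagged = flagged[flagged.n_flags > 0]          # healthy wells drop out
    return flagged.sort_values(["n_flags", "last_rate"], ascending=[False, False])


ranked = rank_alerts(kpis)
print(ranked[["well", "n_flags", "last_rate"]].to_string(index=False))
```

Keeping the thresholds as keyword arguments also makes it easy to see how the list changes as the tolerances are tightened or relaxed.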

Exercises

These work on the engine behind the dashboard: the part that has to be right.

Exercise 20.1 (Practice): A KPI That Changes the Call

Add a fifth KPI to well_kpis: days of inventory, at the current producing rate and a given economic limit, how many days until the well drops below it...

Exercise 20.2 (Practice): Prove the Chart Is Honest

Take a well, inject a single one-day outage (set one day's rate to zero), and reduce the series both ways, mean-resampling and min/max decimation, to ...

Exercise 20.3 (Practice): Tune the Alarm

The alert thresholds (2 days stale, 10%/mo decline, 85% uptime) decide how many trucks roll. Sweep the decline threshold from 5% to 25% and report how...

Summary

The dashboard is the thin part; the data contract is the engineering. Charts and layout are quick once the KPIs, reductions, and alerts beneath them are correct, and worthless if they are not.
KPIs are decisions, not decorations. Latest rate, 30-day decline, uptime, and data age each answer a question a foreman acts on; computing them once replaces reading sixty charts.
Decimate, don't average, anything you surveil for. Mean-resampling deletes the one-day outage you are watching for; min/max decimation keeps every spike and every zero while still thinning the data for speed.
Cache the engine, not the screenshot. On a real feed, @st.cache_data is the line between an app that responds and one nobody opens twice.
A dashboard's product is the triage list. The point is to find the few wells that changed and say so plainly; the framework you wrap that in is a detail.
