Part I: Python Fundamentals

Chapter 3

Data Structures for Petroleum Data

15 min read | 10 exercises

Petroleum engineering generates enormous volumes of data across every phase of a well's life. Drilling reports, well logs, production records, fluid analyses, pressure tests, seismic surveys — each arrives in a different format, at a different frequency, with a different structure. An engineer who cannot organize, access, and manipulate this data efficiently is an engineer who spends most of their time fighting spreadsheets instead of solving problems.

Python provides four core data structures — lists, tuples, dictionaries, and sets — each suited to a different kind of petroleum data. This chapter teaches all four through real engineering use cases, then moves to the file formats you will encounter in every petroleum engineering workflow: CSV, JSON, and LAS.

What You Will Learn

Lists — ordered, changeable collections for production histories, depth arrays, and time series
Tuples — fixed collections for coordinates, well locations, and parameter sets that must not change
Dictionaries — key-value lookups for well headers, fluid properties, and configuration data
Sets — unique collections for well inventories, field comparisons, and deduplication
File I/O — reading and writing CSV, JSON, and LAS files, the three formats that dominate petroleum data exchange
List and dictionary comprehensions — concise, readable data transformations

Lists — Production Histories and Depth Arrays

A list is an ordered, changeable sequence of values. In petroleum engineering, lists naturally represent anything that comes in a sequence: monthly production rates, depth intervals, casing diameters, or a series of pressure readings.

main.py

# Monthly oil production rates (bopd) for Well OD-003 — first 12 months
oil_rates = [3150, 2780, 2460, 2190, 1965, 1780,
             1625, 1500, 1395, 1305, 1225, 1155]

# Monthly water production rates (bwpd)
water_rates = [420, 480, 560, 650, 740, 830,
               920, 1010, 1095, 1175, 1250, 1320]

# Basic list operations that you will use constantly
print(f"Months of data:       {len(oil_rates)}")
print(f"Initial oil rate:     {oil_rates[0]:,} bopd")
print(f"Final oil rate:       {oil_rates[-1]:,} bopd")
print(f"Peak oil rate:        {max(oil_rates):,} bopd")
print(f"Average oil rate:     {sum(oil_rates) / len(oil_rates):,.0f} bopd")
print(f"Total oil (approx):   {sum(r * 30 for r in oil_rates):,} bbl")
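
You can compare consecutive months by pairing a list with a copy of itself shifted by one element (`oil_rates[1:]`). The sketch below uses that idea to compute the fractional decline from each month to the next. It reuses the same 12 rates as above and is a simple surveillance ratio, nothing more.

```python
# Monthly oil rates (bopd), same 12 values as above
oil_rates = [3150, 2780, 2460, 2190, 1965, 1780,
             1625, 1500, 1395, 1305, 1225, 1155]

# zip() pairs each month with the next one: (m1, m2), (m2, m3), ...
# It stops at the shorter input, so 12 rates give 11 pairs
monthly_decline = [
    (prev - curr) / prev
    for prev, curr in zip(oil_rates, oil_rates[1:])
]

for month, d in enumerate(monthly_decline, start=2):
    print(f"Month {month:>2}: {d:6.1%} decline from previous month")
```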

Indexing and Slicing

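Lists are zero-indexed. The first element is at position 0 and the last is at position len(list) - 1, which you can also write as -1 because negative indices count back from the end. Zero-based indexing is the convention in Python and most C-family languages. It takes some adjustment if you are coming from Excel (where row 1 is the first row) or MATLAB, but it becomes natural quickly.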

main.py

# Casing program — diameters from surface to TD
casing_diameters_in = [36, 20, 13.375, 9.625, 7]
casing_names = ["Conductor", "Surface", "Intermediate", "Production", "Liner"]

# Access specific elements
print(f"Surface casing:     {casing_diameters_in[1]}\" ({casing_names[1]})")
print(f"Deepest casing:     {casing_diameters_in[-1]}\" ({casing_names[-1]})")

# Slicing — get a range of elements
# First three casings (indices 0, 1, 2)
upper_casings = casing_diameters_in[:3]
print(f"Upper casings:      {upper_casings}")

# Last two casings
lower_casings = casing_diameters_in[-2:]
print(f"Lower casings:      {lower_casings}")

# Quarter-by-quarter production (3 months per quarter)
q1_rates = oil_rates[0:3]
q2_rates = oil_rates[3:6]
q3_rates = oil_rates[6:9]
q4_rates = oil_rates[9:12]

print(f"\nQuarterly average oil rates:")
for q_name, q_data in [("Q1", q1_rates), ("Q2", q2_rates), ("Q3", q3_rates), ("Q4", q4_rates)]:
    avg = sum(q_data) / len(q_data)
    print(f"  {q_name}: {avg:,.0f} bopd")
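
The quarter slices above are written out by hand. Slice bounds can be computed, so a comprehension over `range()` with a step builds the same chunks for any number of months. Here is a sketch using the same 12 monthly rates:

```python
oil_rates = [3150, 2780, 2460, 2190, 1965, 1780,
             1625, 1500, 1395, 1305, 1225, 1155]

months_per_quarter = 3
# range(start, stop, step) yields the slice start indices 0, 3, 6, 9
quarters = [oil_rates[i:i + months_per_quarter]
            for i in range(0, len(oil_rates), months_per_quarter)]

for q_num, q_data in enumerate(quarters, start=1):
    print(f"Q{q_num}: {sum(q_data) / len(q_data):,.0f} bopd")

# Slices are half-open: [a:b] includes index a but excludes b,
# so adjacent slices join back together with no gaps or overlaps
assert oil_rates[0:3] + oil_rates[3:6] == oil_rates[0:6]
```

If the list length is not a multiple of three, the last chunk is simply shorter. Slicing past the end of a list does not raise an error.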

Modifying Lists — Adding and Updating Data

Production data grows over time. New months arrive, wells come online, records get corrected. Lists are mutable — you can add, remove, and change elements.

main.py

# A field's active well list — this changes as wells come online or shut in
active_wells = ["OD-001", "OD-002", "OD-003", "OD-005"]

# A new well comes online
active_wells.append("OD-007")
print(f"After OD-007 online:   {active_wells}")

# OD-002 shuts in for workover
active_wells.remove("OD-002")
print(f"After OD-002 shut-in:  {active_wells}")

# Insert a well at a specific position (alphabetical order matters for reports)
active_wells.insert(1, "OD-002A")  # Sidetrack of OD-002
print(f"After sidetrack added: {active_wells}")

# Sort alphabetically
active_wells.sort()
print(f"Sorted:                {active_wells}")

# Production data — append new month
oil_rates.append(1095)  # Month 13
water_rates.append(1385)
print(f"\nMonths of data now: {len(oil_rates)} (added month 13: {oil_rates[-1]} bopd)")
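
Because lists are mutable, assigning a list to a second name does not copy it. Both names refer to the same object, so a change made through one name shows up through the other. The sketch below shows this pitfall and how `.copy()` produces an independent list. It also shows a safe removal pattern: `remove()` raises `ValueError` when the value is absent, so check membership with `in` first. The well ID "OD-099" is a made-up example.

```python
active_wells = ["OD-001", "OD-003", "OD-005"]

# Assignment creates a second name for the SAME list (an alias)
alias = active_wells
alias.append("OD-007")
print(active_wells)        # the original list changed too

# .copy() (or list(...), or a full slice [:]) creates an independent list
snapshot = active_wells.copy()
snapshot.remove("OD-001")
print(active_wells)        # unchanged by the edit to snapshot

# remove() raises ValueError if the value is not present, so check first
well_to_shut_in = "OD-099"  # hypothetical ID that is not in the list
if well_to_shut_in in active_wells:
    active_wells.remove(well_to_shut_in)
else:
    print(f"{well_to_shut_in} is not in the active list")
```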

Water Cut Trend — Lists + Visualization

Water cut is the fraction of produced liquid that is water: q_w / (q_o + q_w), usually quoted as a percentage. It is one of the most important surveillance metrics in production engineering. A rising water cut usually indicates that water from an aquifer or injector is encroaching on the oil zone around the well. Here is how to compute and visualize it from two lists.

main.py

import matplotlib.pyplot as plt

months = list(range(1, len(oil_rates) + 1))

# Calculate water cut for each month
water_cut = [w / (o + w) * 100 for o, w in zip(oil_rates, water_rates)]

fig, ax1 = plt.subplots(figsize=(9, 5))

# Rates on left axis
ax1.plot(months, oil_rates, color="#2E8B57", linewidth=2, marker='o',
         markersize=4, label="Oil Rate (bopd)")
ax1.plot(months, water_rates, color="#4682B4", linewidth=2, marker='s',
         markersize=4, label="Water Rate (bwpd)")
ax1.set_xlabel("Month", fontsize=11)
ax1.set_ylabel("Rate (bpd)", fontsize=11)
ax1.set_ylim(0, 3600)
ax1.legend(loc="upper left", fontsize=9)

# Water cut on right axis
ax2 = ax1.twinx()
ax2.fill_between(months, water_cut, alpha=0.15, color="#CC4444")
ax2.plot(months, water_cut, color="#CC4444", linewidth=2, linestyle="--",
         label="Water Cut (%)")
ax2.set_ylabel("Water Cut (%)", color="#CC4444", fontsize=11)
ax2.set_ylim(0, 100)
ax2.tick_params(axis='y', labelcolor="#CC4444")
ax2.legend(loc="upper right", fontsize=9)

ax1.set_title("Well OD-003 — Production History & Water Cut Trend",
              fontsize=13, fontweight='bold')
ax1.set_xticks(months)
ax1.grid(True, alpha=0.2)
fig.tight_layout()
plt.show()

This is a standard production surveillance plot and a staple of monthly production reports. Reservoir engineers watch closely for the point where the oil and water curves cross, which is also where water cut passes 50%. In this well the curves cross between months 10 and 11. In month 11 the water rate (1,250 bwpd) first exceeds the oil rate (1,225 bopd).
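
Rather than reading the crossover off the plot, you can find it directly from the two lists. Water rate exceeds oil rate exactly when water cut passes 50%, so the same search finds both. The sketch uses the 13 months built up so far: the 12 original months plus the appended month 13.

```python
oil_rates = [3150, 2780, 2460, 2190, 1965, 1780, 1625,
             1500, 1395, 1305, 1225, 1155, 1095]
water_rates = [420, 480, 560, 650, 740, 830, 920,
               1010, 1095, 1175, 1250, 1320, 1385]

# next() returns the first month where water exceeds oil,
# or the default None if the curves never cross in this data
crossover_month = next(
    (month for month, (o, w) in enumerate(zip(oil_rates, water_rates), start=1)
     if w > o),
    None,
)
print(f"Water rate first exceeds oil rate in month {crossover_month}")
```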

Tuples — Fixed Data That Must Not Change

A tuple is like a list, but it cannot be modified after creation. That sounds like a limitation, but it is actually a safety feature. Some data should never change: well coordinates, datum elevations, fundamental constants, and calibration values.

main.py

# Well surface location — latitude, longitude (WGS84)
# These are fixed physical coordinates. They must never be accidentally modified.
od001_location = (4.7731, 7.0085)   # Niger Delta, Nigeria
od002_location = (4.7742, 7.0091)
od003_location = (4.7718, 7.0078)

# Datum elevations in ft above mean sea level: (kelly bushing, ground level)
# Fixed by survey, so they are grouped in a tuple rather than left as loose variables
datum_elevations_ft = (42.0, 28.0)

# Casing specifications — (OD inches, weight lb/ft, grade)
# Once a casing is run, its specifications are permanent
surface_casing = (20.0, 94.0, "K-55")
production_casing = (9.625, 47.0, "L-80")

print(f"OD-001 Location:       {od001_location[0]:.4f}°N, {od001_location[1]:.4f}°E")
print(f"Surface Casing:        {surface_casing[0]}\" OD, {surface_casing[1]} lb/ft, {surface_casing[2]}")
print(f"Production Casing:     {production_casing[0]}\" OD, {production_casing[1]} lb/ft, {production_casing[2]}")

# Tuple unpacking — clean way to extract components
lat, lon = od001_location
print(f"\nUnpacked: Latitude = {lat}, Longitude = {lon}")

od, weight, grade = production_casing
print(f"Unpacked: {od}\" casing, {weight} lb/ft, grade {grade}")

main.py

# Demonstrating immutability (a safety feature): item assignment raises TypeError

try:
    od001_location[0] = 5.0  # Attempt to change latitude
except TypeError as e:
    print(f"Cannot modify tuple: {e}")
    print("This is intentional — well coordinates must not change accidentally.")

If you need a collection that cannot be accidentally modified during processing — coordinates, physical constants, calibration data — use a tuple. If you need a collection that grows, shrinks, or gets updated — production data, well lists, calculation results — use a list.
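
Indexing a casing tuple as [0], [1], [2] works, but the reader has to remember which position holds what. The standard library's `typing.NamedTuple` keeps a tuple's immutability while giving each field a name. The sketch below uses a hypothetical `CasingSpec` type. The class and field names are illustrative, not part of the chapter's earlier code.

```python
from typing import NamedTuple


class CasingSpec(NamedTuple):
    """Hypothetical record type: one casing string's fixed specifications."""
    od_in: float
    weight_lb_ft: float
    grade: str


production_casing = CasingSpec(od_in=9.625, weight_lb_ft=47.0, grade="L-80")

# Fields are readable by name instead of by position...
print(f'{production_casing.od_in}" OD, {production_casing.weight_lb_ft} lb/ft, '
      f'{production_casing.grade}')

# ...but it is still a tuple: indexing and unpacking work as before
od, weight, grade = production_casing
assert production_casing[0] == od

# It is still immutable: assigning to a field raises AttributeError
try:
    production_casing.grade = "P-110"
except AttributeError:
    print("CasingSpec fields cannot be reassigned")

# _replace() returns a NEW tuple with one field changed; the original is untouched
heavier = production_casing._replace(weight_lb_ft=53.5)
print(heavier)
```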

Dictionaries — Well Headers and Property Lookups

A dictionary stores key-value pairs. In petroleum engineering, this maps directly to the concept of a well header — a structured record where each field has a name and a value.

main.py

# A well header — the standard summary of a well's identity and configuration
well_header = {
    "well_name":        "Oso-Deep 003",
    "api_number":       "NG-OML58-003",
    "field":            "OML 58",
    "operator":         "National Petroleum Development Co.",
    "well_type":        "Horizontal Development",
    "status":           "Producing",
    "spud_date":        "2024-08-15",
    "completion_date":  "2025-03-22",
    "total_depth_ft":   12800,
    "tvd_ft":           9650,
    "lateral_length_ft": 4200,
    "surface_lat":      4.7718,
    "surface_lon":      7.0078,
    "target_formation": "E3000 Sand",
    "mud_weight_ppg":   11.4,
    "initial_rate_bopd": 3150,
}

# Access values by key
print(f"Well:     {well_header['well_name']}")
print(f"Status:   {well_header['status']}")
print(f"TD:       {well_header['total_depth_ft']:,} ft MD")
print(f"TVD:      {well_header['tvd_ft']:,} ft TVD")
print(f"Lateral:  {well_header['lateral_length_ft']:,} ft")
print(f"IP Rate:  {well_header['initial_rate_bopd']:,} bopd")
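
Real well headers are often incomplete: a field recorded for one well may be missing for another. Indexing a missing key with square brackets raises `KeyError`, while `dict.get()` returns a default instead. The sketch below uses a trimmed header under a separate name, `partial_header`. The `kickoff_point_ft` field, its value, and the status update are all hypothetical.

```python
# A trimmed header; "kickoff_point_ft" is a hypothetical field it does not contain
partial_header = {
    "well_name": "Oso-Deep 003",
    "status": "Producing",
    "tvd_ft": 9650,
}

# Square brackets raise KeyError when a key is missing...
try:
    kop = partial_header["kickoff_point_ft"]
except KeyError as missing:
    print(f"Missing header field: {missing}")

# ...while .get() returns a default instead (None unless you pass one)
kop = partial_header.get("kickoff_point_ft")
status = partial_header.get("status", "Unknown")
print(f"KOP: {kop}, status: {status}")

# Dictionaries are mutable: add or correct fields in place
partial_header["kickoff_point_ft"] = 8900                   # hypothetical value
partial_header.update({"status": "Producing (gas lift)"})  # hypothetical update

for key, value in partial_header.items():
    print(f"  {key:<18} {value}")
```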

Nested Dictionaries — A Field Database

A real field contains multiple wells, each with its own header. A dictionary of dictionaries is a natural way to represent this.

main.py

# Field database — each well keyed by its identifier
field = {
    "OD-001": {
        "status": "Producing",
        "tvd_ft": 9800,
        "oil_rate_bopd": 1842,
        "water_cut_pct": 38,
        "cum_oil_mbbl": 320,
    },
    "OD-002": {
        "status": "Shut-in (workover)",
        "tvd_ft": 10200,
        "oil_rate_bopd": 0,
        "water_cut_pct": 0,
        "cum_oil_mbbl": 410,
    },
    "OD-003": {
        "status": "Producing",
        "tvd_ft": 9650,
        "oil_rate_bopd": 1155,
        "water_cut_pct": 56,
        "cum_oil_mbbl": 245,
    },
    "OD-005": {
        "status": "Producing",
        "tvd_ft": 9400,
        "oil_rate_bopd": 2210,
        "water_cut_pct": 22,
        "cum_oil_mbbl": 180,
    },
    "OD-007": {
        "status": "Producing",
        "tvd_ft": 9900,
        "oil_rate_bopd": 2950,
        "water_cut_pct": 8,
        "cum_oil_mbbl": 45,
    },
}

# Field-level summary
total_oil = sum(w["oil_rate_bopd"] for w in field.values())
total_cum = sum(w["cum_oil_mbbl"] for w in field.values())
producing = sum(1 for w in field.values() if w["status"] == "Producing")

print(f"Field Summary — OML 58")
print(f"  Total wells:        {len(field)}")
print(f"  Producing:          {producing}")
print(f"  Field oil rate:     {total_oil:,} bopd")
print(f"  Cumulative oil:     {total_cum:,} Mbbl")
print(f"  Avg rate/well:      {total_oil // producing:,} bopd")
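
A common next step is to rank wells, for example by current oil rate. `sorted()` accepts any iterable plus a `key` function, so it works directly on a dictionary's `.items()`. The sketch uses a trimmed copy of the field data under a separate name, `field_rates`, so it does not overwrite `field`.

```python
field_rates = {
    "OD-001": {"status": "Producing", "oil_rate_bopd": 1842},
    "OD-002": {"status": "Shut-in (workover)", "oil_rate_bopd": 0},
    "OD-003": {"status": "Producing", "oil_rate_bopd": 1155},
    "OD-005": {"status": "Producing", "oil_rate_bopd": 2210},
    "OD-007": {"status": "Producing", "oil_rate_bopd": 2950},
}

# .items() yields (name, record) tuples; key= picks the value to sort on
ranking = sorted(
    ((name, rec) for name, rec in field_rates.items() if rec["status"] == "Producing"),
    key=lambda item: item[1]["oil_rate_bopd"],
    reverse=True,  # highest rate first
)

for rank, (name, rec) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {rec['oil_rate_bopd']:,} bopd")

top_well = ranking[0][0]
```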

main.py

import matplotlib.pyplot as plt

wells = list(field)  # well IDs, in insertion order
rates = [field[w]["oil_rate_bopd"] for w in wells]
wcuts = [field[w]["water_cut_pct"] for w in wells]

fig, ax = plt.subplots(figsize=(8, 4.5))

colors = ["#2E8B57" if field[w]["status"] == "Producing" else "#999999" for w in wells]
bars = ax.bar(wells, rates, color=colors, edgecolor="white", linewidth=0.5)

# Annotate water cut on each bar
for bar, wc, rate in zip(bars, wcuts, rates):
    if rate > 0:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 50,
                f"WC: {wc}%", ha='center', fontsize=9, color="#666")

ax.set_ylabel("Oil Rate (bopd)", fontsize=11)
ax.set_title("OML 58 — Current Well Production", fontsize=13, fontweight='bold')
ax.grid(axis='y', alpha=0.2)

# Add shut-in annotation
ax.annotate("Shut-in\n(workover)", xy=(1, 0), fontsize=8, ha='center',
            va='bottom', color="#999", style='italic')

fig.tight_layout()
plt.show()

Sets — Unique Identifiers and Field Comparisons

A set is an unordered collection of unique values. In petroleum data, sets are useful for answering questions like: which wells are in both datasets? Which wells are missing from the production database? Which formations appear across multiple fields?

main.py

# Wells in the drilling database vs. wells in the production database
# These often do not match — wells drilled but not yet completed,
# or legacy wells that produce but have lost their drilling records.

drilling_db = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005", "OD-006", "OD-007"}
production_db = {"OD-001", "OD-002", "OD-003", "OD-005", "OD-007", "OD-008"}

# Set operations answer real questions
in_both = drilling_db & production_db
drilled_not_producing = drilling_db - production_db
producing_no_drill_record = production_db - drilling_db
all_wells = drilling_db | production_db

print(f"In drilling DB:             {sorted(drilling_db)}")
print(f"In production DB:           {sorted(production_db)}")
print()
print(f"In both databases:          {sorted(in_both)}")
print(f"Drilled but not producing:  {sorted(drilled_not_producing)}")
print(f"Producing, no drill record: {sorted(producing_no_drill_record)}")
print(f"All known wells:            {sorted(all_wells)}")

OD-004 and OD-006 were drilled but are not in the production database — they might be abandoned, waiting on completion, or simply missing from the records. OD-008 is producing but has no drilling record — a data quality issue that needs investigation. These are exactly the kinds of discrepancies that a petroleum data engineer encounters daily, and sets make them trivial to find.
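
Two other set questions raised earlier are which formations appear across multiple fields, and how to deduplicate a list. Each takes only a line or two. The field and formation names below are hypothetical placeholders.

```python
# Hypothetical formation inventories for three fields
formations_by_field = {
    "Field A": {"Sand A", "Sand B", "Sand C"},
    "Field B": {"Sand B", "Sand C", "Sand D"},
    "Field C": {"Sand C", "Sand D", "Sand E"},
}

# Formations present in every field: intersect all the sets at once
in_all_fields = set.intersection(*formations_by_field.values())

# Formations present in more than one field: count memberships per formation
all_formations = set.union(*formations_by_field.values())
in_multiple_fields = {
    fm for fm in all_formations
    if sum(fm in fms for fms in formations_by_field.values()) > 1
}

print(f"In every field:      {sorted(in_all_fields)}")
print(f"In 2 or more fields: {sorted(in_multiple_fields)}")

# Deduplication: a raw well list exported with repeated rows
raw_ids = ["OD-003", "OD-001", "OD-003", "OD-005", "OD-001"]
unique_ids = set(raw_ids)                       # unique, but unordered
unique_in_order = list(dict.fromkeys(raw_ids))  # unique, first-seen order kept
print(sorted(unique_ids), unique_in_order)
```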

File I/O — The Three Formats of Petroleum Data

CSV — Production Data

Comma-separated values (CSV) files are one of the most common formats for exporting production data from SCADA systems, databases, and government regulatory agencies.

main.py

import csv

# Write sample production data to CSV
production_data = [
    ["Well", "Date", "Oil_bopd", "Water_bwpd", "Gas_mscfd", "Choke_64ths", "FWHP_psi"],
    ["OD-001", "2026-01-01", 1842, 1200, 4210, 32, 680],
    ["OD-001", "2026-02-01", 1790, 1280, 4050, 32, 655],
    ["OD-001", "2026-03-01", 1735, 1350, 3920, 32, 632],
    ["OD-003", "2026-01-01", 1395, 1095, 3180, 24, 520],
    ["OD-003", "2026-02-01", 1305, 1175, 3010, 24, 498],
    ["OD-003", "2026-03-01", 1225, 1250, 2870, 24, 475],
    ["OD-005", "2026-01-01", 2210, 620, 5100, 28, 780],
    ["OD-005", "2026-02-01", 2150, 680, 4980, 28, 762],
    ["OD-005", "2026-03-01", 2085, 745, 4840, 28, 741],
]

with open("field_production.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(production_data)

print("Written: field_production.csv")
print(f"  {len(production_data) - 1} records, {len(production_data[0])} columns")

main.py

import csv
from collections import defaultdict

# Read it back and process (the csv docs recommend opening files with newline="")
with open("field_production.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    records = list(reader)   # each row is a dict; every value is a string

# Summarize by well: defaultdict creates the running-totals record on first access
well_totals = defaultdict(lambda: {"oil": 0, "water": 0, "gas": 0, "months": 0})

for r in records:
    well = r["Well"]
    well_totals[well]["oil"] += int(r["Oil_bopd"])
    well_totals[well]["water"] += int(r["Water_bwpd"])
    well_totals[well]["gas"] += int(r["Gas_mscfd"])
    well_totals[well]["months"] += 1

print(f"{'Well':<10} {'Avg Oil':>10} {'Avg Water':>10} {'Avg GOR':>10} {'Months':>7}")
print("-" * 50)
for well, data in sorted(well_totals.items()):
    avg_oil = data["oil"] / data["months"]
    avg_water = data["water"] / data["months"]
    # GOR in scf/bbl: gas (Mscf/d x 1000 = scf/d) divided by oil (bbl/d)
    avg_gor = (data["gas"] * 1000) / data["oil"] if data["oil"] > 0 else 0
    print(f"{well:<10} {avg_oil:>10,.0f} {avg_water:>10,.0f} {avg_gor:>10,.0f} {data['months']:>7}")
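
`csv.DictWriter` is the writing counterpart of `DictReader`, and it pairs naturally with lists of dictionaries. The sketch below uses `io.StringIO` as an in-memory stand-in for a file, so it runs without touching disk. Reading the text back shows why the summary loop above wraps values in `int()`: `DictReader` returns every field as a string.

```python
import csv
import io

rows = [
    {"Well": "OD-001", "Date": "2026-01-01", "Oil_bopd": 1842},
    {"Well": "OD-003", "Date": "2026-01-01", "Oil_bopd": 1395},
]

# io.StringIO stands in for a file opened with open(..., "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Well", "Date", "Oil_bopd"])
writer.writeheader()      # first line: the column names
writer.writerows(rows)    # one line per dictionary

text = buffer.getvalue()
print(text)

# Reading back: DictReader returns every value as a string
buffer.seek(0)
jan_records = list(csv.DictReader(buffer))
print(type(jan_records[0]["Oil_bopd"]))   # str, so convert with int() or float()
jan_total_oil = sum(int(r["Oil_bopd"]) for r in jan_records)
```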

JSON — Well Configuration and Metadata

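JSON (JavaScript Object Notation) is used for structured configuration data, API responses, and metadata. It maps almost directly onto Python types: JSON objects become dictionaries and arrays become lists. Strings, numbers, true/false, and null become str, int or float, True/False, and None.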

main.py

import json

# Write a well completion record as JSON
completion = {
    "well_name": "Oso-Deep 003",
    "completion_date": "2025-03-22",
    "completion_type": "Multi-stage frac",
    "stages": 12,
    "proppant_type": "100-mesh sand",
    "proppant_volume_lbs": 4_200_000,
    "frac_fluid": "Slickwater",
    "perforation_intervals": [
        {"top_ft": 9450, "bottom_ft": 9520, "shots_per_ft": 6},
        {"top_ft": 9580, "bottom_ft": 9640, "shots_per_ft": 6},
        {"top_ft": 9700, "bottom_ft": 9780, "shots_per_ft": 6},
    ],
    "initial_shut_in_pressure_psi": 4850,
    "initial_rate_bopd": 3150,
}

# Write to file
with open("od003_completion.json", "w") as f:
    json.dump(completion, f, indent=2)

# Read back
with open("od003_completion.json", "r") as f:
    comp = json.load(f)

print(f"Well: {comp['well_name']}")
print(f"Completion: {comp['completion_type']} — {comp['stages']} stages")
print(f"Proppant: {comp['proppant_volume_lbs']:,} lbs of {comp['proppant_type']}")
print(f"\nPerforation Intervals:")
for i, perf in enumerate(comp["perforation_intervals"], 1):
    thickness = perf["bottom_ft"] - perf["top_ft"]
    print(f"  Zone {i}: {perf['top_ft']:,}–{perf['bottom_ft']:,} ft "
          f"({thickness} ft, {perf['shots_per_ft']} spf)")
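
The mapping between JSON and Python is close but not exact, which matters when you round-trip data. In the sketch below, a tuple goes in and comes back as a list, and integer keys come back as strings. Dates have no JSON type at all, so you have to store them as strings yourself. The `stage_count_by_zone` field is a hypothetical example.

```python
import json
from datetime import date

record = {
    "well_name": "Oso-Deep 003",
    "surface_location": (4.7718, 7.0078),        # a tuple...
    "stage_count_by_zone": {1: 4, 2: 4, 3: 4},   # hypothetical int keys
}

round_trip = json.loads(json.dumps(record))
print(round_trip["surface_location"])     # ...comes back as a list
print(round_trip["stage_count_by_zone"])  # int keys come back as strings

# Types JSON has no equivalent for, such as dates, raise TypeError...
try:
    json.dumps({"completion_date": date(2025, 3, 22)})
except TypeError as err:
    print(f"Cannot serialize: {err}")

# ...so store them as ISO-format strings and parse them on the way back in
encoded = json.dumps({"completion_date": date(2025, 3, 22).isoformat()})
completion_date = date.fromisoformat(json.loads(encoded)["completion_date"])
```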

LAS — Well Log Data

LAS (Log ASCII Standard), developed by the Canadian Well Logging Society (CWLS), is the petroleum industry's standard ASCII format for well log data. Most well logs you will work with (gamma ray, resistivity, density, neutron, sonic) are stored or exchanged as LAS files; the binary DLIS format is the other common exchange format. The code below creates one from scratch so you can see its structure: header sections (~VERSION, ~WELL, ~CURVE) followed by columnar ASCII data in the ~A section.

main.py

# Create a synthetic LAS file for demonstration
# In practice, LAS files come from wireline logging companies (Schlumberger, Halliburton, Baker Hughes)

import numpy as np

# Generate synthetic log data for 500 ft of section at a 1 ft step
np.random.seed(42)
depths = np.arange(9000.0, 9500.0 + 1.0, 1.0)  # 9000, 9001, ..., 9500 ft
n_points = len(depths)                         # 501 samples, matching STRT/STOP/STEP below

# Gamma Ray — low in sand (reservoir), high in shale (non-reservoir)
# Simulating a sand-shale sequence
gr_base = np.where((depths > 9150) & (depths < 9350), 35, 110)  # Sand between 9150-9350
gr = gr_base + np.random.normal(0, 8, n_points)
gr = np.clip(gr, 0, 150)

# Resistivity — high in hydrocarbon-bearing sand, low in shale and water zones
rt_base = np.where((depths > 9150) & (depths < 9350), 45, 3)
rt = rt_base * np.exp(np.random.normal(0, 0.2, n_points))
rt = np.clip(rt, 0.5, 200)

# Density — lower in porous sand (~2.35 g/cc), higher in shale (~2.55 g/cc)
rhob_base = np.where((depths > 9150) & (depths < 9350), 2.35, 2.55)
rhob = rhob_base + np.random.normal(0, 0.03, n_points)

# Neutron porosity — higher in shale (bound water), moderate in sand
nphi_base = np.where((depths > 9150) & (depths < 9350), 0.18, 0.32)
nphi = nphi_base + np.random.normal(0, 0.02, n_points)

# Write LAS format
las_content = """~VERSION INFORMATION
VERS.                 2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.                  NO : ONE LINE PER DEPTH STEP

~WELL INFORMATION
STRT.FT           9000.00 : START DEPTH
STOP.FT           9500.00 : STOP DEPTH
STEP.FT              1.00 : STEP
NULL.              -999.25 : NULL VALUE
COMP.   National Petroleum Development Co. : COMPANY
WELL.   Oso-Deep 003      : WELL NAME
FLD.    OML 58             : FIELD
LOC.    Niger Delta, Nigeria : LOCATION

~CURVE INFORMATION
DEPT.FT                   : 1  DEPTH
GR.GAPI                   : 2  GAMMA RAY
RT.OHMM                   : 3  DEEP RESISTIVITY
RHOB.G/C3                 : 4  BULK DENSITY
NPHI.V/V                  : 5  NEUTRON POROSITY

~A  DEPTH        GR       RT     RHOB     NPHI
"""

for i in range(n_points):
    las_content += f"  {depths[i]:10.2f} {gr[i]:8.2f} {rt[i]:8.2f} {rhob[i]:8.4f} {nphi[i]:8.4f}\n"

with open("od003_logs.las", "w") as f:
    f.write(las_content)

print("Written: od003_logs.las")
print(f"  Depth range: {depths[0]:.0f} – {depths[-1]:.0f} ft")
print(f"  Curves: GR, RT, RHOB, NPHI")
print(f"  Data points: {n_points}")
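
The ~WELL header declares STRT, STOP, and STEP separately from the data, so the header can easily disagree with the depth array actually written. For example, `np.linspace(9000, 9500, 500)` produces samples about 1.002 ft apart, whereas a 1 ft step from 9000 to 9500 ft needs 501 samples. The `check_depth_index` helper below is a hypothetical sanity check, not part of any LAS library.

```python
import numpy as np


def check_depth_index(depths, strt, stop, step, tol=1e-6):
    """Hypothetical helper: list ways a depth array disagrees with STRT/STOP/STEP."""
    expected_n = int(round((stop - strt) / step)) + 1
    problems = []
    if len(depths) != expected_n:
        problems.append(f"expected {expected_n} samples, found {len(depths)}")
    if abs(depths[0] - strt) > tol or abs(depths[-1] - stop) > tol:
        problems.append("first/last depth does not match STRT/STOP")
    steps = np.diff(depths)  # spacing between consecutive samples
    if not np.allclose(steps, step, atol=tol):
        problems.append(f"step is {steps.mean():.4f}, header says {step}")
    return problems


good = np.arange(9000.0, 9500.0 + 1.0, 1.0)   # 501 samples, exactly 1 ft apart
bad = np.linspace(9000.0, 9500.0, 500)        # 500 samples, ~1.002 ft apart

print(check_depth_index(good, 9000.0, 9500.0, 1.0))  # []
print(check_depth_index(bad, 9000.0, 9500.0, 1.0))
```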

main.py

import numpy as np
import matplotlib.pyplot as plt

# Read the LAS file we just created
# (In practice, you would use the 'lasio' library: las = lasio.read("od003_logs.las"))
# Here we parse manually to show the underlying structure.

data_lines = []
in_data = False

with open("od003_logs.las", "r") as f:
    for line in f:
        if line.startswith("~A"):
            in_data = True
            continue
        if in_data and line.strip():
            values = line.split()
            if len(values) == 5:
                data_lines.append([float(v) for v in values])

log_data = np.array(data_lines)
depth = log_data[:, 0]
gr    = log_data[:, 1]
rt    = log_data[:, 2]
rhob  = log_data[:, 3]
nphi  = log_data[:, 4]

# === Triple-Combo Log Display ===
fig, axes = plt.subplots(1, 3, figsize=(10, 12), sharey=True)

# Track 1: Gamma Ray
ax1 = axes[0]
ax1.plot(gr, depth, color="#2E8B57", linewidth=0.8)
ax1.set_xlabel("GR (gAPI)", fontsize=10)
ax1.set_ylabel("Depth (ft)", fontsize=11)
ax1.set_xlim(0, 150)
ax1.set_title("Gamma Ray", fontsize=11, fontweight='bold')
ax1.axvline(x=60, color='gray', linestyle='--', linewidth=0.5, alpha=0.5)  # Sand/shale cutoff
ax1.fill_betweenx(depth, 0, gr, where=(gr < 60), alpha=0.15, color="#FFD700")  # Highlight sand
ax1.invert_yaxis()
ax1.grid(True, alpha=0.15)

# Track 2: Resistivity (log scale)
ax2 = axes[1]
ax2.semilogx(rt, depth, color="#CC4444", linewidth=0.8)
ax2.set_xlabel("RT (ohm·m)", fontsize=10)
ax2.set_xlim(0.1, 1000)
ax2.set_title("Deep Resistivity", fontsize=11, fontweight='bold')
ax2.grid(True, alpha=0.15, which='both')

# Track 3: Density-Neutron
ax3 = axes[2]
ax3.plot(rhob, depth, color="#4682B4", linewidth=0.8, label="RHOB (g/cc)")
ax3_nphi = ax3.twiny()
ax3_nphi.plot(nphi, depth, color="#CC4444", linewidth=0.8, linestyle="--", label="NPHI (v/v)")
ax3.set_xlabel("RHOB (g/cc)", fontsize=10, color="#4682B4")
ax3_nphi.set_xlabel("NPHI (v/v)", fontsize=10, color="#CC4444")
ax3.set_xlim(2.0, 2.8)
ax3_nphi.set_xlim(0.45, -0.05)  # Reversed scale (petroleum convention)
ax3.set_title("Density / Neutron", fontsize=11, fontweight='bold')
ax3.grid(True, alpha=0.15)

# Highlight the pay zone
for ax in axes:
    ax.axhspan(9150, 9350, alpha=0.06, color="#FFD700")

fig.suptitle("Oso-Deep 003 — Triple Combo Log", fontsize=14, fontweight='bold', y=1.01)
fig.tight_layout()
plt.show()

This is a triple-combo log, the most fundamental well log display in petroleum engineering and one that petrophysicists read routinely. Together, the three tracks indicate several things about a rock interval. Low gamma ray suggests clean reservoir rock, high resistivity suggests hydrocarbons, and density and neutron indicate porosity; combined, they also show how thick the pay zone is. We will compute quantitative answers from these curves in Chapter 7 (Well Log Analysis).
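
The manual parser above reads only the ~A data section. Header lines in the ~WELL and ~CURVE sections follow a `MNEMONIC.UNIT  VALUE : DESCRIPTION` layout, which maps naturally onto a dictionary of dictionaries. The `parse_las_section` function below is a hypothetical, simplified sketch. It assumes the first period ends the mnemonic and the last colon starts the description. It also ignores LAS edge cases that a library such as lasio handles for you.

```python
SAMPLE_LAS_HEADER = """~VERSION INFORMATION
VERS.                 2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.                  NO : ONE LINE PER DEPTH STEP
~WELL INFORMATION
STRT.FT           9000.00 : START DEPTH
STOP.FT           9500.00 : STOP DEPTH
NULL.              -999.25 : NULL VALUE
COMP.   National Petroleum Development Co. : COMPANY
WELL.   Oso-Deep 003      : WELL NAME
~CURVE INFORMATION
DEPT.FT                   : 1  DEPTH
GR.GAPI                   : 2  GAMMA RAY
~A  DEPTH        GR
"""


def parse_las_section(text, section):
    """Hypothetical helper: parse one LAS header section into a dict of dicts.

    Assumes each line looks like MNEM.UNIT  VALUE : DESCRIPTION, where the
    first period ends the mnemonic and the last colon starts the description.
    """
    header = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # skip blank lines and comments
        if stripped.startswith("~"):
            # A new section starts; track whether it is the one we want
            in_section = stripped.upper().startswith(section.upper())
            continue
        if not in_section or ":" not in stripped:
            continue
        left, _, description = stripped.rpartition(":")
        mnemonic, _, rest = left.partition(".")
        unit, _, value = rest.partition(" ")  # unit ends at the first space
        header[mnemonic.strip()] = {
            "unit": unit.strip(),
            "value": value.strip(),
            "description": description.strip(),
        }
    return header


well_info = parse_las_section(SAMPLE_LAS_HEADER, "~W")
curve_info = parse_las_section(SAMPLE_LAS_HEADER, "~C")

print(well_info["WELL"]["value"])                     # Oso-Deep 003
print(float(well_info["STRT"]["value"]), well_info["STRT"]["unit"])
print(list(curve_info))                               # ['DEPT', 'GR']
```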

List and Dictionary Comprehensions

Comprehensions are a concise way to create lists and dictionaries from existing data. They replace multi-line loops with a single readable expression.

main.py

# List comprehension — calculate water cut for each month
oil = [3150, 2780, 2460, 2190, 1965, 1780, 1625, 1500, 1395, 1305, 1225, 1155]
water = [420, 480, 560, 650, 740, 830, 920, 1010, 1095, 1175, 1250, 1320]

water_cut_pct = [w / (o + w) * 100 for o, w in zip(oil, water)]
print("Water cut by month:")
for i, wc in enumerate(water_cut_pct, 1):
    print(f"  Month {i:>2}: {wc:5.1f}%")

# Dictionary comprehension — build a lookup of well depths
well_depths = {
    name: data["tvd_ft"]
    for name, data in field.items()
    if data["status"] == "Producing"
}
print(f"\nProducing well depths: {well_depths}")

# Filtered list — wells with water cut above 40%
high_watercut_wells = [
    name for name, data in field.items()
    if data["water_cut_pct"] > 40
]
print(f"Wells with WC > 40%: {high_watercut_wells}")

# Conditional transformation — convert rates to SI units only for active wells
rates_m3pd = {
    name: round(data["oil_rate_bopd"] * 0.158987, 1)
    for name, data in field.items()
    if data["oil_rate_bopd"] > 0
}
print(f"Rates in m³/d: {rates_m3pd}")

Comprehensions are not just shorter — they are more readable when the transformation is simple. If the logic inside the comprehension becomes complex (multiple conditions, nested calculations), use a regular loop with a function call instead. Readability always wins.
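
One way to follow that advice is to move the decision logic into a named function and let the comprehension simply apply it. In the sketch below, `water_cut_band` and its 20% and 50% thresholds are illustrative assumptions, not industry standards. The water cuts are the producing wells' values from the field dictionary earlier in the chapter.

```python
def water_cut_band(wc_pct):
    """Hypothetical surveillance bands; the 20% and 50% thresholds are illustrative."""
    if wc_pct < 20:
        return "low"
    if wc_pct < 50:
        return "moderate"
    return "high"


field_water_cuts = {"OD-001": 38, "OD-003": 56, "OD-005": 22, "OD-007": 8}

# The comprehension stays one readable line; the logic lives in the function
bands = {name: water_cut_band(wc) for name, wc in field_water_cuts.items()}
print(bands)
```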

Summary

This chapter covered the data structures that underpin all petroleum data processing:

Lists store ordered sequences — production histories, depth arrays, casing programs. They are mutable and indexed from zero.
Tuples store fixed data — well coordinates, casing specifications, reference constants. Their immutability is a safety feature, not a limitation.
Dictionaries store key-value pairs — well headers, completion records, field databases. Nested dictionaries model the hierarchical structure of petroleum data naturally.
Sets store unique values — well inventories, formation lists, data reconciliation. Set operations (intersection, difference, union) answer data quality questions instantly.
CSV files hold tabular production and engineering data. Python's csv module reads and writes them with minimal code.
JSON files hold structured metadata and configuration. They map directly to Python dictionaries.
LAS files hold well log data — the petroleum industry's standard since 1989. The columnar format contains depth-indexed measurements from wireline and LWD tools.
Comprehensions provide concise, readable data transformations for lists and dictionaries.

In the next chapter, we move from Python's built-in structures to the libraries that make petroleum data science practical: NumPy for numerical computation and Pandas for tabular data analysis at scale.

Exercises


Exercise 3.1 (Practice)

Casing String Inventory

A well's casing program consists of the following strings:

Casing       OD (in)   Weight (lb/ft)   Grade   Setting Depth (ft)
Conductor    36        192              X-52    500
Surface      20        94               K-55    2,800
...
