Part I: Python Fundamentals
Chapter 3
Data Structures for Petroleum Data
Petroleum engineering generates enormous volumes of data across every phase of a well's life. Drilling reports, well logs, production records, fluid analyses, pressure tests, seismic surveys — each arrives in a different format, at a different frequency, with a different structure. An engineer who cannot organize, access, and manipulate this data efficiently is an engineer who spends most of their time fighting spreadsheets instead of solving problems.
Python provides four core data structures — lists, tuples, dictionaries, and sets — each suited to a different kind of petroleum data. This chapter teaches all four through real engineering use cases, then moves to the file formats you will encounter in every petroleum engineering workflow: CSV, JSON, and LAS.
What You Will Learn
- Lists — ordered, changeable collections for production histories, depth arrays, and time series
- Tuples — fixed collections for coordinates, well locations, and parameter sets that must not change
- Dictionaries — key-value lookups for well headers, fluid properties, and configuration data
- Sets — unique collections for well inventories, field comparisons, and deduplication
- File I/O — reading and writing CSV, JSON, and LAS files, the three formats that dominate petroleum data exchange
- List and dictionary comprehensions — concise, readable data transformations
Lists — Production Histories and Depth Arrays
A list is an ordered, changeable sequence of values. In petroleum engineering, lists naturally represent anything that comes in a sequence: monthly production rates, depth intervals, casing diameters, or a series of pressure readings.
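As a minimal sketch of the idea, the lists below hold a short production history and a casing program. All values and variable names are made up for illustration.

```python
# Hypothetical monthly oil rates for one well, in barrels per day (bbl/d)
oil_rate_bopd = [1250, 1180, 1105, 1040, 985, 930]

# Hypothetical casing outer diameters in inches, from surface downward
casing_od_in = [20.0, 13.375, 9.625, 7.0]

# Built-in functions work directly on lists
n_months = len(oil_rate_bopd)
average_rate = sum(oil_rate_bopd) / n_months
peak_rate = max(oil_rate_bopd)

print(f"{n_months} months, average {average_rate:.1f} bbl/d, peak {peak_rate} bbl/d")
print(f"{len(casing_od_in)} casing strings, smallest OD {min(casing_od_in)} in")
```

Naming the unit in the variable (`_bopd`, `_in`) is a design choice rather than a Python requirement, but it saves a lot of confusion once several lists are in play.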
Indexing and Slicing
Lists are zero-indexed: the first element is at position 0. This is the convention in Python and many other programming languages (though not all of them), and while it takes some adjustment if you are coming from Excel (where row 1 is the first row), it becomes natural quickly.
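A short sketch of indexing and slicing, using a hypothetical list of measured depths. Note that a slice `a[start:stop]` includes `start` but excludes `stop`.

```python
# Hypothetical measured depths in feet
depths_ft = [5000, 5500, 6000, 6500, 7000, 7500]

first_depth = depths_ft[0]       # position 0 is the first element
last_depth = depths_ft[-1]       # negative indices count from the end
top_three = depths_ft[:3]        # positions 0, 1, 2
middle = depths_ft[2:4]          # positions 2 and 3 (4 is excluded)
every_other = depths_ft[::2]     # every second element

print(first_depth, last_depth)
print(top_three, middle, every_other)
```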
Modifying Lists — Adding and Updating Data
Production data grows over time. New months arrive, wells come online, records get corrected. Lists are mutable — you can add, remove, and change elements.
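The common list operations for growing and correcting data can be sketched as follows. The rates are illustrative, not taken from a real well.

```python
# Hypothetical first three months of oil rate, bbl/d
oil_rate_bopd = [1250, 1180, 1105]

oil_rate_bopd.append(1040)           # add one new month at the end
oil_rate_bopd.extend([985, 930])     # add several months at once
oil_rate_bopd[1] = 1175              # correct the second month's record
removed = oil_rate_bopd.pop()        # remove and return the last entry
oil_rate_bopd.insert(0, 1300)        # insert an earlier month at the front

print(oil_rate_bopd)
print("Removed value:", removed)
```

Each of these methods changes the list in place, so any other variable that refers to the same list sees the change as well.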
Water Cut Trend — Lists + Visualization
Water cut, the fraction of produced liquid that is water, is one of the most important surveillance metrics in production engineering. A rising water cut usually means that water, whether from an aquifer or from injection, is encroaching on the oil zone. Here is how to compute and visualize it from two lists.
This is a standard production surveillance plot — the kind that appears in every monthly production report. The crossing of the oil and water curves is a significant event that reservoir engineers watch closely. In this well, it happens around month 10.
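The calculation behind such a plot can be sketched in plain Python, leaving the plotting aside. The rates below are made-up values, chosen so that water first exceeds oil in month 10 to match the discussion above; the variable names are assumptions as well.

```python
# Hypothetical monthly rates for one well over 12 months (bbl/d)
oil_bopd = [1200, 1150, 1100, 1040, 980, 920, 860, 800, 740, 680, 620, 570]
water_bwpd = [50, 90, 140, 200, 270, 340, 420, 500, 590, 690, 780, 860]

# Water cut = water / (oil + water), computed month by month
water_cut = []
for oil, water in zip(oil_bopd, water_bwpd):
    total_liquid = oil + water
    # Guard against a shut-in month with no liquid production
    water_cut.append(water / total_liquid if total_liquid > 0 else 0.0)

# First month (1-based) in which water production exceeds oil production
crossover_month = None
for month, (oil, water) in enumerate(zip(oil_bopd, water_bwpd), start=1):
    if water > oil:
        crossover_month = month
        break

for month, wc in enumerate(water_cut, start=1):
    print(f"Month {month:2d}: water cut = {wc:.1%}")
print("Water first exceeds oil in month", crossover_month)
```

Water exceeding oil is the same condition as water cut passing 50%, so either test identifies the crossover.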
Tuples — Fixed Data That Must Not Change
A tuple is like a list, but it cannot be modified after creation. That sounds like a limitation, but it is actually a safety feature. Some data should never change: well coordinates, datum elevations, fundamental constants, and calibration values.
If you need a collection that cannot be accidentally modified during processing — coordinates, physical constants, calibration data — use a tuple. If you need a collection that grows, shrinks, or gets updated — production data, well lists, calculation results — use a list.
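A brief sketch of that rule in practice. The coordinates and casing specification are hypothetical; the point is what happens when code tries to change a tuple.

```python
# (latitude, longitude, ground elevation in ft); hypothetical values
wellhead_location = (29.7604, -95.3698, 45.0)

# Tuples can be unpacked into named variables in one step
latitude, longitude, elevation_ft = wellhead_location

# Hypothetical casing specification: (OD in inches, weight in lb/ft, grade)
surface_casing = (20.0, 94.0, "K-55")

# Any attempt to modify a tuple raises TypeError instead of silently succeeding
error_message = None
try:
    wellhead_location[2] = 50.0
except TypeError as exc:
    error_message = str(exc)
    print("Rejected:", error_message)

print("Elevation is still", wellhead_location[2], "ft")
```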
Dictionaries — Well Headers and Property Lookups
A dictionary stores key-value pairs. In petroleum engineering, this maps directly to the concept of a well header — a structured record where each field has a name and a value.
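A well header along these lines might look like the sketch below. The field names and values are illustrative assumptions, not a standard schema.

```python
# Hypothetical well header; keys and values are illustrative only
well_header = {
    "well_name": "OD-001",
    "field": "Example Field",
    "operator": "Example Operator",
    "total_depth_ft": 9850,
    "status": "Producing",
}

total_depth = well_header["total_depth_ft"]           # look up by key
spud_date = well_header.get("spud_date", "unknown")   # default if key is missing

well_header["status"] = "Shut-in"    # update an existing field
well_header["tvd_ft"] = 9420         # add a new field

for key, value in well_header.items():
    print(f"{key:>16}: {value}")
```

Using `.get()` with a default avoids a `KeyError` when a header is incomplete, which is common with records assembled from several sources.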
Nested Dictionaries — A Field Database
A real field contains multiple wells, each with its own header. A dictionary of dictionaries is a natural way to represent this.
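As a sketch, a small field with three hypothetical wells can be stored with the well name as the outer key and each header as an inner dictionary. The keys used here are assumptions for illustration.

```python
# Hypothetical field database: well name -> well header
field = {
    "OD-001": {"type": "producer", "total_depth_ft": 9850, "oil_rate_bopd": 1250},
    "OD-002": {"type": "producer", "total_depth_ft": 10120, "oil_rate_bopd": 870},
    "OD-003": {"type": "injector", "total_depth_ft": 9640, "oil_rate_bopd": 0},
}

# Chain two lookups: first the well, then the property
od2_depth = field["OD-002"]["total_depth_ft"]

# Loop over every well to aggregate and filter
total_oil = 0
producers = []
for well_name, header in field.items():
    total_oil += header["oil_rate_bopd"]
    if header["type"] == "producer":
        producers.append(well_name)

print("OD-002 total depth:", od2_depth, "ft")
print("Field oil rate:", total_oil, "bbl/d")
print("Producers:", producers)
```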
Sets — Unique Identifiers and Field Comparisons
A set is an unordered collection of unique values. In petroleum data, sets are useful for answering questions like: which wells are in both datasets? Which wells are missing from the production database? Which formations appear across multiple fields?
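A minimal sketch of such a reconciliation is shown below. The two well lists are hypothetical, with IDs chosen to be consistent with the discrepancies discussed next.

```python
# Hypothetical well lists from two departments
drilled = {"OD-001", "OD-002", "OD-003", "OD-004", "OD-005", "OD-006", "OD-007"}
producing = {"OD-001", "OD-002", "OD-003", "OD-005", "OD-007", "OD-008"}

in_both = drilled & producing          # intersection: wells in both datasets
not_producing = drilled - producing    # difference: drilled but no production record
no_drill_record = producing - drilled  # difference: producing but no drilling record
all_wells = drilled | producing        # union: every well seen anywhere

print("In both:        ", sorted(in_both))
print("Not producing:  ", sorted(not_producing))
print("No drill record:", sorted(no_drill_record))
print("Total wells:    ", len(all_wells))

# Converting a list to a set also removes duplicate entries
unique_wells = sorted(set(["OD-001", "OD-002", "OD-001"]))
print("Deduplicated:   ", unique_wells)
```

Sets have no defined order, so the results are passed through `sorted()` before printing.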
OD-004 and OD-006 were drilled but are not in the production database — they might be abandoned, waiting on completion, or simply missing from the records. OD-008 is producing but has no drilling record — a data quality issue that needs investigation. These are exactly the kinds of discrepancies that a petroleum data engineer encounters daily, and sets make them trivial to find.
File I/O — The Three Formats of Petroleum Data
CSV — Production Data
Comma-separated values (CSV) files are the most common way production data is exported from SCADA systems and databases and published by government regulatory agencies.
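The sketch below writes a small production table with the standard-library `csv` module and reads it back. The file name, column names, and values are assumptions for illustration; a temporary directory keeps the example self-contained.

```python
import csv
import os
import tempfile

fieldnames = ["well", "month", "oil_bopd", "water_bwpd"]
# Hypothetical production records
rows = [
    {"well": "OD-001", "month": "2024-01", "oil_bopd": 1250, "water_bwpd": 50},
    {"well": "OD-001", "month": "2024-02", "oil_bopd": 1180, "water_bwpd": 90},
    {"well": "OD-002", "month": "2024-01", "oil_bopd": 870, "water_bwpd": 310},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "production_example.csv")

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # DictReader uses the header row as the keys of each record
    with open(path, newline="") as f:
        records = list(csv.DictReader(f))

# Every value read from a CSV file is a string, so convert before doing math
total_oil = 0.0
for record in records:
    total_oil += float(record["oil_bopd"])

print(records[0])
print("Total of listed oil rates:", total_oil)
```

The string-to-number conversion is the step most often forgotten: adding `"1250"` and `"1180"` without it concatenates text instead of summing rates.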
JSON — Well Configuration and Metadata
JSON (JavaScript Object Notation) is used for structured configuration data, API responses, and metadata. It maps directly to Python dictionaries.
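To illustrate the mapping, the sketch below converts a hypothetical well configuration to JSON text and back with the standard-library `json` module. The keys and values are illustrative.

```python
import json

# Hypothetical well configuration
well_config = {
    "well_name": "OD-001",
    "active": True,
    "artificial_lift": None,
    "perforations": [
        {"top_ft": 9200, "bottom_ft": 9240},
        {"top_ft": 9310, "bottom_ft": 9335},
    ],
}

# Python dict -> JSON text (True becomes true, None becomes null)
json_text = json.dumps(well_config, indent=2)
print(json_text)

# JSON text -> Python dict
restored = json.loads(json_text)
print("Round trip preserved the data:", restored == well_config)
print("First perforation top:", restored["perforations"][0]["top_ft"], "ft")
```

To work with files rather than strings, `json.dump(obj, file)` and `json.load(file)` do the same job on an open file object. One caveat: JSON has no tuple type, so tuples are written as arrays and come back as lists.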
LAS — Well Log Data
LAS (Log ASCII Standard) is the petroleum industry's standard format for well log data. Most conventional well logs (gamma ray, resistivity, density, neutron, sonic) are stored or exchanged as LAS files. The code below creates one from scratch so you can see its structure: header sections (~VERSION, ~WELL, ~CURVE) followed by columnar ASCII data.
This is a triple-combo log — the most fundamental well log display in petroleum engineering. Every petrophysicist reads this display daily. The three tracks together tell you whether a rock interval contains hydrocarbons, how porous it is, and how thick the pay zone is. We will compute quantitative answers from these curves in Chapter 7 (Well Log Analysis).
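To make the LAS layout concrete, here is a minimal sketch that builds a tiny LAS 2.0-style text in memory and parses it with plain Python. The well name, curve values, and the simplified parser are assumptions for illustration; the parser handles only unwrapped files with the basic sections. For real files, a dedicated third-party reader such as lasio is a better choice.

```python
# A tiny, hypothetical LAS 2.0-style file held in a string
las_text = """~VERSION INFORMATION
VERS.          2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.           NO : ONE LINE PER DEPTH STEP
~WELL INFORMATION
STRT.FT     5000.0 : START DEPTH
STOP.FT     5001.5 : STOP DEPTH
STEP.FT        0.5 : STEP
NULL.      -999.25 : NULL VALUE
WELL.       OD-001 : WELL
~CURVE INFORMATION
DEPT.FT            : DEPTH
GR  .GAPI          : GAMMA RAY
RHOB.G/C3          : BULK DENSITY
~A  DEPT      GR     RHOB
5000.0    45.2     2.45
5000.5    52.1     2.48
5001.0 -999.25     2.51
5001.5    88.7     2.55
"""

well_info = {}     # mnemonic -> value from the ~WELL section
curve_names = []   # curve mnemonics in column order from the ~CURVE section
data_rows = []     # numeric rows from the ~A (ASCII data) section
section = None

for raw_line in las_text.splitlines():
    line = raw_line.strip()
    if not line or line.startswith("#"):
        continue                       # skip blank lines and comments
    if line.startswith("~"):
        section = line[1].upper()      # first letter identifies the section
        continue
    if section in ("W", "C"):
        # Header line layout: MNEM.UNITS  DATA : DESCRIPTION
        mnemonic, rest = line.split(".", 1)
        units_and_data, _description = rest.rsplit(":", 1)
        _units, _, data = units_and_data.partition(" ")
        if section == "W":
            well_info[mnemonic.strip()] = data.strip()
        else:
            curve_names.append(mnemonic.strip())
    elif section == "A":
        data_rows.append([float(value) for value in line.split()])

# Build one list per curve, replacing the NULL marker with None
null_value = float(well_info["NULL"])
curves = {}
for name in curve_names:
    curves[name] = []
for row in data_rows:
    for name, value in zip(curve_names, row):
        curves[name].append(None if value == null_value else value)

print("Well:", well_info["WELL"])
print("Curves:", curve_names)
print("GR:", curves["GR"])
```

The result is a dictionary of lists keyed by curve mnemonic, which ties the file format back to the data structures from earlier in the chapter.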
List and Dictionary Comprehensions
Comprehensions are a concise way to create lists and dictionaries from existing data. They replace multi-line loops with a single readable expression.
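The sketch below shows a loop next to its comprehension equivalent, followed by a filtering comprehension and a dictionary comprehension. The rates and depths are hypothetical; the conversion factors are the usual barrel-to-cubic-metre and foot-to-metre values.

```python
BBL_TO_M3 = 0.158987   # cubic metres per barrel
FT_TO_M = 0.3048       # metres per foot

# Hypothetical monthly oil rates, bbl/d
oil_bopd = [1250, 1180, 1105, 1040, 985, 930]

# Loop version: convert each rate to m3/d
oil_m3d_loop = []
for rate in oil_bopd:
    oil_m3d_loop.append(rate * BBL_TO_M3)

# List comprehension: the same transformation in one expression
oil_m3d = [rate * BBL_TO_M3 for rate in oil_bopd]

# Comprehension with a condition: months (1-based) producing above 1000 bbl/d
high_rate_months = [
    month for month, rate in enumerate(oil_bopd, start=1) if rate > 1000
]

# Dictionary comprehension: convert total depths from feet to metres
depths_ft = {"OD-001": 9850, "OD-002": 10120, "OD-003": 9640}
depths_m = {well: round(depth * FT_TO_M, 1) for well, depth in depths_ft.items()}

print(oil_m3d == oil_m3d_loop)
print(high_rate_months)
print(depths_m)
```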
Comprehensions are not just shorter — they are more readable when the transformation is simple. If the logic inside the comprehension becomes complex (multiple conditions, nested calculations), use a regular loop with a function call instead. Readability always wins.
Summary
This chapter covered the data structures that underpin all petroleum data processing:
- Lists store ordered sequences — production histories, depth arrays, casing programs. They are mutable and indexed from zero.
- Tuples store fixed data — well coordinates, casing specifications, reference constants. Their immutability is a safety feature, not a limitation.
- Dictionaries store key-value pairs — well headers, completion records, field databases. Nested dictionaries model the hierarchical structure of petroleum data naturally.
- Sets store unique values — well inventories, formation lists, data reconciliation. Set operations (intersection, difference, union) answer data quality questions instantly.
- CSV files hold tabular production and engineering data. Python's csv module reads and writes them with minimal code.
- JSON files hold structured metadata and configuration. They map directly to Python dictionaries.
- LAS files hold well log data — the petroleum industry's standard since 1989. The columnar format contains depth-indexed measurements from wireline and LWD tools.
- Comprehensions provide concise, readable data transformations for lists and dictionaries.
In the next chapter, we move from Python's built-in structures to the libraries that make petroleum data science practical: NumPy for numerical computation and Pandas for tabular data analysis at scale.
Exercises
Casing String Inventory
A well's casing program consists of the following strings:

| Casing | OD (in) | Weight (lb/ft) | Grade | Setting Depth (ft) |
| --- | --- | --- | --- | --- |
| Conductor | 36 | 192 | X-52 | 500 |
| Surface | 20 | 94 | K-55 | 2,800 |

...
Well Header Builder
Write a function build_well_header() that accepts keyword arguments for well name, field, operator, total depth, TVD, well type, spud date, and status...
Production Data Reconciliation
You receive two CSV files from different departments. One contains drilling data for wells: OD-001 through OD-010. The other contains production data ...
JSON Well Completion Report
Create a JSON file representing a multi-zone completion with at least three perforation intervals. Each interval should include top depth, bottom dept...
LAS File Explorer
Write a function las_summary(filepath) that reads a LAS file and prints: (a) the well name and field from the header, (b) the depth range and step siz...
Water Cut Alarm System
Using the production data from this chapter, write a program that monitors water cut trends. For each well, calculate the water cut for each month and...
Drilling Fluid Inventory
A drilling operation maintains an inventory of different mud types and additives stored as a dictionary of dictionaries: ``python inventory = { Barite...
Multi-Well Production Comparison
Create a dictionary containing 12 months of production data for four wells. For each well, store monthly oil rate, water rate, and gas rate. Write a p...
Formation Tops Database
Geologists maintain a database of formation tops — the depth at which each geological formation is encountered in each well. Create a dictionary struc...
Data Format Converter
Write a program that reads the field_production.csv file created in this chapter and converts it to: (a) a JSON file grouped by well (each well has an...