PYTHON AUTOMATION · DATA CONSOLIDATION

Multi-Source Data Consolidation for Debris Programs

How Watershed GeoData builds automated data aggregation pipelines that replace manual compilation with scripted runs, pulling from dozens of REST endpoints across multiple jurisdictions into a single unified feature service.

SERVICE TYPE
System Design & Development

DOMAIN
Debris Operations Data Management

PLATFORM
Python · ArcGIS API · REST

DEPLOYMENT
Embedded with program operations


The Problem

Dozens of data silos, zero operational visibility

On large-scale hurricane debris programs, monitoring firms typically maintain separate data layers for each project delineation across their ArcGIS portals. Debris removal progress data ends up spread across dozens of different feature services, often with inconsistent field naming conventions between jurisdictions.

For program leadership to get a complete operational picture of cleanup progress across all affected jurisdictions, data has to be manually downloaded, reformatted, and combined. That process can consume an entire workday and is highly error-prone. Resource allocation decisions end up being made on stale data.

Our Approach

One script, one command, one truth

We build Python scripts that authenticate via OAuth using the active ArcGIS Pro session, query all source endpoints, download current data, translate inconsistent field schemas into a standardized format, and perform jurisdiction-level truncate-and-reload operations against a master feature service. The output can include both point-based work sites and linear features such as waterway reaches.
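The truncate-and-reload step can be sketched as below. This is a minimal illustration, not the production code: the layer object is assumed to expose the ArcGIS API for Python's delete_features and edit_features methods, and the JURISDICTION field name is a placeholder.

```python
def reload_jurisdiction(master_layer, jurisdiction, features):
    """Replace one jurisdiction's records in the master feature layer.

    Deletes only the rows matching the given jurisdiction, then inserts
    the freshly standardized features, so other jurisdictions' records
    are untouched and no duplicates accumulate.
    """
    # Scope the delete to the current jurisdiction (field name is illustrative).
    where = f"JURISDICTION = '{jurisdiction}'"
    master_layer.delete_features(where=where)
    # Insert the new records for that jurisdiction.
    return master_layer.edit_features(adds=features)
```

Because the delete is scoped with a WHERE clause rather than truncating the whole layer, a reload for one jurisdiction never disturbs the others.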

The scripts include built-in rate limiting to avoid overwhelming external servers, comprehensive progress logging for operator visibility, and error handling that allows partial failures without corrupting the master dataset. Configuration-driven design means adding a new jurisdiction or source requires only a dictionary entry, not code changes.
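The configuration-driven design might look like the following. Every URL, jurisdiction name, and field name here is an illustrative placeholder, not a real endpoint or schema.

```python
# One entry per jurisdiction: a source endpoint plus a mapping from the
# source's field names to the standardized master schema.
# All URLs and field names are illustrative placeholders.
SOURCES = {
    "county_a": {
        "url": "https://services.example.com/arcgis/rest/services/CountyA/FeatureServer/0",
        "field_map": {
            "SiteID": "SITE_ID",        # source field -> master field
            "DebrisCY": "DEBRIS_CY",
            "Status": "WORK_STATUS",
        },
    },
    "county_b": {
        "url": "https://services.example.com/arcgis/rest/services/CountyB/FeatureServer/0",
        "field_map": {
            "site_num": "SITE_ID",      # different naming convention, same target schema
            "cubic_yards": "DEBRIS_CY",
            "status_code": "WORK_STATUS",
        },
    },
}
```

Onboarding a new jurisdiction means adding one more entry with its endpoint URL and field mapping; the pipeline code itself never changes.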


How It Works

Six-step automated pipeline

1. Authenticate via ArcGIS Pro Portal
Uses the active ArcGIS Pro session's credentials via OAuth. No credentials are hardcoded in the script.

2. Iterate through source configurations
Reads a structured dictionary of jurisdictions, each with source endpoint URLs and field mapping definitions.

3. Download data from external endpoints
Queries each source feature service for current records. Handles pagination and automatic rate-limit backoff for server protection.

4. Standardize field schemas
Applies source-specific field mappings to translate field names to the standardized master schema.

5. Truncate and reload by jurisdiction
Deletes the existing records for the current jurisdiction, then inserts the freshly standardized data. This prevents duplicates while preserving other jurisdictions' records.

6. Repeat for all sources
Steps 2 through 5 repeat for each jurisdiction. On completion, the master feature service contains a unified, current picture of all operations.
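The loop over steps 2 through 5 can be sketched as follows. The fetch_features and reload_features callables stand in for the download and truncate-and-reload operations described above; their names, and the shape of the per-record field remapping, are assumptions for illustration.

```python
import logging

def remap_fields(record, field_map):
    """Translate one record's source field names to the master schema,
    dropping any fields that have no mapping."""
    return {master: record[src] for src, master in field_map.items() if src in record}

def run_pipeline(sources, fetch_features, reload_features):
    """Consolidate all configured sources into the master service.

    `sources` follows the {name: {"url": ..., "field_map": ...}} shape;
    `fetch_features(url)` and `reload_features(name, records)` stand in
    for the download and truncate-and-reload steps. A failure in one
    jurisdiction is logged and skipped so the remaining sources still load.
    """
    results = {}
    for name, cfg in sources.items():
        try:
            raw = fetch_features(cfg["url"])
            standardized = [remap_fields(r, cfg["field_map"]) for r in raw]
            reload_features(name, standardized)
            results[name] = len(standardized)
        except Exception:
            logging.exception("Source %s failed; continuing with remaining sources", name)
            results[name] = None  # mark the failure; other jurisdictions stay intact
    return results
```

The try/except sits inside the loop, which is what gives the per-jurisdiction failure isolation: one unreachable endpoint produces a logged gap, not an aborted run.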


Technical Architecture

Language: Python, executed via ArcGIS Pro’s built-in Python environment. No external dependencies beyond the ArcGIS API for Python.

Authentication: OAuth 2.0 via active ArcGIS Pro portal session. Token refresh handled automatically.

Data Sources: REST endpoints hosted across external ArcGIS portals, each representing a specific jurisdiction, basin, or feature type view.
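Paging through one of these endpoints can be sketched as below. The query_page callable stands in for a REST query issued with the ArcGIS REST API's resultOffset and resultRecordCount parameters; its name and contract are assumptions for illustration.

```python
def fetch_all(query_page, page_size=1000):
    """Download every record from a paged endpoint.

    `query_page(offset, count)` is assumed to return a list of records
    that is shorter than `count` (possibly empty) on the final page --
    the same contract as an ArcGIS REST query paged with
    resultOffset / resultRecordCount.
    """
    records, offset = [], 0
    while True:
        page = query_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:   # a short page means we've reached the end
            return records
        offset += page_size
```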

Output: Multi-layer feature service supporting both point geometry (work sites) and polyline geometry (linear features like waterway reaches under active operations).

Error Handling: Per-jurisdiction isolation means a failure in one source does not corrupt data from others. Rate limiting with automatic backoff prevents server overload.
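The rate-limit backoff can be sketched as a retry wrapper with exponentially growing delays. The retry count and base delay below are illustrative defaults, not the production values.

```python
import time

def with_backoff(request, max_retries=4, base_delay=1.0):
    """Call `request()` with exponential backoff between attempts.

    Retries on any exception up to `max_retries` attempts, sleeping
    base_delay * 2**attempt seconds between tries. The final failure is
    re-raised so the caller's per-jurisdiction handler can log and skip it.
    """
    for attempt in range(max_retries):
        try:
            return request()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Wrapping each endpoint query this way keeps transient throttling from failing a whole jurisdiction, while the doubling delay keeps the script from hammering an already overloaded server.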

Extensibility: Configuration-driven design. Adding a new jurisdiction or source requires only adding an entry to the source dictionary. No code changes needed.

Python · ArcGIS API for Python · OAuth 2.0 · REST API · Field Schema Mapping · Truncate-and-Reload