Major Asset Manager Exits Informatica PowerCenter 10.x, Migrates 1,800 Mappings to Databricks in 9 Months

MigryX Case Study • April 2026 • Financial Services & Asset Management

Executive Summary

A major financial services and asset management firm with hundreds of billions of dollars in assets under management faced a forcing function: Informatica's announcement of PowerCenter 10.x end-of-standard-support, combined with a multi-year cloud transformation program centered on AWS and Databricks, created an urgent need to migrate its 1,800 PowerCenter mappings and associated workflows. The firm's data pipelines underpin fund NAV calculation, portfolio risk analytics, regulatory capital reporting (Basel III, CCAR), and investor reporting for institutional and retail clients across multiple countries. Over 9 months, MigryX parsed all PowerCenter XML exports, converted every mapping to production PySpark, reconstructed workflow execution logic in Databricks Workflows, and delivered a fully governed estate in Databricks Unity Catalog. The program produced 900,000 lines of PySpark, performance improvements of 4–9X on critical end-of-day NAV and position reconciliation pipelines, and a projected $4.3 million in two-year savings from eliminated Informatica licensing and infrastructure.

Client Overview

The client is a diversified asset management firm with operating entities across North America, Europe, and Asia-Pacific, offering equity, fixed income, multi-asset, and alternative investment strategies to institutional clients including sovereign wealth funds, pension plans, endowments, and foundations; it also operates a retail mutual fund and ETF platform. Its data engineering function is responsible for delivering clean, reconciled, and auditable data to fund accounting, risk management, portfolio management, regulatory reporting, and investor relations systems — all subject to strict SLA requirements with zero tolerance for late or incorrect delivery.

The PowerCenter estate had been the firm's primary ETL platform since 2009, consolidated from three legacy ETL platforms following a major acquisition. It had accumulated 1,800 mappings organized across 12 PowerCenter folders representing distinct business domains: trade data, position management, corporate actions, pricing, fund accounting, client reporting, reference data, regulatory capital, compliance monitoring, tax reporting, operations, and risk analytics. The platform ran on a dedicated Linux infrastructure stack with an Oracle-based PowerCenter Repository Service, scheduled and monitored through PowerCenter Workflow Manager.

Business Challenge

The end-of-support timeline created a hard deadline for the migration program, but the technical complexity of the estate had initially led internal estimates to project a 24–36 month timeline — far too long given the support window. Key challenges included the sheer scale of the estate (1,800 mappings across 12 business domains), heavy reliance on the PowerCenter SCD Wizard for Type 2 dimension history, session-level configurations that had to be reproduced faithfully to satisfy production capacity planning, and zero-tolerance delivery SLAs on regulated pipelines such as NAV calculation and regulatory capital reporting.

The MigryX Approach

MigryX structured the engagement around PowerCenter's native XML export format, which encodes the full repository object graph — sources, targets, transformations, mappings, mapplets, sessions, workflows, and worklets — in a single exportable artifact. The MigryX PowerCenter parser processes these exports to reconstruct the complete logical model of each mapping and its associated session and workflow execution context, enabling conversion fidelity that extends beyond the transformation logic to encompass the full runtime configuration.
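To make the parsing step concrete, the sketch below (illustrative only, not the proprietary MigryX parser) walks a standard POWERMART repository export with Python's standard library and collects each mapping's transformations and its port-level CONNECTOR lineage. The element names follow PowerCenter's export schema; the function name and return shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def parse_export(path: str) -> dict:
    """Return {folder: {mapping: {"transformations": [...], "edges": [...]}}}."""
    root = ET.parse(path).getroot()  # <POWERMART> root element
    estate = {}
    for folder in root.iter("FOLDER"):
        mappings = {}
        for mapping in folder.iter("MAPPING"):
            # Each transformation instance carries its name and type
            # (Expression, Aggregator, Lookup, and so on).
            transformations = [
                (t.get("NAME"), t.get("TYPE"))
                for t in mapping.iter("TRANSFORMATION")
            ]
            # CONNECTOR elements encode port-to-port data flow between
            # transformation instances: the mapping's logical DAG.
            edges = [
                (c.get("FROMINSTANCE"), c.get("FROMFIELD"),
                 c.get("TOINSTANCE"), c.get("TOFIELD"))
                for c in mapping.iter("CONNECTOR")
            ]
            mappings[mapping.get("NAME")] = {
                "transformations": transformations,
                "edges": edges,
            }
        estate[folder.get("NAME")] = mappings
    return estate
```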

The SCD Type 2 challenge was addressed through MigryX's semantic SCD library, which recognizes the PowerCenter SCD Wizard pattern and its common variants and converts them to Delta Lake merge operations using the MERGE INTO syntax with explicit effective date management. Delta Lake's native support for ACID transactions made it an ideal target for SCD Type 2 logic, as the merge operations execute atomically — eliminating the risk of partial updates that had occasionally caused reconciliation issues in the PowerCenter environment during network interruptions. The resulting Delta Lake tables also provided time travel capability, enabling point-in-time queries against dimension history that were previously only possible through complex historical snapshots maintained as separate tables.
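As a minimal sketch of that merge pattern, assuming a hypothetical dim_security dimension table, a staged-changes view stg_security, and illustrative columns (attr_hash, rating, sector) rather than the client's actual schema, the generated logic looks roughly like this:

```python
# spark is the ambient SparkSession on Databricks. Changed rows appear twice
# in the merge source: once keyed (to expire the current version) and once
# with a NULL merge key (to fall through to the INSERT branch as the new
# version). Unchanged rows match but satisfy no clause, so nothing happens.
spark.sql("""
    MERGE INTO dim_security AS tgt
    USING (
        SELECT s.security_id AS merge_key, s.* FROM stg_security s
        UNION ALL
        SELECT NULL AS merge_key, s.*
        FROM stg_security s
        JOIN dim_security t
          ON s.security_id = t.security_id
         AND t.is_current = true
         AND s.attr_hash <> t.attr_hash
    ) AS src
    ON tgt.security_id = src.merge_key AND tgt.is_current = true
    WHEN MATCHED AND tgt.attr_hash <> src.attr_hash THEN UPDATE SET
        is_current = false,
        expiry_date = current_date()
    WHEN NOT MATCHED THEN INSERT
        (security_id, rating, sector, attr_hash,
         effective_date, expiry_date, is_current)
        VALUES (src.security_id, src.rating, src.sector, src.attr_hash,
                current_date(), DATE '9999-12-31', true)
""")
```

Because each merge is a single atomic Delta commit, a point-in-time read such as SELECT * FROM dim_security TIMESTAMP AS OF '2026-01-31' replaces the separate historical snapshot tables.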

Session-level configurations were parsed from the workflow XML and mapped to Databricks job cluster configurations and Databricks Workflows task-level settings. Partition counts informed Spark executor configurations; commit intervals were replaced with Delta Lake checkpoint configurations; connection pool settings were mapped to JDBC connection properties in Databricks Secrets-backed connection objects. This session-configuration fidelity ensured that the migrated jobs exhibited I/O behavior and resource utilization profiles equivalent to their PowerCenter predecessors — a critical requirement for production capacity planning validation.
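The pattern for a translated connection, sketched below with a hypothetical secret scope, key names, URL, and source query, resolves credentials from Databricks Secrets at runtime and carries the session's partitioning and buffering settings over as standard Spark JDBC options:

```python
# dbutils and spark are provided by the Databricks runtime.
# Scope and key names here are hypothetical placeholders.
user = dbutils.secrets.get(scope="etl-connections", key="positions-user")
password = dbutils.secrets.get(scope="etl-connections", key="positions-pass")

positions = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//ora-host:1521/POSDB")  # illustrative DSN
    # Preserved source-qualifier SQL override, pushed down to the database:
    .option("dbtable", "(SELECT position_id, fund_id, qty FROM positions) src")
    .option("user", user)
    .option("password", password)
    # Session partition count translated to parallel JDBC partitions:
    .option("numPartitions", 8)
    .option("partitionColumn", "position_id")
    .option("lowerBound", 1)
    .option("upperBound", 10000000)
    # Fetch size plays the role of the session's buffer/commit sizing:
    .option("fetchsize", 10000)
    .load()
)
```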

Reusable transformations and mapplets were converted to Python function libraries and PySpark transformation modules that could be imported across multiple pipeline scripts, preserving the reuse architecture of the PowerCenter design. This produced a maintainable, DRY (Don't Repeat Yourself) codebase where changes to shared logic could be applied centrally — an improvement on the PowerCenter model, since the shared functions are now version-controlled and unit-testable. The entire migrated estate was organized in a modular Python package structure with a clear domain hierarchy mirroring the original PowerCenter folder organization, deployed to Databricks via Azure DevOps CI/CD pipelines with automated test execution on every merge.
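As an illustration of that shared-library pattern, a mapplet such as a currency-normalization rule (a hypothetical example, not one of the client's mapplets) becomes an importable, unit-testable PySpark function:

```python
from pyspark.sql import DataFrame, functions as F

def normalize_to_base_ccy(df: DataFrame, fx: DataFrame, amount_col: str) -> DataFrame:
    """Restate an amount column in base currency using an FX-rate DataFrame.

    Assumes both inputs carry ccy and as_of_date columns, and fx carries fx_rate.
    """
    return (
        df.join(fx, on=["ccy", "as_of_date"], how="left")
          .withColumn(f"{amount_col}_base", F.col(amount_col) * F.col("fx_rate"))
    )
```

Pipeline scripts then import the function (for example, from shared.fx import normalize_to_base_ccy, a hypothetical package path), so a change to shared logic lands in one place and is exercised by automated tests on every merge.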

PowerCenter Component Mapping Reference

PowerCenter Component | PySpark / Databricks Equivalent | Conversion Notes
Source Qualifier (relational) | spark.read.jdbc() with pushdown SQL | SQL override and filter conditions preserved exactly
Expression Transformation | withColumn() / select() with PySpark expressions | 94 Informatica built-in functions mapped, with null handling verified
Aggregator Transformation | groupBy().agg() | Group By ports mapped to groupBy keys; aggregate ports to agg functions
Joiner Transformation | DataFrame.join() | All join types (normal, master outer, detail outer, full outer) supported
Lookup Transformation | Broadcast join or Delta Lake lookup | Connected vs. unconnected lookup semantics preserved; caching mapped to broadcast
Router Transformation | DataFrame.filter() per output group | Multiple output groups emitted as separate filtered DataFrames
Update Strategy Transformation | Delta Lake MERGE INTO with DD_INSERT/DD_UPDATE/DD_DELETE flags | Update strategy expressions converted to merge condition predicates
SCD Wizard (Type 2) | Delta Lake MERGE INTO with effective/expiry date columns | Time travel queries replace the historical snapshot pattern
Reusable Transformation | Python module function (importable) | Shared logic centralized in a versioned Python package
Mapplet | Python function or PySpark pipeline function | Input/output groups mapped to function parameters and return values
PowerCenter Workflow | Databricks Workflows task graph | Task dependencies, failure handling, and event triggers preserved
Session (with partition config) | Databricks job cluster config + task settings | Partition count, commit interval, and buffer size translated to Spark config
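To make one row of the table concrete, here is how a cached, connected Lookup transformation maps to a broadcast join; the tiny DataFrames are hypothetical stand-ins for a trade feed and a security master:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
trades = spark.createDataFrame(
    [(1, "SEC1", 100.0), (2, "SEC9", 250.0)],
    ["trade_id", "security_id", "qty"])
security_master = spark.createDataFrame(
    [("SEC1", "AAPL", "Technology")],
    ["security_id", "ticker", "sector"])

# broadcast() ships the small reference table to every executor, mirroring
# PowerCenter's in-memory lookup cache; the left join keeps unmatched rows
# with NULL lookup ports, matching connected-lookup semantics.
enriched = trades.join(F.broadcast(security_master), on="security_id", how="left")
```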

Key Migration Highlights

Security & Compliance

The client operates under an exceptionally demanding regulatory framework spanning SEC and FINRA requirements in the US, FCA rules in the UK, ESMA directives across the EU, SFC requirements in Hong Kong, and MAS requirements in Singapore. The migration program was subject to formal change management governance under the firm's Model Risk Management (MRM) policy for all pipelines with regulatory capital implications.

Results & Business Impact

The 9-month timeline significantly outperformed the client's internal estimate of 24–36 months for a manual rewrite approach. The accelerated timeline was directly attributable to MigryX's automated conversion coverage, which reduced the estimated engineering labor from 84,000 person-hours (the internal estimate for a manual rewrite) to 9,200 person-hours including MigryX-assisted engineering review, validation, and knowledge transfer. This 9X labor reduction translated directly into the compressed 9-month delivery timeline that allowed the client to avoid extended support fees entirely.

1,800 PowerCenter mappings migrated
900K lines of PySpark generated
4–9X pipeline performance gains
$4.3M projected savings over 2 years
9 months end-to-end migration duration
24 min NAV batch runtime (was 3.5 hrs)

The $4.3 million two-year savings projection is based on $2.8 million in eliminated Informatica PowerCenter licensing and repository infrastructure costs, $900K in reduced operational support labor (PowerCenter administration required a dedicated three-person team that has been redeployed to Databricks development), and $600K in avoided Oracle database costs for the PowerCenter Repository Service. The firm's CFO formally recognized the program as delivering ROI in excess of 4X the program investment within the two-year measurement horizon, making it the highest-returning infrastructure modernization initiative in the firm's data engineering history.

"We had been told that migrating an Informatica estate of our size and complexity would take two to three years minimum. MigryX delivered in nine months — and the quality of the output exceeded what we expected. The SCD Type 2 conversions to Delta Lake were particularly impressive: we now have time travel on all our dimension tables, which is something we'd wanted for years but could never justify rebuilding from scratch. Our fund accounting team had NAV figures two hours earlier on the first day of cutover. That was an immediate, visible win for the business."

— Managing Director, Data Engineering & Architecture, Major Asset Management Firm

Ready to Modernize Your Informatica PowerCenter Estate?

See how MigryX can accelerate your migration to Databricks with parser-driven automation. Full session-config fidelity. Delta Lake SCD. Automated validation.

Explore Databricks Migration →