
Friday, March 18, 2011

ETL process

Implementing an ETL process in DataStage to load the data warehouse

ETL process 

By definition, the ETL process involves three tasks:
  • extract data from an operational source or archive systems, which are the primary sources of data for the data warehouse
  • transform the data, which may involve cleaning, filtering and applying various business rules
  • load the data into a data warehouse or any other database or application that houses data
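The three tasks above can be illustrated with a minimal sketch in plain Python. This is not DataStage code; the field names, the CSV source and the list-based "warehouse" are all hypothetical, chosen only to show the extract/transform/load split:

```python
import csv

def extract(path):
    """Extract: read rows from an operational source (here a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: clean, filter and apply a business rule."""
    cleaned = []
    for row in rows:
        # cleaning: strip stray whitespace from names
        name = row["name"].strip()
        # filtering: drop rows with a non-positive amount
        amount = float(row["amount"])
        if amount <= 0:
            continue
        # business rule (hypothetical): flag large orders
        cleaned.append({"name": name, "amount": amount,
                        "large_order": amount > 1000})
    return cleaned

def load(rows, target):
    """Load: append the transformed rows to the target table (here a list)."""
    target.extend(rows)
```

In a real warehouse load the target would be a database table and the transform step far richer, but the separation of the three phases is the same.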

ETL process from a DataStage standpoint

In DataStage the ETL execution flow is managed by controlling jobs, called Job Sequences. A master controlling job provides a single interface for passing parameter values down to controlled jobs and can launch hundreds of jobs with the desired parameters. Changing runtime options (for example, when moving a project from the testing to the production environment) is done in the job sequences and does not require changing the 'child' jobs.
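The master-controller idea can be sketched from outside DataStage as well, for instance by driving child jobs through the `dsjob` command-line client. The project name, job names and parameter values below are illustrative assumptions, not from a real project, and the exact `dsjob` options may vary by DataStage version:

```python
import subprocess

# Hypothetical runtime settings the master passes down to every child job.
# Switching from test to production means changing only this dictionary.
RUNTIME_PARAMS = {"DB_SERVER": "prodsrv01", "TARGET_SCHEMA": "DWH"}
CHILD_JOBS = ["LoadCustomers", "LoadOrders", "LoadFactSales"]

def build_dsjob_command(project, job, params):
    """Build a dsjob invocation that passes each parameter as -param name=value."""
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd

def run_job(project, job, params):
    """Launch one child job and return its exit status."""
    return subprocess.run(build_dsjob_command(project, job, params)).returncode
```

The point mirrors the one above: runtime options live in one place (the controller), while the child jobs themselves stay unchanged.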
Controlled jobs can be run in parallel or in serial (when a second job is dependent on the first). In the case of serial job execution, it is very important to check whether the preceding set of jobs executed successfully.
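The serial case, with the status check before each dependent step, might be sketched like this (the zero-means-success convention and the stub job runner are assumptions for illustration):

```python
def run_serial(jobs, run_job):
    """Run jobs one after another; abort as soon as one fails,
    because the downstream jobs depend on the upstream result."""
    results = []
    for job in jobs:
        status = run_job(job)   # assumed convention: 0 = success
        results.append((job, status))
        if status != 0:
            raise RuntimeError(
                f"Job {job} failed with status {status}; "
                f"aborting dependent jobs")
    return results
```

Parallel branches, by contrast, can simply be launched together and joined at the end, since neither depends on the other's output.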
A typical DataStage ETL process can be broken up into the following segments (each segment can be realized by a set of DataStage jobs):
