
Saturday, June 4, 2011

What Is DataStage?

DataStage is sold to and installed in an organization, whose IT support staff are then expected to maintain it and to solve DataStage users' problems. In some cases IT support is outsourced, and the support provider may not become aware of DataStage until it has already been installed.

DataStage is actually two separate things. First, in production (and, of course, in development and test environments) it is just another application on the server: one that connects to data sources and targets and processes ("transforms") the data as they move through it. This is why DataStage is classed as an "ETL tool", the initials standing for extract, transform and load. DataStage "jobs", as they are known, can execute on a single server or on multiple machines in a cluster or grid environment. Like all applications, DataStage jobs consume resources: CPU, memory, disk space, I/O bandwidth and network bandwidth.

Second, DataStage has a set of Windows-based graphical client tools that allow ETL processes to be designed, their associated metadata to be managed, and the processes to be monitored. These client tools connect to the DataStage server, because all of the design information and metadata are stored there.
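To make the extract, transform and load stages concrete, here is a toy pipeline in plain Python. It is only an illustration of the ETL idea, not DataStage code: the CSV source and target, the column names (`name`, `qty`, `price`) and the derived `total` field are all hypothetical. Rows stream through generators, loosely mirroring data "moving through" a job rather than being staged in full.

```python
"""A toy extract-transform-load pipeline (illustrative only, not DataStage code)."""
import csv
import io
from typing import Iterable, Iterator, TextIO


def extract(source: TextIO) -> Iterator[dict]:
    """Extract: read rows from a CSV source (hypothetical columns name, qty, price)."""
    yield from csv.DictReader(source)


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Transform: clean a text field and derive a new one as rows stream through."""
    for row in rows:
        yield {
            "customer": row["name"].strip().title(),
            "total": round(float(row["qty"]) * float(row["price"]), 2),
        }


def load(rows: Iterable[dict], target: TextIO) -> int:
    """Load: write transformed rows to a CSV target and return the row count."""
    writer = csv.DictWriter(target, fieldnames=["customer", "total"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    src = io.StringIO("name,qty,price\n  alice smith ,2,3.50\nBOB JONES,1,10\n")
    dst = io.StringIO()
    print(load(transform(extract(src)), dst), "rows loaded")
    print(dst.getvalue())
```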

On the DataStage server, work is organized into one or more "projects". There are also two DataStage engines: the "server engine" and the "parallel engine". The server engine is located in a directory called DSEngine, whose location is recorded in a hidden file called /.dshome. The parallel engine is located in a sibling directory called PXEngine, whose location is recorded in the environment variable APT_ORCHHOME and/or in the environment variable PXHOME.
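A support person can locate both engines from the clues above. The sketch below assumes that /.dshome holds the DSEngine path on its first non-blank line, and it checks APT_ORCHHOME before PXHOME; both the file format and that precedence order are assumptions, not documented behaviour. The function names are hypothetical. As a last resort it uses the fact that PXEngine is a sibling of DSEngine.

```python
"""Locate DataStage engine directories (a sketch; file format and precedence assumed)."""
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_dshome(text: str) -> Optional[Path]:
    """Return the first non-blank line of .dshome contents as a path (assumed format)."""
    for line in text.splitlines():
        if line.strip():
            return Path(line.strip())
    return None


def server_engine_dir(dshome_file: str = "/.dshome") -> Optional[Path]:
    """Read the hidden /.dshome file; return None if it cannot be read."""
    try:
        return parse_dshome(Path(dshome_file).read_text())
    except OSError:
        return None


def parallel_engine_dir(
    env: Mapping[str, str] = os.environ,
    dsengine: Optional[Path] = None,
) -> Optional[Path]:
    """Return the PXEngine directory.

    Checks APT_ORCHHOME, then PXHOME (this order is an assumption); if neither
    is set and the DSEngine path is known, falls back to its sibling PXEngine.
    """
    for var in ("APT_ORCHHOME", "PXHOME"):
        value = env.get(var)
        if value:
            return Path(value)
    if dsengine is not None:
        return dsengine.parent / "PXEngine"
    return None


if __name__ == "__main__":
    ds = server_engine_dir()
    print("Server engine:  ", ds)
    print("Parallel engine:", parallel_engine_dir(dsengine=ds))
```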

DataStage Engines

The server engine is the original DataStage engine and, as its name suggests, is restricted to running jobs on the server.
The parallel engine resulted from the acquisition of Orchestrate, a parallel execution technology.
This technology enables work (and data) to be distributed over multiple logical "processing nodes".
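The idea of spreading data over logical processing nodes can be illustrated with two generic partitioning schemes. This is a conceptual sketch in plain Python, not a description of how the parallel engine is implemented; the function names and the sample records are hypothetical. Round-robin deals records out evenly; hash partitioning keeps records with the same key on the same node, which matters when related rows must be processed together.

```python
"""Distribute records over N logical processing nodes (conceptual sketch)."""
import zlib
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def round_robin(records: Iterable[T], nodes: int) -> List[List[T]]:
    """Deal records to nodes in turn: record i goes to node i % nodes."""
    parts: List[List[T]] = [[] for _ in range(nodes)]
    for i, rec in enumerate(records):
        parts[i % nodes].append(rec)
    return parts


def hash_partition(
    records: Iterable[T], nodes: int, key: Callable[[T], str]
) -> List[List[T]]:
    """Send records with equal keys to the same node.

    Uses CRC32 rather than Python's built-in hash(), because str hashing is
    randomized per process and would not give a stable assignment.
    """
    parts: List[List[T]] = [[] for _ in range(nodes)]
    for rec in records:
        node = zlib.crc32(key(rec).encode("utf-8")) % nodes
        parts[node].append(rec)
    return parts


if __name__ == "__main__":
    rows = [("east", 1), ("west", 2), ("east", 3), ("north", 4)]
    print(round_robin(rows, 2))
    print(hash_partition(rows, 3, key=lambda r: r[0]))
```

With two nodes, round-robin places the first and third records on node 0 and the second and fourth on node 1; with hash partitioning both "east" records always land on the same node, whichever node that turns out to be.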
