Saturday, June 4, 2011

Origins of DataStage

Lee presented the concept to VMark executives on June 6, 1996, along with a detailed requirements specification and a high-level architecture, and it was immediately approved. He worked with 5 key developers in the Milton Keynes, UK office of VMark: Len Greenwood, Neville Myatt, Peter Williams, Ewan Paton, with Ian Thody managing. Early alpha versions were shown to a variety of customers in October and November of that year. DataStage was announced on November 18, 1996, and was first presented to the public in a demonstration at a DB Expo in December 1996. The first formal beta version was shipped in November, and the first GA version was shipped to the first paying customer, Eurotunnel, in January 1997. VMark started shipping DataStage on January 20, 1997.

ETL solution : It supports the collection, integration and transformation of large volumes of data, where the data structures can range from simple to highly complex. One can process real-time data or data received on a periodic or scheduled basis.

Scalable : High-performance processing of very large data volumes is achieved by leveraging the parallel processing capabilities of multiprocessor hardware platforms.

Heterogeneous data support : Various data sources are supported, such as text files, complex data structures in XML, ERP systems (SAP and PeopleSoft), almost any database (including partitioned databases), web services, and business intelligence tools like SAS.

Real-time data integration support : IBM WebSphere Information Services Director provides a service-oriented architecture (SOA) for publishing data integration logic as shared services which can be reused across the organization. These services support high-speed, high-volume data and are highly reliable for both transactional and batch processing.

Connectivity across data sources : It connects to a wide range of data sources and applications, including popular enterprise applications such as SAP, Siebel, Oracle, and PeopleSoft.

What Is DataStage?

DataStage is sold to and installed in an organization, and its IT support staff are expected to maintain it and to solve DataStage users' problems. In some cases IT support is outsourced and may not become aware of DataStage until it has been installed. DataStage is actually two separate things.

In production (and, of course, in development and test environments) it is just another application on the server, an application which connects to data sources and targets and processes ("transforms") the data as they move through the application. DataStage is therefore classed as an "ETL tool", the initials standing for extract, transform and load respectively. DataStage "jobs", as they are known, can execute on a single server or on multiple machines in a cluster or grid environment. Like all applications, DataStage jobs consume resources: CPU, memory, disk space, I/O bandwidth and network bandwidth.

DataStage also has a set of Windows-based graphical tools that allow ETL processes to be designed, the metadata associated with them to be managed, and the ETL processes to be monitored. These client tools connect to the DataStage server because all of the design information and metadata are stored on the server.

On the DataStage server, work is organized into one or more "projects". There are also two DataStage engines, the "server engine" and the "parallel engine". The server engine is located in a directory called DSEngine whose location is recorded in a hidden file called /.dshome. The parallel engine is located in a sibling directory called PXEngine whose location is recorded in the environment variable APT_ORCHHOME and/or in the environment variable PXHOME.
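
As a rough illustration of the lookup described above, the following Python sketch reads the two locations. It assumes that /.dshome holds the DSEngine path as plain text, and that APT_ORCHHOME should be preferred over PXHOME when both are set; neither detail is stated above. The function names are hypothetical and not part of DataStage.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_server_engine(dshome_file: str = "/.dshome") -> Optional[str]:
    """Return the DSEngine directory recorded in the hidden file, or None."""
    path = Path(dshome_file)
    if not path.is_file():
        return None
    # Assumption: the file contains the directory path as plain text.
    content = path.read_text().strip()
    return content or None


def find_parallel_engine(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the PXEngine directory from the environment, or None."""
    # Assumption: APT_ORCHHOME wins if both variables are defined.
    return env.get("APT_ORCHHOME") or env.get("PXHOME") or None


if __name__ == "__main__":
    print("server engine:  ", find_server_engine())
    print("parallel engine:", find_parallel_engine())
```

On a machine without DataStage installed, both functions simply return None.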

DataStage Engines

The server engine is the original DataStage engine and, as its name suggests, is restricted to running jobs on the server.
The parallel engine results from the acquisition of Orchestrate, a parallel execution technology.
This technology enables work (and data) to be distributed over multiple logical "processing nodes".
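
To make the idea of distributing data over logical processing nodes concrete, here is a minimal Python sketch of hash partitioning. It illustrates the general technique only and is not how the parallel engine is implemented; the function name, the row format and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from typing import Iterable, List


def partition_rows(rows: Iterable[dict], key: str, node_count: int) -> List[List[dict]]:
    """Assign each row to one of node_count logical nodes by hashing a key column."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    nodes: List[List[dict]] = [[] for _ in range(node_count)]
    for row in rows:
        # CRC32 is deterministic, so equal key values always land on the same node.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        nodes[digest % node_count].append(row)
    return nodes


if __name__ == "__main__":
    sample = [{"customer": name, "amount": i} for i, name in enumerate("abcabcde")]
    for index, node_rows in enumerate(partition_rows(sample, "customer", 3)):
        print(f"node {index}: {node_rows}")
```

Because rows with the same key value always go to the same node, each node can process its share of the data independently of the others.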