IBM InfoSphere DataStage is an Extract, Transform and Load (ETL) tool that transforms and moves data between a wide range of sources and targets in real time, microbatch or batch mode. Some of its important features include:
- rapid development of massively scalable information integration jobs.
- a rich Designer interface for drag and drop building of visual data flows.
- specialised stages for ETL functions, such as a wizard approach to Slowly Changing Dimensions, drag and drop reference lookups, aggregation, funnel and remove duplicates.
- hundreds of pre-built transformation rules across metadata conversion, mathematical and statistical functions, text parsing, array handling and logical functions.
- re-usable code with shared routines, shared containers, shared job parameters and parameter sets, shared metadata and database connectors.
- transformation jobs that run on a massively parallel ETL engine, with support for transformation in the database so that load can be balanced onto whichever processing engine is most effective and efficient.
Creating a DataStage Environment
Focus offers solution architecture services for defining the future state architecture of a DataStage environment.
- Environments: Development, testing and production environments – shared or separated.
- Metadata Architecture: how metadata is imported into DataStage, how it is shared with other applications and how Metadata Workbench can be used to make a DataStage development team more effective.
- Security: internal security, operating system security or third-party security such as LDAP.
- Hardware selection and sizing.
Focus offers installation services for DataStage on Unix, Linux, Windows and Linux/z in an MPP, SMP, cluster or grid environment.
Getting Started with DataStage
Focus offers training, mentoring and consulting services for DataStage development. Focus can provide a starter kit for new DataStage users that includes DataStage development standards (naming standards and governance standards) along with template and sample jobs covering common data flow tasks in data warehousing, data migration, data validation and interfacing.
DataStage Enhancement
Focus can provide advice, design and build support for add-on components for DataStage that can save projects significant cost and effort:
- DataStage SAS Pack: read and write native SAS datasets from within a DataStage job and even execute SAS application code inside a DataStage job flow.
- DataStage Balanced Optimiser: get the most out of ETL, TETL, ELT and TELT. DataStage with DataStage Balanced Optimiser allows functions to be pushed down into the source or target database engine, giving easy control over how load is balanced across the source database, the transformation server and the target database.
- InfoSphere CDC for DataStage: deliver delta data straight into the DataStage engine with market leading database log replication of inserts, updates and deletes as an input into a DataStage job for transformation and load to any target. InfoSphere CDC is licensed for the size of the DataStage server and supports unlimited database sources.
- DataStage automation and operational metadata: either using functions bundled with the Information Server or custom building a more flexible solution.
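
To picture what the delta delivery described in the InfoSphere CDC item means in practice, the following is a minimal Python sketch of applying a stream of insert, update and delete records to a keyed target. It is an illustration only and is not InfoSphere CDC or DataStage code: the ChangeRecord type, the "I"/"U"/"D" operation codes and the apply_changes function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                     # "I" = insert, "U" = update, "D" = delete
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; not needed for deletes


def apply_changes(target: Dict[int, dict],
                  changes: Iterable[ChangeRecord]) -> Dict[int, dict]:
    """Return a copy of target with the changes applied in order."""
    result = dict(target)  # leave the caller's data untouched
    for change in changes:
        if change.op in ("I", "U"):
            if change.row is None:
                raise ValueError(f"missing row image for key {change.key}")
            result[change.key] = change.row
        elif change.op == "D":
            # Deleting a key that is already absent is treated as a no-op.
            result.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation code: {change.op!r}")
    return result


if __name__ == "__main__":
    customers = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
    delta = [
        ChangeRecord("I", 3, {"name": "Cy"}),
        ChangeRecord("U", 1, {"name": "Anne"}),
        ChangeRecord("D", 2),
    ]
    print(apply_changes(customers, delta))
```

Because only the changed rows travel through the flow, the downstream transformation and load work scales with the volume of change rather than with the size of the source table.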
Advanced DataStage
Focus DataStage consultants have worked on some of the largest and most complex implementations, including the 3 Mobile Data Warehouse in Europe and projects for Acxiom, for Telstra Australia and for Australia’s four major banks.
Advanced DataStage services include job design review, job optimisation, job development and job control overhaul.