Data Reflections
Discussions of data warehousing, transformation and analysis by the Business Intelligence Team at Tallan.
Tallan

July 9, 2009

Understanding Metadata Management

Filed under: Analysis Services, Data Modeling, ETL, SQL Server, Uncategorized, Warehousingbgager @ 2:18 pm

Metadata Management- Fun stuff, right?  Depending on what your relationship to an organization’s data is, it can have many different meanings. Database developers and administrators tend to focus on technical metadata, which provides descriptive information on table structures, job streams, ETL mappings, data lineage, etc.

Business users have a need for metadata as well. Attribute definitions, business rules, formulas & calculations, and system definitions provide context to your data and help turn it into real information. Often in the form of a corporate glossary or data dictionary, business metadata can provide clarity to an organization.

One of the challenges of getting your organization to buy in on the need for Metadata Management is that it is often not well understood by business stakeholders. Have you ever been in a meeting that was called because of a dispute over the definition of a business term? Have you ever seen the same field, like “Transactions” for instance, listed on two different reports with different numbers? When business terminology is ambiguous and not well documented, productivity is lost and resources are wasted trying to understand and resolve such discrepancies. It also impacts the trust your people have in your data. And it happens over and over again in the average company. These are Metadata Management issues and they are costing you money.

In an effort to describe how Metadata Management impacts your organization, we have provided some resources that we hope will help. The first is a webinar that I have presented to businesses interested in understanding Metadata Management and some of the options available when implementing a solution. It is available here. This webinar describes the differences between business and technical metadata, outlines its role in the organization, and suggests a framework for creating a metadata repository from the ground up.

We have also posted a case study that explains what can go wrong in an organization that lacks a metadata management strategy. If you are wondering how expensive this oversight can be, the paper describes a project gone wrong because of the lack of available metadata. You may find similarities in your own company.

Half the battle with Metadata Management is in understanding what metadata is and the role it plays in your organization. We hope this content will help with that understanding and get you thinking about how much more efficient you can be with better data management.In future posts, we will get more into the nuts and bolts of implementing a solution.

December 4, 2008

Partitioned Fact Tables

Filed under: ETL, Warehousingachandler @ 1:06 pm

Once tables grow into the millions of records, they become candidates for partitioning. Table partitioning offers many benefits, particularly in warehouse environments. Since data is split into smaller units of storage, backups can target filegroups with a higher rate of change. Systems with multiple CPUs see improved query performance as partitioned data leads to greater parallelism. Perhaps most significant is the ability to swap in huge amounts of data by partition switching, an operation that is practically instantaneous.

Tables can be partitioned horizontally or vertically. With vertical partitioning, columns are split out into separate physical tables. This post focuses on horizontally partitioned tables, which take advantage of new constructs Microsoft added in the 2005 release - partition functions and partition schemes. Table partitioning is an Enterprise Edition only feature in SQL Server 2005 and 2008.
(more…)

November 6, 2008

Introduction to SAP Business Objects Data Integrator ETL tool

Filed under: ETLsfernandes @ 7:33 am

SAP Business  Objects Data Integrator XI R2/R3

The purpose of this presentation is to familiarize the team with some of the features of Business Objects Data Integrator (DI). The presentation is also aimed at providing some insight into how technology can affect the design and implementation of  the ETL processes’. 

While at a high level it is best that an ETL architecture be technology agnostic, the physical implementation can stand to benefit by being designed to take advantage of the features provided by the technology.

(more…)

October 22, 2008

SSIS data management fundamentals

Filed under: ETLelizotte @ 10:07 am

Loading data into a warehouse for use in business intelligence often involves loading data from multiple dissimilar sources. Many times the easiest way to consistently manage the data involves doing extracts from the source systems into a simple text format such as comma delimited. Even legacy systems have some form of reporting engines that allow for text file output, and in dealing with more current database types there are mechanisms such as BCP to allow for dumping output to a text file. A few more reason for using flat files – it allows for a snapshot in time that is replicable, it allows for a single output from a source system to easily be imported into multiple different warehouses, and allows for the imported data to easily be broken into manageable chunks by managing the text files.

SSIS has many features to simplify working with data files. It allows you to create Control Flows which sequence tasks involved in importing data, allowing for decisioning based on your criteria, as well as robust options for error management. It also allows you to easily perform common transformations, set checkpoints to define restart points and view intermediate results by using debug windows and visualizers. Microsoft has given easy visual designers for creating this flow within SSIS, making it easy and straightforward.

(more…)

October 20, 2008

ETL Process Control Architecture

Filed under: ETLmfortuna @ 10:06 am

Data Volumes are growing at incredible rates across many IT departments. Businesses are dealing with new problems every day, and there is much demand on IT to be flexible and to react to many issues facing the business on a daily basis. Many times, processes become obsolete or disfunct because they cannot keep up with the rate of change and increasing demands by the business. In some cases, the users lose confidence in the systems and seek alternative methods for delivering reports and data in a timely fashion.

A centralized, standard Process Control Environment can provide an inexpensive and efficient way to provide an auditing mechanism on new and existing data migration processes which may run occasionally or very frequently.  This will enable an IT department to become more pro-active and aware of issues well before the end users are aware. This will allow for the IT department to react and correct the issue before the specified time in the Service Level agreement, or to send a notification or alert to their users, without the fear of losing their users confidence in the data or the overall performance of the IT department.

(more…)

September 8, 2008

SCD via SQL Stored Procedure

Filed under: ETL, Warehousingsrichards @ 7:37 am

We are going to revisit the issue of dealing with Slowly Changing Dimensions in a data warehouse. We have seen a demonstration of using the SCD transformation that is available in SQL Server Integration Services (SSIS); however, this is not always the best option for processing an SCD. When the volume of rows you’re dealing with is substantial, this creates a significant, and usually unacceptable, performance hit. Another valid reason why you may choose not to implement the SSIS transformation is flexibility. The SSIS transformation requires your begin and end dates to be datetime data types, which could prove to be inconvenient when your SCD needs to tie back to a date and/or time dimension. 

One alternative we are going to exhibit is using a SQL Server stored procedure. This example demonstrates the implementation of a Type 2 SCD, preserving the change history in the dimension table by creating a new row when there are changes. (more…)

August 28, 2008

The BCP Option

Filed under: ETLdstrickland @ 10:00 am

Moving data from one repository to the next may sound like a trivial task, and in the sense of the normal mechanical approach it very well may be.  What could be simpler than INSERT INTO Here SELECT * FROM There?  Not much, it would seem, until your data source is something other than a SQL Server table or your data volume is so heavy that such a simple statement chokes, or worse yet, affects the transactional system’s performance.  What can you do when your data volumes are large enough that you need to consider faster approaches than manual SQL Statements? 

Fortuitously, Microsoft has provided us with the BCP utility.  BCP is a command line executable that is optimized to move data very quickly from one place to another.  BCP is flexible enough to allow transformations of data in the operation, as well as logging, authentication, and batch processing.  Microsoft has also provided a  T-SQL based convention for moving large chunks of data efficiently known as the BULK INSERT command.  (more…)

August 14, 2008

The SCD Transformation in SSIS

Filed under: ETL, Warehousingpross @ 10:00 am

In this post, we will demonstrate the Slowly Changing Dimension (SCD) transformation that is available for use in a Data Flow task within SQL Server Integration Services (SSIS).  We’ll walk through an example of how to use it, explaining the terms as we go. In addition, I’ll demonstrate some “extra effort” kind of modifications that can be made for a more thorough usage.  After that, we’ll discuss the pros and cons of using it at all.  And we’ll wrap up with some tips on how to tune the transformation for better performance.
(more…)

© 2008 DataReflections.Net  : : : :All rights reserved.  : : : :Tallan, Inc.