dma2.gif (5195 bytes)Data Mining & Analysis, LLC

logo2.gif (5725 bytes)

Home Profile Services Staff Resources News Case Studies Feedback Contents Search

The enterprise data warehouse versus the data mart

When designing and constructing the data warehouse, companies can either: 1) establish an enterprise wide architecture and then construct data warehouses AND data marts which conform to the enterprise wide architecture; or 2) implement a highly-focused and targeted data mart project aimed at a specific area of the business. The disparities in enterprise on-line transaction systems started theorists envisioning data warehouses -- on-line repositories of scrubbed, detailed, and summarized enterprise data based on common business measures serving the needs of business users, promoting accurate reporting, and off-loading the query processing over-head from transaction systems. These data repositories would:

  • Share common business assumptions
  • Project the enterprise from inconsistent reporting across business units.
  • Deliver access to detailed corporate data via a mix of quick-response summaries and aggregates
  • Lessen the expense required to create and modify legacy transformation feeds into multiple decision support engines
  • Reduce the daily burden of wide are networks (Was)
  • Scale to meet the needs of the organization

Embracing this enterprise approach yields the best long-term results, but invokes the most angst. It is difficult, expensive, and time-consuming to achieve consensus on a single, consistent, accepted, and valid view of the business and the data it needs. Starting with smaller, more focused applications of the warehouse avoids the stress of the enterprise approach by limiting the extent of the implementation. It’s also simpler architecturally -- there are usually limited data sets and limited user views. Basically, the more focused approach trades the near term pain of dealing with data standardization issues for the longer term pain of dealing with cross-organization operational issues. In this approach each department is responsible for extracting whatever data it needs, defining its own meta data, and using its own private warehouse for decision support at the departmental level. Obviously: 1) this architecture id difficult to evolve to an enterprise view; and 2) the lack of data standardization prevents analysts in one DEPARTMENT from accessing information from another Department’s warehouse which might be of use to them in their analysis. Each mart can create confusion, overlapping, and contradictory views of the business like: What is a customer? A product? A Sale?

The data that resides in the data warehouse is granular, while the data in the data mart is refined.

Understanding Data marts

What is the appeal of the data mart? Well, as data warehouses grow, more and more departmental DSS processing id done inside the data warehouse, leading to resource consumption. If a department has its own data mart, it can customize the data as the data flows into the data mart from the data warehouse. The department can summarize, sort, select, and structure its own data without considering other departments. And typically, the department can select a smaller amount of historical data than that found in the data warehouse. Finally, the processing and storage costs on a data mart server are significantly less than the unit cost for the machine that houses the data warehouse. Unlike physically centralized data warehouses, which typically expand between 3-5 times every 18 months, the data mart generally expands horizontally, breeding requests for more data marts at brush-fire speeds. The data mart is a powerful and natural extension of the data warehouse because it extends DSS to the departmental environment.

The more focused data mart is preferable in organizations that have a business problem with a single focus that requires data existing in only a few places. The user of the data mart environment is sometimes called the departmental decision support system (DSS) analyst -- an individual who does decision-making with a departmental bias. The departmental DSS analyst is not a technician, but rather a business person making mid - to - long-term, strategic decisions.

Note that performance expectations in the DSS environment are entirely different from those in the OLTP environment. For example, DSS environment response time requirements vary from one minute to 24 hours. the issue of performance is relaxed because of the abundance of data and the high levels of data exploration. In turn, performance in the data mart is somewhat different from the data warehouse because: 1) the data mart is used by regular users who have predictable data requirements (enabling reasonable performance objectives to be set); 2) there is less data in the data mart environment. Optimal performance in the data mart is achieved by:

  • Making extensive use of indexes
  • Using star joins
  • Limiting the volume of data that is found in the data mart
  • Creating arrays of data
  • Creating aggregate records
  • Creating pre-joined tables

A Regular user undertakes standard, repetitive queries on small units of data that usually have predictable responses. A power user does random ad hoc queries, generating unpredictable responses on large units of data. Most users are regular. Note that front end toll functionality and pricing are structured along these definitions as well.

Data is loaded into the data mart from the data warehouse via a load programs that takes into consideration:

  • Loading schedule and speed
  • How frequently the program is run
  • Total or partial refreshment
  • Data customization
  • Selecting, re-sequencing, merging, and aggregating of data
  • Data summarization
  • Integrity of data relationships
  • Load process meta data creation

The data mart environment requires data usage and data content tracking. The data usage tracker identifies:

  • Data being accessed
  • Active users
  • Average response time
  • Amount of data being requested
  • Busiest times of the day, week, and month
  • If aggregate tables need to be built and where

the data content tracker looks at data mart content, data integrity, growth rate, and data access.

 

Send mail to webmaster@donmeyer.com with questions or comments about this web site.
Copyright © 1998,2000 Data Mining & Analysis, LLC