|
|
|
Methodology for Designing, Building, and Deploying the Data Warehouse
(1) Assemble the Team A: The project manager has the responsibility of assembling an initial team (including the data modeler, business analysts, and key business end users) to determine business requirements and create the project definition and scope. This initial team will be responsible for completing the first five steps in this methodology as well as identifying the subject area for the pilot data warehouse. (2) Determine Initial Business Requirements: Start the project by gathering high-level data warehouse requirements and determining expectations with your power users, casual users, and end users. (3) Build the High Level Subject Area Diagram: Then, build a data model that represents a corporations high level subject areas (or entities). This data model generally shows the subject areas and their relationships to each other, as well as definitions for each entity. One of these subject areas is generally chosen for your pilot. (4) Define the Project Definition & Scope: The project definition and scope functions as the data warehouse business plan identifying the project description, objectives, critical success factors, assumptions, and issues. Importantly, the project definition and scope document sets expectations up front. (The scope document will initially identify the pilot deployment.) Data naturally congregates around major categories relevant to the business. These categories are called subjec aras or high-levle entities.(5) Construct the Project Plan: The project plan details the design and construction of the data warehouse. The plan consists of tasks for building the data warehouse, the time frame for each activity, deliverables, milestones, assigned resources, and costs. (6) Chose the Tools & Infrastructure: Due to the complexity of the data warehouse infrastructure, we examine vendors and products in Chapter 4 (of "Building A Better Data Warehouse" by Don Meyer & Casey Cannon), evaluating, among other attributes, ease of use, performance and vendor stability. (7) Assemble the Team B: After completing steps one through five, the rest of the team should be assembled to complete the remaining tasks in parallel. (8) Gather Additional Requirements: This step differs from the initial requirements gathering because it is subject area specific. For example, you would ask your sales organization: Where is information stored? Which is the system of record? What information do you want via the reports? What are your power user, casual user, and end user expectations? Do you want daily transaction reports or weekly transaction reports? (8b) Determine End User Report Requirements: The data warehouse provides a source of reliable, accessible, well-defined and well-documented data. Provide tools to end users that enable them to realize the value of the data. (9) Identify the Source System: After the data model is built, identify and define the system of record. (10) Create the Logical Data Model: After architecting your high-level subject area diagram, create a Logical Data Model reflecting the attributes and entities for the selected data warehouse area. The logical data model captures all pertinent information about the selected entity (for example, a customer, an order or an invoice). However, dont worry at this point about whether youre using a computerized, manual, or mechanical process for your data warehouse implementation. Use this step to gather information only. The data warehouse is a data-driven development process. This approach centers around identifying a corporations common enterprise data through the technique called data modeling. Typical data models begin with the high-level subject area digram, followed by the mid-level dat model, and the logical and physical data models. A data model is a system used to represent the relationships between data, sometimes referred to as an entity relationship digram (ERD).(11) Size the Data Warehouse: Capacity planning in the data warehouse centers around disk storage and processing resources--the more data there is, the more processing power required. Disk storage capacity is a function of the level of detail stored, the length of time the data is kept, and the number of occurrences of data to be stored. Processor capacity is a function of the batch processing update window and concurrent ad-hoc users workload. (12) Build the Physical Data Model: The DBA implements the physical data model converting the logical data model into a physical model by removing operational-only data, adding time stamps, indexes, referential integrity constraints, merging tables, adding levels of summarization, aggregation, or derivations of data. Performance issues and end user requirements will be inputs to this process. (13) Build the Database Management System (DBMS): Its time to build the database and tables to house the data. (14a,b) Extract, Transform, & Scrub: This is the process of mapping all data elements from the source systems to the data warehouse tables typically requiring the transformation and merging of data from the source system (the OLTP) to the data warehouse. (15) Load the Data: Load the data--which really means: create, load, and optimize the DBMS as well as schedule the processes for extracting, loading, scrubbing, and transferring the data to the data warehouse host and then building summarization tables for performance. (16a,b) Develop End User Tool Report Templates: Ensure that the end user queries, reporting, graphing, ad hoc requests, and analysis requirements are met via an off-the-shelf package or an in-house developed application. (17) Create Repository for and Document the DBMS Meta Data: Meta data catalogues the information stored in the data warehouse and the transformation performed on the data when transferred from the legacy systems (or OLTP systems). (18) Tune the Database: Tune the data loaded into the data warehouse for optimum performance of both batch loading and end user access. choose tools that identify problems, troublesome thresholds, and applicable corrective action. (19) Secure the Data: The data warehouse is built for access of data and is considered a failure if it cannot accommodate easy and unconstrained access. However, security of data in the data warehouse requires that access of data be limited and controlled. (20) Document Operation Processes & Procedures (21) Train: Train the end users to efficiently use the data access tool and educate them as to the implications of the data. (22a,b,c) Test, Rollout, and Follow-up: Create a test plan, then implement, modify, and rollout to business analysts end users. Then, conduct a post-project evaluation to determine where improvements could be made for the next data warehouse project. Also, review possible next projects to determine the best fit with the existing data.
Send mail to webmaster@donmeyer.com with
questions or comments about this web site.
|