|
|
|
MOLAP or ROLAP OLAP tools take you a step beyond query and reporting tools. Via OLAP tools, data is represented using a multidimensional model rather than the more traditional tabular data model. The traditional model defines a database schema that focuses on modeling a process of function, and the information is viewed as a set of transactions, each which occurred at some single point in time. The multidimensional model usually defines a star schema, viewing data not as a single event but rather as the cumulative effect of events over some period of time, such as weeks, then months, then years. With OLAP tools, the user generally vies the data in grids or corsstabs that can be pivoted to offer different perspectives on the data. OLAP also enables interactive querying of the data. For example, a user can look at information at one aggregation (such as a sales region) and then drill down to more detail information, such as sales by state, then city, then store. OLAP tools do not indicate how the data is actually stored. Given that, its not surprising that there are multiple ways to store the data, including storing the data in a dedicated multidimensional database (also referred to as MOLAP or MDD). Examples include Arbors Softwares Essbase and Oracle Express Server. The other choice involves storing the data in relational databases and having an OLAP tool work directly against the data, referred to as relational OLAP (also referred to as ROLAP or RDBMS). Examples include MicroStrategys DSS server and related products, Informixs Informix-MetaCube, Information Advantages Decision Suite, and Platinum Technologies Plantinum InfoBeacon. (Some also include Red Bricks Warehouse in this category, but it isnt really an OLAP tool. Rather, it is a relations database optimized for performing the types of operations that ROLAP tools need.) ROLAP versus MOLAP Relational OLAP (ROLAP) Multidimensional OLAP (MOLAP) Scale to terabytes Under 50 DB capacity Managing of summary tables /indexes Instant response Platform portability Easier to implement SMP and MPP SMP only Secure Integrated meta data Proven technology Data modeling required Data warehouses can be implemented on standard or extended relational DBMSs, called relational OLAP (ROLAP) servers. these serves assume that data is stored in relational databases and they support extensions to SQL and special access and implementation methods to efficiently implement the multidimensional data model and operations. In contrast, multidimensional OLAP (MOLAP) servers are servers that directly store multidimensional data in special data structures (like arrays or cubs) and implement OLAP operations over these data in free-form fashion (free-from within the framework of the DMBS that holds the multidimensional data). MOLAP servers have sparsely populated matrices, numeric data, and a rigid structure of data once the data enters the MOLAP DBMS framework. Relational Databases ROLAP servers contain both numeric and textual data, serving a much wider purpose than their MOLAP counterparts. Unlike MOLAP DBMSs (supported by specialized database management systems). ROLAP DBMSs (or RDMBSs) are supported by relational technology. RDBMSs support numeric, textual, spatial, audio, graphic, and video data, general-purpose DSS analysis, freely structured data, numerous indexes, and star schemas. ROLAP servers can have both disciplined and ad hoc usage and can contain both detailed and summarized data. ROLAP supports large databases while enabling good performance, platform portability, exploitation of hardware advances such as parallel processing, robust security, multi-user concurrent access (including read-write with locking), recognized standards, and openness to multiple vendors tools. ROLAP is based on familiar, proven, and already selected technologies. ROLAP tools take advantage of parallel RDBMSs for those parts of the application processed using SQL (SQL not being a multidimensional access or processing language). SO, although it is always possible to store multidimensional data in a number of relations tables (the star schema), SQL does not, by itself, support multidimensional manipulation of calculations. Therefore, ROLAP products must do these calculations either in the client software or intermediate server engine. Note, however, that Informix has integrated the ROLAP calculation engine into the RDBMS, effectively mitigating the above disadvantage. Multidimensional Databases MDDs deliver impressive query performance by pre-calculating or pre-consolidating transactional data rather than calculating on-the-fly. (MDDs pre-calculate and store every measure at every hierarchy summary level at load time and store them in efficiently indexed cells for immediate retrieval.) However, to fully preconsolidate incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB; obviously, a file this size take many minutes to load and consolidate. As a result, MDDs do not scale, making them a lackluster choice for the enterprise atomic-level data in the data warehouse. However, MDDs are great candidates for the <50GB department data marts. To manage large amounts of data, MDD servers aggregate data along hierarchies. Not only do hierarchies provide a mechanism for aggregating data, they also provide a technique for navigation. The ability to navigate data by zooming in and out of detail is key. With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake. With MDDs, for example, it is common to see the structure of time separated from the repletion of time. One dimension may be the structure of a year, month, quarter, half-year, and year. A separate dimension might be different years: 1996, 1997, and so on. Adding a new year to the MDD simply means adding a new member to the calendar dimension. Adding a new year to a RDBMS usually requires that each month, quarter, half-year and year also be added. In General Usually, a scaleable, parallel database is used for the large, atomic. organizationally-structured data warehouse, and subsets or summarized data from the warehouse are extracted and replicated to proprietary MDDs. Because MDD vendors have enabled drill-through features, when a user reaches the limit of what is actually stored in the MDD and seeks more detail data, he/she can drill through to the detail stored in the enterprise database. However, the drill through functionality usually requires creating views for every possible query. As relational database vendors incorporate sophisticated analytical multidimensional features into their core database technology, the resulting capacity for higher performance salability and parallelism will enable more sophisticated analysis. Proprietary database and nonitegrated relational OLAP query tool vendors will find it difficult to compete with this integrated ROLAP solution. Both storage methods have strengths and weaknesses -- the weaknesses, however, are being rapidly addressed by the respective vendors. Currently, data warehouses are predominantly built using RDBMSs. If you have a warehouse built on a relational database and you want to perform OLAP analysis against it, ROLAP is a natural fit. This isnt to say that MDDs cant be a part of your data warehouse solution. Its just that MDDs arent currently well-suited for large volumes of data (10-50GB is fine, but anything over 50GB is stretching their capabilities). If your really want the functionality benefits that come with MDD, consider subsetting the data into smaller MDD-based data marts. When deciding which technology to go for, consider: 1) Performance: How fast will the system appear to the end-user? MDD server vendors believe this is a key point in their favor. MDD server databases typically contain indexes that provide direct access to the data, making MDD servers quicker when trying to solve a multidimensional business problem. However, MDDs have significant performance differences due to the differing ability of data models to be held in memory, sparsely handling, and use of data compression. And, the relational database vendors argue that they have developed performance improvement techniques, such as IBMs DB2 Starburst optimizer and Red Bricks Warehouse VPT STARindex capabilities. (Before you use performance as an objective measure for selecting an OLAP server, remember that OLAP systems are about effectiveness (how to make better decisions), not efficiency (how to make faster decisions).) 2) Data volume and scalability: While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes. And, although MDD servers can require up to 50% less disk space than relational databases to store the same amount of data (because of relational indexes and overhead), relational databases have more capacity. MDD advocates believe that you should perform multidimensional modeling on summary, not detail, information, thus mitigating the need for large databases. in addition to performance, data volume, and scalabiltiy, you should consider which architecture better supports systems management and data distribution, which vendors have a better user interface and functionality, which architecture is easier to understand, which architecture better handles aggregation and complex calculations, and your perception of open versus proprietary architectures. Besides these issues, you must also consider which architecture will be a more strategic technology. In fact, MDD servers and RDBMS products can be used together -- one for fast reposes, the other for access to large databases. What if? IF A. You require write access for What if? analysis B. Your data is under 50 GB C. Your timetable to implement is 60-90 days D. You dont have a DBA or data modeler personnel E. Youre developing a general-purpose application for inventory movement or assets management THEN Consider an MDD solution for your data mart (like Oracle Express, Arbors Essbase, and Pilots Lightship)
IF A. Your data is over 100 GB B. You have a "read-only" requirement THEN Consider an RDBMS for your data mart. IF A. Your data is over 1TB B. Need data mining at a detail level Consider an MPP hardware platform like IBMs SP and DB2 RDBMS If, youve decided to build a data mart using a MDD, you dont need a data modeler. Rather, you need an MDD data mart application builder who will design the business model (identifying dimensions and defining business measures based on the source systems identified. Prior to building separate stove pipe data marts, understand that at some point you will need to: 1) integrate and consolidate these data marts at the detail enterprise level; 2) load the MDD data marts; and 3) drill through from the data marts to the detail. Note that your data mart may outgrow the storage limitations an MDD, creating the need for an RDMBS (in turn, requiring data modeling similar to constructing the detailed, atomic enterprise-level RDBMS).
Send mail to webmaster@donmeyer.com with
questions or comments about this web site.
|