I love pizza. When I arrived in Bangalore, I used to have two or three a week. I kept up this (bad) habit until Mom came over in December, and now Papa John's is coming to India soon. Can't wait for Papa.
And ManU have defeated Arsenal and have a clear 6-point lead over Chelsea. Wanna see them lose the title [:D]
Sunday, April 13, 2008
Friday, April 11, 2008
What's a factless fact table?
I have been reading about factless fact tables for quite some time and had assumed that the bridge table in a many-to-many relationship was "the" factless fact table. But during a knowledge-sharing session I was given a totally new definition. It was convincing but hard to believe. A bridge table is in fact a factless fact table, just not in the sense Kimball uses the term.
A factless fact table is a table that doesn't have any facts at all; it may consist of nothing but keys. There are two types of factless fact table: event and coverage.
Take an example of a factless fact table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Take an example of tracking student attendance. Imagine that you have a modern student tracking system that detects each student attendance event each day. When the student walks through the door into the lecture, a record is generated.
One can easily list the dimensions surrounding the student attendance event.
Date: one record in this dimension for each day on the calendar
Student: one record in this dimension for each student
Course: one record in this dimension for each course taught each semester
Teacher: one record in this dimension for each teacher
Facility: one record in this dimension for each room, laboratory, or athletic field
The only problem is that there is no obvious fact to record each time a student attends a lecture or suits up for physical education. Tangible facts such as the grade for the course don't belong in this fact table. This fact table represents the student attendance process, not the semester grading process or even the midterm exam process. Actually, this fact table consisting only of keys is a perfectly good fact table and probably ought to be left as is.
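To make the keys-only idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `attendance_fact`, the `*_key` column names, and the sample rows are all made up for illustration; they are not from any real system. It shows that a table with no measure column can still answer "how many students attended each course?" simply by counting rows.

```python
import sqlite3

# In-memory database; every name and row below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attendance_fact (
        date_key     INTEGER NOT NULL,  -- key into the Date dimension
        student_key  INTEGER NOT NULL,  -- key into the Student dimension
        course_key   INTEGER NOT NULL,  -- key into the Course dimension
        teacher_key  INTEGER NOT NULL,  -- key into the Teacher dimension
        facility_key INTEGER NOT NULL   -- key into the Facility dimension
    )
""")

# One row per attendance event: nothing but keys, no measure column.
rows = [
    (20080407, 1, 10, 100, 5),
    (20080407, 2, 10, 100, 5),
    (20080408, 1, 11, 101, 6),
]
conn.executemany("INSERT INTO attendance_fact VALUES (?, ?, ?, ?, ?)", rows)


def attendance_by_course(conn):
    """Count attendance events per course; COUNT(*) stands in for a fact."""
    cur = conn.execute(
        "SELECT course_key, COUNT(*) FROM attendance_fact "
        "GROUP BY course_key ORDER BY course_key"
    )
    return cur.fetchall()


print(attendance_by_course(conn))  # [(10, 2), (11, 1)]
```

The row itself is the event, so the only "measure" you ever need is the row count.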
A second kind of factless fact table is called a coverage table. Coverage tables are frequently needed when a primary fact table in a dimensional data warehouse is sparse. Take a simple sales fact table that records the sales of products in stores on particular days under each promotion condition. The sales fact table does answer many interesting questions but cannot answer questions about things that didn't happen. For instance, it cannot answer the question, "Which products were on promotion that didn't sell?" because it contains only the records of products that did sell. The coverage table comes to the rescue. A record is placed in the coverage table for each product in each store that is on promotion in each time period. In general, which products are on promotion varies by all of the dimensions of product, store, promotion, and time. This complex many-to-many relationship must be expressed as a fact table.
The coverage table must only contain the items on promotion; the items not on promotion that also did not sell can be left out. Also, it is likely for administrative reasons that the assignment of products to promotions takes place periodically, rather than every day. Often a store manager will set up promotions in a store once each week. Thus we don't need a record for every product every day. One record per product per promotion per store each week will do.
Answering the question, "Which products were on promotion that did not sell?" requires a two-step application. First, consult the coverage table for the list of products on promotion on that day in that store. Second, consult the sales table for the list of products that did sell. The desired answer is the set difference between these two lists of products.
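The two-step lookup can be sketched as a single set-difference query. This is only an illustration under assumptions: the table names `promotion_coverage_fact` and `sales_fact`, the column names, and the sample data are hypothetical, and for simplicity both tables carry a `week_key` so the comparison is done at the weekly grain of the coverage table.

```python
import sqlite3

# In-memory database; table names, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotion_coverage_fact (
        week_key INTEGER, store_key INTEGER,
        promotion_key INTEGER, product_key INTEGER
    );
    CREATE TABLE sales_fact (
        week_key INTEGER, store_key INTEGER,
        promotion_key INTEGER, product_key INTEGER,
        sales_amt REAL
    );
    -- Products 1, 2 and 3 are on promotion 7 in store 1 during week 200815.
    INSERT INTO promotion_coverage_fact VALUES
        (200815, 1, 7, 1), (200815, 1, 7, 2), (200815, 1, 7, 3);
    -- Products 1 and 3 sold on promotion; product 4 sold without one.
    INSERT INTO sales_fact VALUES
        (200815, 1, 7, 1, 40.0), (200815, 1, 7, 3, 15.5), (200815, 1, 0, 4, 9.0);
""")


def promoted_but_unsold(conn, week_key, store_key):
    """Products on promotion minus products sold (a set difference)."""
    sql = """
        SELECT product_key FROM promotion_coverage_fact
        WHERE week_key = ? AND store_key = ?
        EXCEPT
        SELECT product_key FROM sales_fact
        WHERE week_key = ? AND store_key = ?
        ORDER BY product_key
    """
    cur = conn.execute(sql, (week_key, store_key, week_key, store_key))
    return [row[0] for row in cur]


print(promoted_but_unsold(conn, 200815, 1))  # [2]
```

Product 4 never shows up in the answer because it was never in the coverage table, which is exactly why items not on promotion can be left out of it.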
Posted by
Ashish Tiwari
@ GMT
2:49 PM
Wednesday, April 09, 2008
Saturday, April 05, 2008
MicroStrategy doesn't have affinity to Snowflake schema (anymore)
During my days at Cybage, I used to read lots of internal documentation from MicroStrategy; I either had access to it or was given documents for knowledge purposes. For months, even after leaving Cybage, I tried to work out why MicroStrategy likes the snowflake schema more than the star schema. (Kimball hates snowflaking and criticizes it heavily; read The Data Warehouse Toolkit by him.) That's why I used to like reading about Inmon more. I eventually came to know that this so-called affinity was actually a shortcoming in the way MicroStrategy was designed and/or programmed.
MicroStrategy documentation used to recommend snowflake over star. The reason given: MicroStrategy is designed to make the most of a snowflake schema. Well, actually it was a bug in MicroStrategy that prevented it from working perfectly with a star schema.
Snapshots are from MicroStrategy Tech Notes:
This schema is characterized by one lookup table per dimension, with base tables at the lowest level. This is the fastest way to set up a data warehouse:
This type of schema is fully supported, but difficulties may arise when adding aggregate tables:
Problem ----> Double counting
According to the diagram above, a report that contains [Month] and the metric SUM(SALES_AMT) will go to the aggregate table [MONTH_STORE_SALES] and join on the [MONTH_ID] column to retrieve the description from the [LU_TIME] table. Since the [MONTH_ID] column is not unique in its lookup table, the results will appear duplicated.
Why ----> MicroStrategy is optimized to work with snowflake schemas, where each attribute level has a distinct lookup table.
Solution ----> If aggregate tables are needed, use one lookup table per attribute to avoid double counting.
Reaction -----> Give me a break.
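The double counting described in the tech note is easy to reproduce outside MicroStrategy. The sketch below is plain SQL run through Python's sqlite3 module, not MicroStrategy-generated SQL. The names LU_TIME, MONTH_STORE_SALES, MONTH_ID and SALES_AMT come from the tech note; DAY_ID, MONTH_DESC, STORE_ID, the LU_MONTH table and all the data are my own assumptions for illustration.

```python
import sqlite3

# In-memory database; columns beyond MONTH_ID and SALES_AMT are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star-style lookup: one row per day, so MONTH_ID repeats.
    CREATE TABLE LU_TIME (
        DAY_ID INTEGER PRIMARY KEY, MONTH_ID INTEGER, MONTH_DESC TEXT
    );
    -- Aggregate table at month/store level.
    CREATE TABLE MONTH_STORE_SALES (
        MONTH_ID INTEGER, STORE_ID INTEGER, SALES_AMT REAL
    );
    INSERT INTO LU_TIME VALUES
        (20080401, 200804, 'Apr 2008'),
        (20080402, 200804, 'Apr 2008'),
        (20080403, 200804, 'Apr 2008');
    INSERT INTO MONTH_STORE_SALES VALUES (200804, 1, 100.0);
    -- Hypothetical per-attribute lookup (snowflake style): one row per month.
    CREATE TABLE LU_MONTH AS
        SELECT DISTINCT MONTH_ID, MONTH_DESC FROM LU_TIME;
""")


def sales_by_month(conn, lookup_table):
    """Join the aggregate table to a month lookup and sum the sales.

    lookup_table is interpolated into the SQL, so pass trusted names only.
    """
    sql = f"""
        SELECT l.MONTH_DESC, SUM(s.SALES_AMT)
        FROM MONTH_STORE_SALES s
        JOIN {lookup_table} l ON l.MONTH_ID = s.MONTH_ID
        GROUP BY l.MONTH_DESC
    """
    return conn.execute(sql).fetchall()


print(sales_by_month(conn, "LU_TIME"))   # [('Apr 2008', 300.0)]
print(sales_by_month(conn, "LU_MONTH"))  # [('Apr 2008', 100.0)]
```

With three day rows for April in LU_TIME, the single 100.0 aggregate row is matched three times and the report shows 300.0. Joining to a lookup with one row per month gives the correct 100.0, which is all the "one lookup table per attribute" advice amounts to.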
I could never understand the reason for this affinity. I came across it during my early months working on MicroStrategy, and no one could explain it.
BTW, I was lucky to learn this. The person holding the position of Director had come to India for knowledge transfer. He is the man behind getting MicroStrategy into the company. I had gone for a tea break, and when I came back I noticed that every MicroStrategy developer was missing. I thought they must have gone for a break, but I decided to check what was going on. Karthik told me there was a session by Asif, but he said it wouldn't help me much. Still, I thought I should attend, and in that meeting I came to know this. Wow, this made the visit fruitful for me.
Status of this defect -----> It has been weeded out. I've done several implementations of it. But working on 7.2.2 was a pain: I had to go for duplicate logical tables, and in those days the help was not much good either.
----
Update on 21st Sept 2008
MicroStrategy updated the tech note last month, stating that custom logical tables are a workaround for this problem.
Posted by
Ashish Tiwari
@ GMT
3:10 PM