Search

Sunday, June 15, 2008

Datawarehousing Interview Qs

1. What is a Data warehouse?
A Data warehouse is a denormalized database, which stores historical data in summary level format. It is specifically meant for heavy duty querying and analysis.

2. What is a Data mart?
A Data mart is a subset of a Data warehouse and is generally specific to a certain department or process, e.g. a finance data mart or a production data mart.

3. What are the various approaches of building a Data warehouse?
This is a generic question: From a business perspective, it is very important to first get clarity on the end user requirements and a system study before commencing any Data warehousing project. From a technical perspective, it is important to first understand the dimensions and measures, determine quality and structure of source data from the OLTP systems and then decide which dimensional model to apply, i.e. whether we do a star or snowflake or a combination of both. From a conceptual perspective, we can either go the Ralph Kimball method (build data marts and then consolidate at the end to form an enterprise Data warehouse) or the Bill Inmon method (build a large Data warehouse and derive data marts from the same. In order to decide on the method, a strong understanding of the business requirement and data structure is needed as also consensus with the customer.

4. What is OLAP?
OLAP is a multidimensional data structure from where we can view summary level information about a business, e.g. total sales in a year or total sales in a certain geography.

5. What is ETL?
ETL is a process of extracting data (OLTP data) from source systems and transforming them into dimensional models and storing the same in a Data warehouse. ETL also involves data cleansing and data manipulation, based on the nature of the business requirement. It is often a long process, which takes around 70-80% time in a typical Data warehouse project.

6. What is the Difference between OLTP and OLAP?
OLTP refers to transactional data or data captured at the point of transaction, whereas, OLAP is a multidimensional representation of data which throws out summary level information.

7. What are the various Reporting tools in the Market?
To name a few- Cognos 8 BI, Business Objects Enterprise Edition, SQL Server Reporting Services, Oracle Discoverer, IBM DB2 Alphablox, Hyperion, Microstrategy, SAS, SAP BIW.

8.What is Metadata?
In simple words we can say that it is data about data. It stores all information about data like where it is stored, how it is stored, how it is formatted, address of the stored location, size of the data, ownership of data, created and modified time of data etc.

9. What is Staging Area?
Staging area is also called as Operational Data Store (ODS). It is a data holding place where the data which is extracted from all the data sources are stored. From the Staging area, data is loaded to the data warehouse. Data cleansing takes place in this stage.

10. What is Factless Fact Table?
Factless fact table is nothing but a fact table which doesn't have any measures.

11. What is slowly changing dimension?
It is nothing but a dimension which stores historical data as well as data which changes with time. For example, in an organization, there are two departments (Dep1 and Dep2) and employee1 is in Dep1. After completing the 1st year, employee1 got promotion to Dep2. Here the data item is department name of the employee dimension is changed with respect to time.

12. What is a Surrogate key?
Surrogate key is a unique identifier. It is used to identify each row of a table uniquely. It also helps to track slowly changing dimensions. It replaces all natural keys. Surrogate keys are system generated keys, they are not derived from any data sources.

13. What is a Data source?
It is a place from where the data is extracted for the data warehouse. Data is Extracted, Transformed and Loaded to the data warehouse from the data source. The data source may be operational database, flat files, CVS (Comma Separated Version) files, VSAM (Virtual Storage Access method) files etc.

14. What is Virtual Data Warehousing?
It is the process of accessing the operational database directly by the end users for decision making purposes. But in traditional data warehousing systems, we load the data from the source to the target first and then we analyse it. In virtual data warehousing concept, data warehouse is a virtual entity.

15. What is the difference between star schema and snowflake schema?
The main difference between star schema and snowflake schema is that the star schema is highly denormalized and the snowflake schema is normalized. So the data access latency is less in star schema in comparison to snowflake schema. As the star schema is denormalized, the size of the data warehouse will be larger than that of snowflake schema. The schemas are selected as per the client requirements. Performance wise, star schema is good. But if memory utilization is a major concern, then snow flake schema is better than star schema.

16. What is Data Profiling?
Data Profiling is a process to find out the process information present in the source. The process or tool used for data profiling is called Data Discovery Phase.

17. What is a Junk Dimension?
A junk dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. Junk Dimension is just a dimension that stores unwanted attributes.

18. What is a Fact table?
It is a table which contains factual information of a business process. Fact table contains measures, foreign keys of all related dimension tables and attributes etc.

19. What is a Dimension table?
It is a table which contains categorized data. Eg: Time dimension, Products dimension etc.

20. What is fact constellation?
Fact constellation is also known as galaxy schema. It is nothing but a schema which contains multiple fact tables shares dimensions. It is a collection of star schemas which shares their dimension. So it is called as a galaxy schema.

21. What is cube?
Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data.

22. What is drill-down and drill-up?
Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows the users view lower level (i.e. more detailed level) of data and drill-up allows the users to view higher level (i.e. more summarized level) of data.

23. What is the need of building a data warehouse?
The need of building a data warehouse is that, it acts as a storage fill for a large amount of data. It also provides end user access to a wide varity of data, helps in analyzing data more effectively and also in generating reports. It acts as a huge repository for integrated information.

24. What is Data Modeling? What are the different types of Data Modeling?
Data modeling is a process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure.The Data structure formed are maintained in a database management system. The Different types of Data Modeling are: 1. Dimension Modelling 2. E-R Modelling

25. What are the different types of Data models?
Conceptual Data Model, Logical Data Model and Physical Data Model

26. What are the different types of OLAP TECHNOLOGY?
Online Analytical process is of three types, they are MOLAP, HOLAP and ROLAP. MOLAP Mulidimensional online analytical process. It is used for fast retrival of data and also for slicing and dicing operations. It plays a vital role in easing complex calculations. ROLAP Relational online analytical process. It has the ability to handle large amount of data. HOLAP Hybrid online analytical process. It is a combination of both HOLAP and MOLAP.

27. What is the difference between a Database and a Datawarehouse?
Database is a place where data is taken as base to data access to retrieve and load data, whereas, a data warehouse is a place where application data is managed for analysis and reporting services. Database stores data in the form of tables and columns. On the contrary, in a data warehouse, data is subject oriented and stored in the form of dimensions and packages which are used for analysis purpose. In short, we must understand that a database is used for running an enterprise but a data warehouse helps in how to run an enterprise.

No comments: