Sunday, June 22, 2008

Mapping Optimization Techniques in Informatica

Mapping Optimization:

Many optimization techniques can be incorporated while designing mappings and sessions, reducing the time and resources needed later to tune them.

The most common bottlenecks occur when the Informatica server writes to the target database. You can identify bottlenecks by the following methods:

  1. Running test sessions: Configure a test session so that the source and/or target reads from or writes to a flat file. Comparing run times against the original session helps isolate source or target bottlenecks.
  2. Studying performance details: You can set the option to collect performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiencies.
  3. Monitoring system performance: You can use system-monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

Once the bottleneck is located, it needs to be eliminated. Bottlenecks can be categorized as follows:

1. Source & Target Database Bottlenecks: Ask the DBA to optimize the database tables by creating indexes, configuring key constraints, increasing the database network packet size, or optimizing the query.

2. Mapping Bottlenecks: Fine-tune the pipeline logic and transformation settings and options in the mappings to eliminate the mapping bottlenecks.

3. Session bottlenecks: You can optimize the session performance by using the performance details to tune the configuration settings in the session.

4. System bottlenecks: The system administrator needs to analyze the information from system monitoring tools and improve CPU and network performance.

Parameter files in Informatica

Parameter file:

1. A parameter file is created with a text editor.

2. In it, you specify the folder and session name, then list the parameters and variables used in the session and assign each a value.

3. Save the parameter file in any directory and load it to the server, for example under $informatica_home/PrmFiles.

4. We can define the following types of values in a parameter file:

a. Mapping parameters

b. Mapping variables

c. Session parameters

5. You can include parameter and variable information for more than one session in a single parameter file by creating a separate section for each session within the parameter file (see the sketch after this list).

6. You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.
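A minimal sketch of a parameter file, assuming hypothetical folder, session, parameter, and connection names (the exact section-header format depends on the PowerCenter version):

[ProjectFolder.s_m_load_customers]
$$LoadDate=2008-06-01
$DBConnection_Source=ORA_SRC
$InputFile1=/data/in/customers.dat

[ProjectFolder.s_m_load_orders]
$$LoadDate=2008-06-01
$InputFile1=/data/in/orders.dat

Here $$LoadDate is a mapping parameter, while $DBConnection_Source and $InputFile1 are session parameters; each [folder.session] header starts a new section.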

Informatica Basic Concepts: SQ, Lookup

SQL override:

The Source Qualifier provides the SQL Query option to override the default query. You can enter any SQL statement supported by your source database. You might enter your own SELECT statement, have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.
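For example, a sketch of an override that joins two hypothetical source tables and pushes a filter into the database (TO_DATE assumes an Oracle source):

SELECT CUSTOMERS.CUSTOMER_ID,
       CUSTOMERS.CUSTOMER_NAME,
       ORDERS.ORDER_TOTAL
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
AND ORDERS.ORDER_DATE >= TO_DATE('2008-01-01', 'YYYY-MM-DD')

Note that the selected columns must match the number, order, and datatypes of the ports in the Source Qualifier.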


Connected & Unconnected Lookup:

An unconnected Lookup transformation exists separately from the pipeline in the mapping. You write an expression using the :LKP reference qualifier to call the lookup from within another transformation. A connected Lookup, by contrast, forms part of the data flow of the mapping.
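For instance, a sketch of an output port in an Expression transformation calling an unconnected lookup; the lookup name LKP_GET_CUST_NAME and the ports CUST_ID/CUST_NAME are hypothetical:

IIF(ISNULL(CUST_NAME), :LKP.LKP_GET_CUST_NAME(CUST_ID), CUST_NAME)

The expression passes CUST_ID as the lookup input and receives the lookup's single return-port value.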


Various Caches:

1. Persistent cache (we can save the lookup cache files and reuse them the next time the server processes the Lookup transformation)

2. Re-cache from database (if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache)

3. Static cache (when the lookup condition is true, the Informatica server returns a value from the lookup cache; it does not update the cache while it processes the Lookup transformation)

4. Dynamic cache (the Informatica server dynamically inserts new rows or updates existing rows in the cache and the target; typically used when the lookup table is also the target table)

5. Shared cache (a lookup cache can be shared between multiple Lookup transformations in a mapping; two Lookups in a mapping can share a single lookup cache)

What is dynamic lookup?

When the lookup table is the target table, the Informatica server dynamically inserts new rows into the lookup cache, or updates existing rows if the values have changed, and passes each row on to the target table.
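A common downstream pattern, sketched with hypothetical group names: a dynamic Lookup exposes a NewLookupRow output port (0 = no change, 1 = insert, 2 = update), which a Router can use to separate inserts from updates:

INS group filter condition: NewLookupRow = 1 (route to an Update Strategy with DD_INSERT)
UPD group filter condition: NewLookupRow = 2 (route to an Update Strategy with DD_UPDATE)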

Sunday, June 15, 2008

Informatica Qs

1. Explain the Architecture of informatica.
A. PowerCenter and PowerMart provide the following integrated components:

- Informatica repository. The Informatica repository is at the center of the Informatica suite. You create a set of metadata tables within the repository database that the Informatica applications and tools access. The Informatica Client and Server access the repository to save and retrieve metadata.

- Informatica Repository Server. The Informatica Repository Server manages connections to the repository from client applications. It inserts, updates, and fetches objects from the repository database tables. It also maintains object consistency.

- Informatica Client. Use the Informatica Client to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The Informatica Client has four client applications: Repository Manager, Designer, Workflow Manager, and Workflow Monitor.

- Informatica Server. The Informatica Server extracts the source data, performs the data transformation, and loads the transformed data into the targets.

2. Give the names of a few transformations
A. Commonly used transformations are Source Qualifier, Expression, Aggregator, Joiner, Lookup, Update Strategy, Filter, Normalizer, Stored Procedure, Sequence Generator, etc.

3. What is the difference between connected and unconnected lookup?
A. An unconnected Lookup transformation exists separately from the flow of the mapping; you write an expression using the :LKP reference qualifier to call the lookup from within another transformation. A connected Lookup, in contrast, forms part of the data flow of the mapping.
An unconnected Lookup can return a maximum of one column; a connected Lookup can return multiple columns.

4. What is Normalizer transformation? Have you used it in your project?
A. It is used primarily for COBOL sources, which are often stored in a denormalized format. It normalizes records from COBOL and relational sources.

5. What are the system variables available in Informatica?
A. SYSDATE, SESSSTARTTIME, PROC_RESULT

6. Minimum how many transformations are needed for a mapping?
A. Three: a source definition, a Source Qualifier, and a target definition.

7. Explain worklet
A. A worklet is an object that represents a set of tasks. It can contain any task available
in the Workflow Manager.

8. If two tables are present in the source database and one has to get corresponding data from the second, which option is preferable: use of a Lookup or a join in the Source Qualifier?
A. If the table size is big, then the Source Qualifier is preferred.

9. What are shortcuts in informatica?
A. Shortcuts allow you to use metadata across folders without making copies,
ensuring uniform metadata.

A shortcut inherits all the properties of the object to which it points.
When the referred object changes, the shortcut inherits the changes.
Shortcuts allow you to reuse an object without creating multiple
objects in the repository, which also saves space in the repository.

10. If update strategy is used in a mapping, what would be the setting at the time of session creation?
A. The property “Treat source rows as” should be set to “Data driven”.

11. For a flat file data source, do we mention the number of sorted ports in the properties of the Source Qualifier?
A. No

12. Explain about your current project?
A. As per bio-data

13. What scheduler do you use in your project?
A. Informatica Scheduler and Cron

14. Tell me few unix commands.
A. cp , rm, echo, kill, grep, ps, head, tail, cat, touch etc.

15. Explain wc command.
A. This command counts the number of lines, words, and bytes or characters in a file.
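A few usage sketches, where myfile.txt is a hypothetical file:

wc myfile.txt      # lines, words, and bytes
wc -l myfile.txt   # lines only
wc -w myfile.txt   # words only
wc -c myfile.txt   # bytes only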

16. How will you output the 16th row of a file in unix?
A. Use the following command, where filename is a placeholder:
head -16 filename | tail -1
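An equivalent sed one-liner, with the same placeholder filename:

sed -n '16p' filename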

17. Which database is being used in your project?
A. Oracle

18. What will be the select statement to read data from a table if that table is already being used by some other SQL/mapping?
A. In UDB, even a select statement creates a lock. To avoid this situation, the following
query should be used:

SELECT …….
FROM ……..
WHERE …..
FOR READ ONLY
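For example, against a hypothetical EMPLOYEE table:

SELECT EMP_ID, EMP_NAME
FROM EMPLOYEE
WHERE DEPT_ID = 10
FOR READ ONLY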

19. What is a primary key and foreign key?
A. Primary and foreign keys are the most basic components on which relational theory is
based. Primary keys enforce entity integrity by uniquely identifying entity instances.
Foreign keys enforce referential integrity by completing an association between two
entities.
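A minimal DDL sketch with hypothetical DEPT and EMP tables, where DEPT_ID is the primary key of DEPT and a foreign key in EMP:

CREATE TABLE DEPT (
    DEPT_ID   INTEGER PRIMARY KEY,
    DEPT_NAME VARCHAR(50)
);

CREATE TABLE EMP (
    EMP_ID   INTEGER PRIMARY KEY,
    EMP_NAME VARCHAR(50),
    DEPT_ID  INTEGER,
    FOREIGN KEY (DEPT_ID) REFERENCES DEPT (DEPT_ID)
);

The foreign key ensures every EMP row refers to an existing DEPT row (referential integrity), while each primary key uniquely identifies a row (entity integrity).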

Datawarehousing Interview Qs

1. What is a Data warehouse?
A Data warehouse is a denormalized database that stores historical data, often at a summary level. It is specifically meant for heavy-duty querying and analysis.

2. What is a Data mart?
A Data mart is a subset of a Data warehouse and is generally specific to a certain department or process, e.g. a finance data mart or a production data mart.

3. What are the various approaches of building a Data warehouse?
This is a generic question. From a business perspective, it is very important to first get clarity on the end-user requirements and do a system study before commencing any Data warehousing project. From a technical perspective, it is important to first understand the dimensions and measures, determine the quality and structure of source data from the OLTP systems, and then decide which dimensional model to apply, i.e. whether to do a star schema, a snowflake schema, or a combination of both. From a conceptual perspective, we can either follow the Ralph Kimball method (build data marts and then consolidate at the end to form an enterprise Data warehouse) or the Bill Inmon method (build a large Data warehouse and derive data marts from it). To decide on the method, a strong understanding of the business requirements and data structure is needed, as well as consensus with the customer.

4. What is OLAP?
OLAP is a multidimensional data structure from where we can view summary level information about a business, e.g. total sales in a year or total sales in a certain geography.

5. What is ETL?
ETL is the process of extracting data (OLTP data) from source systems, transforming it into dimensional models, and storing it in a Data warehouse. ETL also involves data cleansing and data manipulation, based on the nature of the business requirement. It is often a long process, taking around 70-80% of the time in a typical Data warehouse project.

6. What is the Difference between OLTP and OLAP?
OLTP refers to transactional data, or data captured at the point of transaction, whereas OLAP is a multidimensional representation of data that presents summary-level information.

7. What are the various Reporting tools in the Market?
To name a few- Cognos 8 BI, Business Objects Enterprise Edition, SQL Server Reporting Services, Oracle Discoverer, IBM DB2 Alphablox, Hyperion, Microstrategy, SAS, SAP BIW.

8.What is Metadata?
In simple words, we can say that it is data about data. It stores all information about data, such as where it is stored, how it is stored, how it is formatted, the address of the storage location, the size of the data, ownership of the data, creation and modification times, etc.

9. What is Staging Area?
The staging area is sometimes also called an Operational Data Store (ODS). It is a data holding place where the data extracted from all the data sources is stored. From the staging area, data is loaded into the data warehouse. Data cleansing takes place in this stage.

10. What is Factless Fact Table?
A factless fact table is a fact table which does not have any measures; it records only the occurrence of an event or relationship.

11. What is slowly changing dimension?
It is a dimension which stores historical data, i.e. data that changes with time. For example, in an organization there are two departments (Dep1 and Dep2) and employee1 is in Dep1. After completing the first year, employee1 gets promoted to Dep2. Here the department-name attribute of the employee dimension changes with respect to time.
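A sketch of how a Type 2 slowly changing dimension might version this example, with hypothetical surrogate keys and dates:

EMP_SK  EMP_ID  DEPT  EFF_DATE    END_DATE    CURRENT_FLAG
101     E1      Dep1  2007-01-01  2007-12-31  N
102     E1      Dep2  2008-01-01  (open)      Y

Each change inserts a new row with a new surrogate key, so the full history is preserved.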

12. What is a Surrogate key?
A surrogate key is a unique identifier used to identify each row of a table uniquely. It also helps to track slowly changing dimensions. It typically replaces the natural key as the primary key of the table. Surrogate keys are system-generated; they are not derived from any data source.

13. What is a Data source?
It is a place from where the data is extracted for the data warehouse. Data is extracted, transformed, and loaded into the data warehouse from the data source. The data source may be an operational database, flat files, CSV (Comma Separated Values) files, VSAM (Virtual Storage Access Method) files, etc.

14. What is Virtual Data Warehousing?
It is the concept of end users accessing the operational database directly for decision-making purposes. In traditional data warehousing systems, we load the data from the source to the target first and then analyse it; in virtual data warehousing, the data warehouse is a virtual entity.

15. What is the difference between star schema and snowflake schema?
The main difference between the star schema and the snowflake schema is that the star schema is highly denormalized while the snowflake schema is normalized, so data access latency is lower in a star schema than in a snowflake schema. As the star schema is denormalized, the size of the data warehouse will be larger than with a snowflake schema. The schema is selected as per the client requirements. Performance-wise, the star schema is good; but if memory utilization is a major concern, then the snowflake schema is better than the star schema.
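A sketch of the same product dimension in both styles, with hypothetical table and column names:

-- Star: one denormalized dimension table
PRODUCT_DIM (PRODUCT_SK, PRODUCT_NAME, CATEGORY_NAME, DEPARTMENT_NAME)

-- Snowflake: the dimension normalized into a hierarchy
PRODUCT_DIM    (PRODUCT_SK, PRODUCT_NAME, CATEGORY_SK)
CATEGORY_DIM   (CATEGORY_SK, CATEGORY_NAME, DEPARTMENT_SK)
DEPARTMENT_DIM (DEPARTMENT_SK, DEPARTMENT_NAME)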

16. What is Data Profiling?
Data profiling is a process of examining the data present in the source to discover its structure, content, and quality. The process or tool used for data profiling is called the Data Discovery Phase.

17. What is a Junk Dimension?
A junk dimension is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension. It is a dimension that stores these miscellaneous, low-cardinality attributes together.

18. What is a Fact table?
It is a table which contains the factual information of a business process. A fact table contains measures, the foreign keys of all related dimension tables, and other attributes.

19. What is a Dimension table?
It is a table which contains categorized, descriptive data, e.g. a Time dimension or a Products dimension.

20. What is fact constellation?
A fact constellation is also known as a galaxy schema. It is a schema which contains multiple fact tables that share dimension tables. Because it is a collection of star schemas sharing their dimensions, it is called a galaxy schema.

21. What is cube?
Cube is a multidimensional representation of data. It is used for analysis purpose. A cube gives multiple views of data.

22. What is drill-down and drill-up?
Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows users to view a lower (i.e. more detailed) level of data, and drill-up allows users to view a higher (i.e. more summarized) level of data.

23. What is the need of building a data warehouse?
The need for building a data warehouse is that it acts as a storage facility for a large amount of data. It also provides end users access to a wide variety of data, helps in analyzing data more effectively, and helps in generating reports. It acts as a huge repository of integrated information.

24. What is Data Modeling? What are the different types of Data Modeling?
Data modeling is a process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure. The data structures formed are maintained in a database management system. The different types of data modeling are: 1. Dimensional Modeling 2. E-R Modeling

25. What are the different types of Data models?
Conceptual Data Model, Logical Data Model and Physical Data Model

26. What are the different types of OLAP TECHNOLOGY?
Online analytical processing is of three types: MOLAP, HOLAP, and ROLAP. MOLAP (Multidimensional OLAP) is used for fast retrieval of data and for slicing and dicing operations; it plays a vital role in easing complex calculations. ROLAP (Relational OLAP) has the ability to handle large amounts of data. HOLAP (Hybrid OLAP) is a combination of both ROLAP and MOLAP.

27. What is the difference between a Database and a Datawarehouse?
A database is a place where data is stored for operational access, i.e. to retrieve and load individual records, whereas a data warehouse is a place where application data is managed for analysis and reporting purposes. A database stores data in the form of tables and columns. On the contrary, in a data warehouse, data is subject-oriented and stored in the form of dimensions and facts used for analysis. In short, we must understand that a database is used for running an enterprise, but a data warehouse helps in deciding how to run an enterprise.

Informatica Interview Questions

1. What is the use of tracing levels in transformations?
Tracing levels control the amount of detail about a transformation that the Informatica server writes to the session log.

2. What are the various types of transformation?
Various types of transformation are: Aggregator Transformation, Expression Transformation, Filter Transformation, Joiner Transformation, Lookup Transformation, Normalizer Transformation, Rank Transformation, Router Transformation, Sequence Generator Transformation, Stored Procedure Transformation, Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation, Advanced External Procedure Transformation, External Procedure Transformation.

3. What is the difference between active transformation and passive transformation?
An active transformation can change the number of rows that pass through it, whereas a passive transformation cannot change the number of rows that pass through it.

4. What is the use of control break statements?
They execute a set of statements within a loop and endloop construct.

5. What are the types of loading in Informatica?
There are two types of loading: normal loading and bulk loading. In normal loading, the server loads records one by one and writes a log entry for each, so it takes comparatively longer to load data to the target. In bulk loading, the server loads a number of records at a time to the target database, bypassing the database log, so it takes less time to load data to the target.

6. What is the difference between source qualifier transformation and application source qualifier transformation?
Source qualifier transformation extracts data from RDBMS or from a single flat file system. Application source qualifier transformation extracts data from application sources like ERP.

7. How do we create primary key only on odd numbers?
To create a primary key on odd numbers only, we use a Sequence Generator with its 'Start Value' set to 1 and its 'Increment By' property set to 2, producing the sequence 1, 3, 5, 7, ...

8. What is authenticator?
It validates user name and password to access the PowerCenter repository.

9. What is the use of auxiliary mapping?
Auxiliary mapping reflects change in one table whenever there is a change in the other table.

10. What is a mapplet?
A mapplet is a reusable set of transformations.

11. Which ETL tool is more preferable Informatica or Data Stage and why?
Preference of an ETL tool depends on affordability and functionality; it is mostly a tradeoff between price and features. While Informatica has been a market leader for many years, DataStage has been picking up momentum.

12. What is worklet?
A worklet is an object that represents a set of tasks.

13. What is workflow?
A workflow is a set of instructions that tells the Informatica server how to execute the tasks.

14. What is session?
A session is a set of instructions to move data from sources to targets.

15. Why do we need SQL overrides in Lookup transformations?
In order to look up more than one value from one table, we go for SQL overrides in Lookups.

16. What is Target Update override?
It overrides the default update statement in the target properties.
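A sketch of an update override using the :TU qualifier to reference target ports; the table T_EMPLOYEE and its columns are hypothetical:

UPDATE T_EMPLOYEE
SET EMP_NAME = :TU.EMP_NAME,
    DEPT_ID  = :TU.DEPT_ID
WHERE EMP_ID = :TU.EMP_ID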

17. How are Informatica object files stored?
They are stored as XML in the Informatica Repository.

18. How do we tune the performance of Informatica transformations?
• Use a Filter transformation early in the mapping to drop unwanted rows before they are processed further.
• Go for dynamic lookups as far as possible.
• Use a Union transformation in place of multiple pipelines with UNION-ed SQL overrides.

19. In what conditions can we not use a Joiner transformation?
• Both pipelines begin with the same original data source.
• Both input pipelines originate from the same Source Qualifier transformation.
• Both input pipelines originate from the same Normalizer transformation.
• Both input pipelines originate from the same Joiner transformation.
• Either input pipeline contains an Update Strategy transformation.
• Either input pipeline contains a connected or unconnected Sequence Generator transformation.

20. If there is no PK or FK in the target table, how do we update or insert value into the table?
We take a dynamic lookup on the target and do a comparison with source in an expression and flag it.

21. What is the difference between reusable transformation and mapplet?
A reusable transformation is a single transformation, while a mapplet is a set of transformations.

22. What is a limitation of the Lookup transformation?
We cannot use the Lookup transformation for non-RDBMS sources.

23. Where do we override native SQL in transformations?
We can override it in the Source Qualifier and Lookup transformations.

24. When we read from source and write into flat file, how to specify the column names in the flat file?
Using pre-SQL, we can specify the column names.

25. How do we track rejected or bad data?
We can design a trap mechanism using a Router or Filter transformation to track the rejected data.

26. How do we join two tables without joiner or SQL override?
We can join the tables by using a Lookup transformation and making a Cartesian product.

27. What are pre-session and post-session options?
These are shell commands that the Informatica server executes before and after running the session. We should have the permissions and privileges to run these commands.

28. How do we copy mapping?
The steps to copy a mapping are as follows: 1.) Open the navigator window. 2.) Open the mapping in the repository. 3.) Click copy.

29. What are the tasks done by informatica server?
Tasks done by the Informatica server are: a.) Managing sessions b.) Running workflows c.) Tuning performance d.) Scheduling sessions and workflows

30. What is the difference between source qualifier and joiner transformation?
We need matching keys (a primary key-foreign key relationship) to join sources in a Source Qualifier, which we do not need in a Joiner transformation.

31. What are mapping parameters and mapping variables?
Mapping parameters are constant throughout the session; they are specified before running the session. Unlike parameters, mapping variables can change values during the session.

32. What is the default join that source qualifier provides?
The default join that the Source Qualifier uses is an inner equijoin.
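For instance, with two hypothetical relational sources EMP and DEPT connected to one Source Qualifier, the generated default query would resemble:

SELECT EMP.EMP_ID, EMP.EMP_NAME, DEPT.DEPT_NAME
FROM EMP, DEPT
WHERE EMP.DEPT_ID = DEPT.DEPT_ID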

33. What is Normalizer transformation?
This is one of the transformations in Informatica, applicable mainly to COBOL sources. If we want to change rows into columns or columns into rows, we go for the Normalizer.

34. What are the default sources supported by Informatica Powermart?
Informatica PowerMart supports relational tables, fixed-width and delimited flat files, and COBOL sources.

35. What is sql override in Informatica?
We use a SQL override in Informatica to improve the performance of a Lookup transformation. We override the default query that Informatica generates, typically by using join queries.

36. What is Power Analyzer in Informatica?
Informatica PowerAnalyzer is the BI tool in the Informatica suite. It provides capabilities for data delivery, ad hoc reporting, and dashboards.