In this article, I will introduce Informatica’s approach to the fast-growing field of Big Data and the solution it offers for it, Informatica Big Data Management (BDM).

Informatica BDM is a big data platform that connects to many different systems through ready-made connectors, lets you build the required transformations and calculations with drag-and-drop visual interfaces, and runs these developments entirely on Big Data platforms.

 

Why Informatica Big Data Management?

We could list many reasons, but the most important is probably that new Big Data technologies emerge almost every day. The sector began to grow rapidly with Hadoop, which Doug Cutting launched in 2006, inspired by the papers Google published on the Google File System (2003) and MapReduce (2004). Since then, investments by large companies have added many new technologies to the ecosystem (for example, Facebook developed Hive, and LinkedIn developed Kafka and later donated it to the Apache Software Foundation).

Adapting to such rapidly changing technologies is a challenge that requires companies to allocate resources, time and budget. Given that the Hadoop ecosystem already includes many distributions, databases, data formats, storage-layer technologies and so on, it is hard to predict what the future will bring.

This is where Informatica BDM is positioned. However the technology or the data types change, Informatica acts as a layer between users and the underlying technologies, letting you build technology-independent developments while leveraging the power of Big Data environments.

I would like to continue with a small example of why Informatica matters. Many organizations today have data warehouses and data marts, and in most of them the data is loaded by SQL procedures and packages written years ago. Over time these warehouses and marts have grown considerably, both in the number of tables and in table size. A major problem for many companies is that they can no longer trace the data flow from source to target or manage metadata effectively. Converting these SQL scripts into an ETL tool to gain metadata tracking is itself a task that requires budget, time and resources. This is another point where the question “why Informatica?” finds an answer: because Informatica BDM is developed through visual interfaces, like an ETL tool, it lets us track the metadata of data flows. I think this should be considered from the start, so that big data projects do not run into the same problem that relational databases face today.

Technical Specifications

Some of the main features that Informatica BDM provides are as follows:

  • Support for the major Hadoop distributions
  • One-click cluster connection
  • Development with visual interfaces
  • More than 100 ready-made transformations and connectors
  • Execution on the Spark and Blaze engines
  • Profiling and Data Quality features
  • Mass Ingestion, Dynamic Mapping and “SQL to Mapping” features for higher productivity

User Interfaces and Usage

Informatica BDM provides drag-and-drop development in an Eclipse-based development environment.

The same environment, called the “Developer Client”, is also used by the Informatica Data Quality product. In the background, it stores your developments in a repository built on an Oracle, MS SQL Server or DB2 relational database.
When you connect to this repository:

  1. In the Object Explorer on the left-hand side, you can store and organize all objects, such as sources, targets, rules and developments (called mappings in Informatica), in a folder structure.
  2. The right-hand side is the area where developments are built. In the mapping shown in the image, two Oracle tables pass through several transformations and are written to two Hive tables in the target system. In this area you can clearly follow which source column flows into which target column.
  3. In the lower right is an area where you can configure the properties of the selected transformation.

In short, the interface lets you easily connect to multiple source systems and design transformations on their data.
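The source-to-target column tracking described above is, underneath, a lineage graph, and it is what makes the impact analysis mentioned in this article possible. The following minimal Python sketch shows one way column-level lineage could be represented and traced in both directions. It is a hypothetical illustration: the table and column names are invented, and this is not Informatica's internal metadata model.

```python
from collections import defaultdict

# Hypothetical column-level lineage edges: (source column, derived target column).
# All table and column names are invented for illustration.
EDGES = [
    ("ORA.CUSTOMER.NAME", "HIVE.DIM_CUSTOMER.FULL_NAME"),
    ("ORA.CUSTOMER.PHONE", "HIVE.DIM_CUSTOMER.PHONE_STD"),
    ("ORA.ORDERS.CUSTOMER_ID", "HIVE.FACT_SALES.CUSTOMER_ID"),
    ("ORA.ORDERS.AMOUNT", "HIVE.FACT_SALES.AMOUNT"),
    ("HIVE.FACT_SALES.AMOUNT", "HIVE.AGG_SALES.TOTAL_AMOUNT"),
]


def build_graph(edges):
    """Index the lineage edges in both directions."""
    downstream = defaultdict(set)  # column -> columns derived from it
    upstream = defaultdict(set)    # column -> columns it is derived from
    for src, tgt in edges:
        downstream[src].add(tgt)
        upstream[tgt].add(src)
    return downstream, upstream


def reachable(graph, start):
    """Return every column transitively reachable from `start` (iterative DFS)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


downstream, upstream = build_graph(EDGES)

# Impact analysis: which columns are affected if ORA.ORDERS.AMOUNT changes?
print(sorted(reachable(downstream, "ORA.ORDERS.AMOUNT")))
# ['HIVE.AGG_SALES.TOTAL_AMOUNT', 'HIVE.FACT_SALES.AMOUNT']

# Lineage: which columns feed HIVE.AGG_SALES.TOTAL_AMOUNT?
print(sorted(reachable(upstream, "HIVE.AGG_SALES.TOTAL_AMOUNT")))
# ['HIVE.FACT_SALES.AMOUNT', 'ORA.ORDERS.AMOUNT']
```

Keeping both directions of the graph is what lets the same metadata answer two different questions: "what breaks if I change this column?" and "where does this number come from?".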

Connection and Transformation Support

Informatica BDM can connect to a wide range of emerging technologies. Beyond this connectivity, the transformations available in mappings cover a developer’s needs well, and where they do not, the Java transformation lets you run Java code that you have written yourself.

Informatica, which has been a leader in the ETL market with its PowerCenter product for many years, carries this experience over to its other products.

In addition to these transformations, Informatica BDM can also run Python code.

Execution Engines

Informatica BDM can run the mappings you create using the power of the Hadoop platform; Informatica’s own servers can also be used for this purpose.
The execution engines supported in the current product version are:

  • Spark
  • Native
  • Blaze

Informatica developed its own execution engine, Blaze, so that all of the transformations BDM provides can run on Hadoop.
With Informatica’s “Smart Executor” feature, you can select which engines a mapping is allowed to run on, and Informatica will choose the fastest and most suitable engine for you.


The same mapping in the image can be run in three different ways. As shown in the image, by selecting all of the engines on the mapping, you can leave this choice to Informatica BDM.

Profiling and Data Quality

One of the most important features of Informatica BDM is that it can use the capabilities of Informatica Data Quality. In addition, before starting development, you can profile the data to run frequency, pattern and statistical analyses.
Data quality problems in Hadoop environments, and the repeated reprocessing and lost time they cause, are among the main reasons big data projects fail, so analyzing the data up front is very important for detecting these problems early.
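To make frequency, pattern and statistical analysis concrete, here is a minimal Python sketch of what a single-column profile computes. It illustrates the general technique only and is not Informatica's profiling engine; the `profile_column` and `char_pattern` helpers and the "9 for digit, X for letter" pattern notation are assumptions made for this example.

```python
import statistics
from collections import Counter


def char_pattern(value: str) -> str:
    """Reduce a value to its shape: digits -> '9', letters -> 'X', other chars unchanged."""
    return "".join("9" if c.isdigit() else "X" if c.isalpha() else c for c in value)


def profile_column(values):
    """Frequency, pattern and basic statistical profile of one column."""
    present = [v for v in values if v is not None and v != ""]
    profile = {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "frequency": Counter(present).most_common(3),  # top 3 values
        "patterns": Counter(char_pattern(str(v)) for v in present).most_common(),
    }
    # Numeric statistics only when every non-null value is a number.
    if present and all(isinstance(v, (int, float)) for v in present):
        profile["min"] = min(present)
        profile["max"] = max(present)
        profile["mean"] = statistics.mean(present)
    return profile


phones = ["0212 555 1234", "0212 555 9876", "(216) 5551234", None, "5551234"]
print(profile_column(phones)["patterns"])
# [('9999 999 9999', 2), ('(999) 9999999', 1), ('9999999', 1)]

ages = [34, 29, None, 41]
print(profile_column(ages)["min"], profile_column(ages)["max"])
# 29 41
```

A pattern profile like the one for `phones` is often the quickest way to spot a data quality problem: three different shapes for the same kind of value suggest that standardization is needed before loading.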

Using these analyses together with the data-quality-specific transformations Informatica provides, you can perform operations such as parsing, standardization, enrichment, cleansing, matching and deduplication.

The example above shows a mapping that validates and standardizes the area codes of phone numbers while writing an Oracle table to HDFS.
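The minimal Python sketch below shows the kind of rule such a mapping applies to each row: strip the formatting, check the area code against a reference list, and write the number in one standard format. The 10-digit number layout, the optional leading "0" trunk prefix, the `(AAA) NNN NNNN` output format and the `VALID_AREA_CODES` list are all assumptions for illustration; in BDM this logic would be built from data quality transformations rather than hand-written code.

```python
import re
from typing import Optional

# Hypothetical reference list of valid area codes; a real rule would use a maintained table.
VALID_AREA_CODES = {"212", "216", "232", "312"}


def standardize_phone(raw: str) -> Optional[str]:
    """Return the number as '(AAA) NNN NNNN', or None if it cannot be validated.

    Assumes a 10-digit national number (3-digit area code + 7-digit subscriber
    number), optionally preceded by a single trunk prefix '0'.
    """
    digits = re.sub(r"\D", "", raw)       # drop spaces, dashes, brackets, etc.
    if len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]                # strip the trunk prefix
    if len(digits) != 10:
        return None
    area, number = digits[:3], digits[3:]
    if area not in VALID_AREA_CODES:       # check the area code
        return None
    return f"({area}) {number[:3]} {number[3:]}"


for raw in ["0212-555 12 34", "(216) 555 9876", "999 555 1234"]:
    print(repr(raw), "->", standardize_phone(raw))
# '0212-555 12 34' -> (212) 555 1234
# '(216) 555 9876' -> (216) 555 9876
# '999 555 1234' -> None
```

Returning `None` instead of raising keeps the rule usable inside a row-by-row flow, where invalid numbers can be routed to a separate reject target for review.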

Mass Ingestion

Another key feature of Informatica BDM is Mass Ingestion, which lets you bulk-load tables from relational databases into HDFS or Hive without any development work.

For example, 1,000 tables can be moved from Oracle to HDFS in five steps. This gives you the infrastructure to process your relational data easily in Hadoop, without having to manage 1,000 separate parameter files and the like.

Besides loading entire tables, it also gives you the option to load only the changed data.
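Mass Ingestion itself needs no code, but the idea behind it (one generic loop over every table instead of one job and parameter file per table, with either a full load or a changed-data load) can be sketched in a few lines of Python. This is a hypothetical illustration: SQLite stands in for Oracle, local CSV files stand in for HDFS, and changed rows are detected through an assumed `updated_at` watermark column, which is not necessarily how Mass Ingestion detects changes.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def ingest_all(conn, target_dir, watermarks=None):
    """Copy every source table to <target_dir>/<table>.csv in one generic loop.

    Tables without a watermark get a full load (file overwritten); tables with a
    watermark get only rows whose updated_at is newer (rows appended).
    Returns the updated watermark per table.
    """
    watermarks = dict(watermarks or {})
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        since = watermarks.get(table)
        if since is None:   # full load
            cur = conn.execute(f'SELECT * FROM "{table}"')
            mode = "w"
        else:               # changed-data load
            cur = conn.execute(
                f'SELECT * FROM "{table}" WHERE updated_at > ?', (since,))
            mode = "a"
        header = [col[0] for col in cur.description]
        rows = cur.fetchall()
        with (Path(target_dir) / f"{table}.csv").open(mode, newline="") as f:
            writer = csv.writer(f)
            if mode == "w":
                writer.writerow(header)
            writer.writerows(rows)
        if rows:  # remember the newest change seen so far
            idx = header.index("updated_at")
            watermarks[table] = max(row[idx] for row in rows)
    return watermarks


# Demo: SQLite stands in for Oracle, a temporary directory stands in for HDFS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT, updated_at INTEGER);
CREATE TABLE orders (id INTEGER, amount REAL, updated_at INTEGER);
INSERT INTO customers VALUES (1, 'Ada', 1), (2, 'Alan', 2);
INSERT INTO orders VALUES (10, 99.5, 1);
""")
with tempfile.TemporaryDirectory() as out:
    marks = ingest_all(conn, out)          # first run: full load of both tables
    conn.execute("INSERT INTO customers VALUES (3, 'Grace', 3)")
    marks = ingest_all(conn, out, marks)   # second run: only the new customer row
    print(marks)                           # {'customers': 3, 'orders': 1}
```

The only per-table state is a single watermark value, which is the point of the design: adding a thousandth table adds one dictionary entry, not another job or parameter file. Appending changed rows instead of merging them, and ignoring deletes, are simplifications of this sketch.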

Dynamic Mapping

Dynamic Mapping, a feature of Informatica BDM, is based on building a single generic mapping and reusing it for different sources and targets instead of redeveloping it for each one.

When the option to get column information at run time is selected, as shown below, the mapping becomes suitable for dynamic use. You can then parameterize the source table name and run the same mapping against different tables through parameter files.
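The runtime column discovery and parameterized source table described above can be imitated in Python to show why one mapping definition suffices for many tables. In this hypothetical sketch, SQLite's `PRAGMA table_info` plays the role of "get column information at run time", and an INI text read with `configparser` plays the role of a parameter file; the section names, the `source_table`/`target_table` keys and the `generic_mapping` helper are invented and are not Informatica's parameter-file format.

```python
import configparser
import sqlite3

# Hypothetical parameter file: one section per run, each naming a source and a target.
PARAMETER_FILE = """
[run_customers]
source_table = customers
target_table = stg_customers

[run_products]
source_table = products
target_table = stg_products
"""


def get_columns(conn, table):
    """Read the column names at run time instead of fixing them at design time."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def generic_mapping(conn, source_table, target_table):
    """One mapping definition that copies whatever columns the source has."""
    cols = get_columns(conn, source_table)
    col_list = ", ".join(f'"{c}"' for c in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{target_table}" ({col_list})')
    conn.execute(
        f'INSERT INTO "{target_table}" ({col_list}) '
        f'SELECT {col_list} FROM "{source_table}"')
    return cols


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT, city TEXT);
CREATE TABLE products (sku TEXT, price REAL);
INSERT INTO customers VALUES (1, 'Ada', 'London');
INSERT INTO products VALUES ('A-1', 9.99), ('B-2', 4.50);
""")

params = configparser.ConfigParser()
params.read_string(PARAMETER_FILE)
for run in params.sections():            # the same mapping, run once per parameter set
    p = params[run]
    cols = generic_mapping(conn, p["source_table"], p["target_table"])
    print(run, p["target_table"], cols)
# run_customers stg_customers ['id', 'name', 'city']
# run_products stg_products ['sku', 'price']
```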

With dynamic mapping, you can also apply transformations to columns selected by rules that you define, for example “trim leading and trailing spaces from all character fields”.
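A rule such as “trim leading and trailing spaces from all character fields” selects columns by their properties rather than by name, so it keeps working when a source gains or loses columns. The minimal Python sketch below models that idea with a predicate that picks columns and a function that transforms them. The `apply_rule`, `is_character` and `trim` names, the row-as-dictionary representation and the type names are hypothetical and are not Informatica's rule syntax.

```python
from typing import Callable, Dict, List

# A row maps column name -> value; the schema maps column name -> type name and is
# assumed to be discovered at run time, as in a dynamic mapping.
Row = Dict[str, object]


def apply_rule(rows: List[Row], schema: Dict[str, str],
               selects: Callable[[str, str], bool],
               transform: Callable[[object], object]) -> List[Row]:
    """Apply `transform` to every column picked by the `selects` predicate."""
    targets = [name for name, typ in schema.items() if selects(name, typ)]
    out = []
    for row in rows:
        new_row = dict(row)                 # leave the input rows unchanged
        for name in targets:
            if new_row.get(name) is not None:
                new_row[name] = transform(new_row[name])
        out.append(new_row)
    return out


def is_character(name: str, typ: str) -> bool:
    """Rule selector: every character-typed column, whatever its name."""
    return typ.lower() in ("string", "varchar", "char", "text")


def trim(value: object) -> object:
    """Rule action: remove leading and trailing spaces from strings."""
    return value.strip() if isinstance(value, str) else value


schema = {"id": "integer", "name": "varchar", "city": "string"}
rows = [{"id": 1, "name": "  Ada ", "city": " London"},
        {"id": 2, "name": "Alan", "city": None}]
print(apply_rule(rows, schema, is_character, trim))
# [{'id': 1, 'name': 'Ada', 'city': 'London'}, {'id': 2, 'name': 'Alan', 'city': None}]
```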

SQL to Mapping

As the name suggests, Informatica BDM can convert your existing SQL queries directly into mappings. INSERT, UPDATE and DELETE statements are supported; the only requirement is that the queries conform to the ANSI SQL standard.

To sum up, Informatica Big Data Management is a technology that makes it easier for a company to integrate with existing or new big data technologies, lets you build developments whose impact you can easily analyze in the future, and runs all of them using the power of your big data environments.