ETL Overview
Credit card manufacturer job change data integration using Pentaho
The data flow shown above is an overview of a credit card manufacturer's job change data integration. When the credit card company finalizes a new job contract, it is added to the job fact table in the data warehouse. In addition, the lead fact table is updated, and all data validation errors are logged. Figure 2, shown below, is the ETL process in its entirety.
The data flow below, figure 3, starts with a connection to our change data table, table 1. All rows with null values are filtered out and recorded in the error log table, table 2. For all logged errors, a database sequence generates the log_id. Next, a string is added to specify the error, and then the error log is inserted into the table in the data warehouse.
Next, each date in the change data is split by year, month and day, and checked for validity against the time table. All rows with non-valid dates are filtered out and recorded in the error log. The validation process is shown in figure 3, and the error log entry is shown in figure 5.
The date filter, figure 3, checks that the promise date is between 14 and 30 days after the contract date and that the promise is between 2 and 7 days after the ship date. All rows that do not meet this validation are removed and recorded in the error log.
Figure 4, shown above, represents the foreign key validation of all dimension table IDs in the change data. All non-valid IDs are recorded in the error log, figure 5. Figure 5: FK and date error logs
In figure 6, the job_id is created with a database sequence, and the valid change data is inserted into the job fact table. Then a column "success" is created with the value "Y." The lead data is validated and confirmed to be less than the contract date, else it is put into the error log. Lastly, the lead fact table is updated.