My First Framework Job

A Simple Job

Now that you’ve successfully installed and tested the Talend Framework, it’s time to create a simple Job. This will show how easy it is to integrate your own work within the Talend Framework. To start with, you’ll create your own Framework Job that calls an example task Job.

Template Job Overview

The smallest Framework Job consists of 3 Talend Jobs – the executable Job, the Job that controls process, and the Job that does your own work. You create these Jobs from templates.

By default, the framework supports a fourth Job that manages orchestration. For this simple example, we will not be using an orchestration Job.

As a quick overview, you’ll find all of the Template Jobs in the Repository folder: –

Repository->Job Designs->Template

We’ll be using three Template Jobs and we’ll explore their function as we work through this tutorial. When you’ve completed this tutorial, you can read about Template Jobs in more depth, here.

The three Template Jobs are: –

Framework Jobs

At first glance, Job: Template and Job: TemplateProcess may look complex. You should remember that, under normal usage, you do not modify these Jobs other than specifying the single child Job that they should call.

Creating your first Framework Job

You can now create your new Job.

Repository Folders

I always recommend the use of folders, to maintain a well-organised Repository. You will find all of the Talend Framework Items in folders named Framework.

First of all, create the following folders in your Project Repository.

Repository/Job Designs/MyJob
Repository/Job Designs/MyJob/Tasks

Example Task Job

Normally, when duplicating the Template Jobs, you would use the Job TemplateTask for the basis of your own work. In this example, we’re going to use the sample Task Job, GenSampleDataTask instead.

Of course, once you’ve started to create your own Jobs within the Framework, you will more than likely be duplicating your own pre-configured Template Jobs rather than always returning to the Framework’s Master Templates.

This example Job can be found in the folder: –

Repository/Job Designs/Framework/Examples/GenSampleData/Tasks

Duplicating the Template Jobs

Duplicate the Template Jobs, as shown below.

Template -> MyJob
TemplateProcess -> MyJobProcess
GenSampleDataTask -> MyJobTask

Move Jobs to their Correct Folders

You can now move your newly created Jobs to their correct folders, as shown below.

MyJob -> Job Designs/MyJob
MyJobProcess -> Job Designs/MyJob
MyJobTask -> Job Designs/MyJob/Tasks

Your repository should now look similar to the screenshot, below.

MyJob

Configure new Jobs

We’ll now open and modify these new Jobs, starting with MyJob and MyJobProcess.

MyJob is the launch Job that handles basic initialisation, reporting, and top-level error handling. This Job hands-over control to MyJobProcess, which then controls the individual Jobs that will perform your own tasks.

You’ll notice that the version numbers of the Template Jobs are version 1.0. These will not change, with new releases of the Framework. These are the starting version numbers for your own Jobs, not the version number of the Framework.

A Note on Colour Coding

You’ll notice that the Subjobs, within these Template Jobs, are colour-coded. We talk more about this in Template Jobs. For now, it’s worth noting that red Subjobs should not be modified, yellow Subjobs are placeholders for adding your own logic, and grey Subjobs are usually replaced with your own Subjobs and Components.

It is good practice to modify the colour of Yellow Subjobs to another colour, for example, Orange, should you add your own code or reconfigure a Component. This colour-change is a useful indicator, should you wish to upgrade the Job to a new Framework version at a later date.

Configure MyJob

The only change that we’ll make to MyJob is to call the Child Job MyJobProcess, rather than TemplateProcess. We’ll also change the colour of the Subjob, as described above. Once you’ve made this change, save the Job.

The following screenshots show the before and after images of the newly-configured Subjob. To find the location of the Subjob that needs changing, just look for the yellow Subjobs.

Remember, you do not need to delete the tRunJob (TemplateProcess) component, to call your new Job. Simply select the component, go to the Component properties tab, and select MyJobProcess.

MyJob Job Change, Before
MyJob Job Change, After

Configure MyJobProcess

The only change that we’ll make to MyJobProcess is to call the Child Job MyJobTask, rather than TemplateOrchestrate. We’ll also change the colour of the Subjob, as described above. Once you’ve made this change, save the Job.

The following screenshot shows the after image of the newly-configured Subjob. To find the location of the Subjob that needs changing, just look for the yellow Subjobs.

MyJobProcess Job Change, After

Running your new Job

And that’s it, you have created a new Job that will run within the Talend Framework, and that will benefit from all of the functionality that the framework provides.

Your new task Job, MyJobTask, will create a CSV file of generated data. Once you have completed this tutorial, take the opportunity to review this Job. This will help you to understand how your own work will interact with the framework and will also show you how little impact the framework has on how you want to work.

Run your new Job named MyJob.

If you have correctly built your Job, a dialog will be displayed, requesting you to enter the number of rows that should be generated. Enter as many or as few rows as you wish and then hit OK.

Choose Number of Rows to Generate

On successful execution, your Job will tell you the location of the generated file, as shown in the following screenshot. Hit OK, and then take the opportunity to examine this file either in a text editor or, ideally, open it as a spreadsheet. We’ll talk more about this Job and its output, later in this tutorial.

If you do not have Microsoft Excel, there are a number of Open Source products available, including the excellent LibreOffice.

Location of Your Generated File
Sample Output

Also, on successful execution, you should see console output that is similar to the output shown below. For brevity, I have not shown the table labelled Installation Configuration which is only shown on the first time that you run your Job.

Starting job MyJob at 08:58 26/11/2016.

[statistics] connecting to socket on port 4067
[statistics] connected
MyJob: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016
MyJob: 1.0: initialised Framework version 1.7
MyJob: 1.0: this Job was built with Framework version 1.7
LibContextReader: primary context file is C:\Users\alan\project\talend\Dev\FRAMEWORK/context/Dev.MyJob.context
LibContextReader: C:\Users\alan\project\talend\Dev\FRAMEWORK/context/Dev.MyJob.context does not exist and will be created
...
LibManageLockFile: 1.7: STARTED: Sat Nov 26 08:58:35 GMT 2016
LibManageLockFile: 1.7: ENDED: Sat Nov 26 08:58:35 GMT 2016
MyJobProcess: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016
MyJobProcess: 1.0: this Job was built with Framework version 1.7
LibEnv: 1.5: STARTED: Sat Nov 26 08:58:35 GMT 2016
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\checkpoint
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\input
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\lib
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\output
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\reporting/MyJob.20161126.085835
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\working
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\tmp
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\log
LibEnv: 1.5: ENDED: Sat Nov 26 08:58:35 GMT 2016
MyJobTask: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016
MyJobTask: 1.0: ENDED: Sat Nov 26 08:58:40 GMT 2016
LibPruneFileSystem: 1.6: STARTED: Sat Nov 26 08:58:40 GMT 2016
LibPruneFileSystem: Archive Directory pruning is not active
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\reporting for directories older than Sun Aug 28 08:58:35 BST 2016 (90 days)
LibPruneFileSystem: Output Directory pruning is not active
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\working for files older than Sun Aug 28 08:58:35 BST 2016 (90 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\tmp for files older than Sat Nov 19 08:58:35 GMT 2016 (7 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\log for files older than Thu Oct 27 08:58:35 BST 2016 (30 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\stats for files older than Thu Oct 27 08:58:35 BST 2016 (30 days)
LibPruneFileSystem: 1.6: ENDED: Sat Nov 26 08:58:40 GMT 2016
MyJobProcess: deleting temporary framework files
MyJobProcess: 1.0: ENDED: Sat Nov 26 08:58:40 GMT 2016
LibJobReporting: 1.8: STARTED: Sat Nov 26 08:58:40 GMT 2016
LibJobReporting: 1.8: this Job was built with Framework version 1.7
LibJobReporting: 1.8: ENDED: Sat Nov 26 08:58:41 GMT 2016
MyJob: 1.0: ENDED: Sat Nov 26 08:58:41 GMT 2016
[statistics] disconnected
Job MyJob ended at 08:58 26/11/2016. [exit code=0]

Success!

If your file was successfully generated, you saw console output that is similar to the output shown above, and the Job exited with code=0, your new Job has completed successfully. Let’s now take a look at what the new Job did, in addition to generating a CSV file.

Didn’t Work?

If things did not go according to plan, please check any reported errors, then review and fix. If you are unable to fix the errors, please read the article Installation Testing.

If you are still unable to resolve the issue, remember that Free Support is always available.

MyJobTask

As we’ve already discussed, MyJobTask will create a file of generated data. If you have not already done so, take the opportunity to look at this Job and view its output, as discussed earlier.

The following screenshot shows the main Subjob – the one that generates the data and writes the file.

MyJobTask, Main Subjob

As you can see from the above screenshot, implementing some custom processing in the Framework can be very straightforward. I’ll leave you to explore what the components tRowGenerator (GenPerson) and tMap (MapPerson) do.

You’ll notice that the flow, named person, is monitored. You can read how the Framework’s Job Execution Reporting manages the reporting of these statistics, in this article.

Framework Activity

The remainder of this tutorial will discuss the activity that has occurred, in addition to the specific task that we wanted to do, namely, create a file of generated data.

A number of things have happened, during the first execution of your new Job. Let’s look at these in some detail.

Job Start and Finish

Each Job records its start and finish time to the console.

MyJob: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016
...
MyJob: 1.0: ENDED: Sat Nov 26 08:58:41 GMT 2016
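
The timestamps in these lines look like the format produced by Java's Date.toString(). If you ever want to work out a Job's elapsed time from a saved console log, a minimal sketch might look like the following. The class JobTimingSketch and its methods are my own illustration, not part of the Framework, and the sketch assumes the line layout shown above.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of the Framework.
final class JobTimingSketch {
    // Matches timestamps such as "Sat Nov 26 08:58:35 GMT 2016".
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);

    // Extracts and parses the timestamp that follows the given marker.
    static ZonedDateTime parseStamp(String line, String marker) {
        int idx = line.indexOf(marker);
        if (idx < 0) {
            throw new IllegalArgumentException("marker not found: " + marker);
        }
        return ZonedDateTime.parse(line.substring(idx + marker.length()).trim(), STAMP);
    }

    // Returns the whole seconds between a STARTED line and an ENDED line.
    static long elapsedSeconds(String startedLine, String endedLine) {
        return Duration.between(parseStamp(startedLine, "STARTED:"),
                                parseStamp(endedLine, "ENDED:")).getSeconds();
    }

    public static void main(String[] args) {
        String started = "MyJob: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016";
        String ended = "MyJob: 1.0: ENDED: Sat Nov 26 08:58:41 GMT 2016";
        System.out.println("Elapsed seconds: " + elapsedSeconds(started, ended));
    }
}
```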

Framework Initialisation

The Framework will be initialised, with the version number being reported to the console.

MyJob: 1.0: initialised Framework version 1.7

When a Job that was built using an older version of the Framework starts, this fact will be recorded to the console. You will only see this if you have upgraded the Framework to a later version. All Framework updates will be compatible with Jobs that were built using an older version. You may choose to update your Job, to benefit from new functionality.

Context

The execution of your Job is Context-driven and controlled by a sophisticated Context Loader. This ensures that each of your Jobs will require minimal configuration, whilst also simplifying the deployment process from Development, through to Test and Production.

You’ll see from the console log that, as this is the first time that the Job has been executed, a Context file has been created. You’ll also see that this Job has been executed using the Default Context. For information on changing the Context, read this article.

LibContextReader: primary context file is C:\Users\alan\project\talend\Dev\FRAMEWORK/context/Dev.MyJob.context
LibContextReader: C:\Users\alan\project\talend\Dev\FRAMEWORK/context/Dev.MyJob.context does not exist and will be created

When you ran your own Job, you’ll have noticed that an Installation Configuration table was displayed. I removed this from the above console output, for brevity.

This table shows all of the Default Context Values. These values were written to the Context file when it was created, but they are commented-out. This makes them easy to change, should you choose to modify them at individual Job-level.

Lock File Management

Lock file management allows an optional lock file to be created for the duration of your Job’s execution. For more information on lock files, read the article on Lock file management.

LibManageLockFile: 1.7: STARTED: Sat Nov 26 08:58:35 GMT 2016
LibManageLockFile: 1.7: ENDED: Sat Nov 26 08:58:35 GMT 2016
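
If lock files are new to you, the general idea can be sketched in a few lines of Java. This is not the Framework's LibManageLockFile implementation, which is not shown in this tutorial. It is only an illustration of the technique, and the class name, method names, and lock file location are all assumptions of mine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of a lock file, not the Framework's own code.
final class LockFileSketch {
    // Returns true if the lock was taken, false if the file already exists.
    static boolean tryAcquire(Path lockFile) {
        try {
            Files.createFile(lockFile); // fails if the file is already there
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Removes the lock file so that the next run can start.
    static void release(Path lockFile) {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location, for demonstration only.
        Path lock = Paths.get(System.getProperty("java.io.tmpdir"), "MyJob.lock");
        if (!tryAcquire(lock)) {
            System.out.println("Another run appears to be active; exiting.");
            return;
        }
        try {
            System.out.println("Lock acquired; the Job's work would run here.");
        } finally {
            release(lock); // always release, even if the work fails
        }
    }
}
```

Releasing the lock in a finally block means that a failed run does not leave a stale lock file behind.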

Environment

You’ll also see from the console output that a number of directories have been created. The Framework will ensure that all essential and commonly used directories are in place.

For this first Job, you will see that default locations are used. More information on these locations, and how they may be changed, can be found in the article Framework Context Groups.

LibEnv: 1.5: STARTED: Sat Nov 26 08:58:35 GMT 2016
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\checkpoint
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\input
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\lib
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\output
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\reporting/MyJob.20161126.085835
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\working
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\tmp
LibEnv: creating directory C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\log
LibEnv: 1.5: ENDED: Sat Nov 26 08:58:35 GMT 2016
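
To make this step concrete, here is a minimal Java sketch that ensures the same Job-level directories exist. The directory names are taken from the console output above. The class EnvDirsSketch, its method, and the root location used in main are hypothetical, and this is not how LibEnv itself is necessarily written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the Framework's LibEnv Job.
final class EnvDirsSketch {
    // Job-level directory names, as seen in the console output.
    static final List<String> JOB_DIRS =
            Arrays.asList("checkpoint", "input", "lib", "output", "working", "tmp", "log");

    // Creates any missing directories and returns the full list of paths.
    static List<Path> ensureJobDirs(Path frameworkRoot, String jobName) {
        List<Path> ensured = new ArrayList<>();
        for (String name : JOB_DIRS) {
            Path dir = frameworkRoot.resolve(jobName).resolve(name);
            try {
                Files.createDirectories(dir); // no error if it already exists
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            ensured.add(dir);
        }
        return ensured;
    }

    public static void main(String[] args) {
        // Assumed root, for demonstration only.
        Path root = Paths.get(System.getProperty("java.io.tmpdir"), "FRAMEWORK-sketch");
        for (Path dir : ensureJobDirs(root, "MyJob")) {
            System.out.println("ensured directory " + dir);
        }
    }
}
```

Because Files.createDirectories does nothing when a directory is already present, the method is safe to call at the start of every run.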

Task Execution

The console log will record, as with all other activity, the execution of MyJobTask. You may, of course, add your own activity logging to your work.

MyJobTask: 1.0: STARTED: Sat Nov 26 08:58:35 GMT 2016
MyJobTask: 1.0: ENDED: Sat Nov 26 08:58:40 GMT 2016

File System Management

File system management will be recorded to the console log, as File System pruning. This pruning is fully configurable and works in conjunction with file archiving.

LibPruneFileSystem: 1.6: STARTED: Sat Nov 26 08:58:40 GMT 2016
LibPruneFileSystem: Archive Directory pruning is not active
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\reporting for directories older than Sun Aug 28 08:58:35 BST 2016 (90 days)
LibPruneFileSystem: Output Directory pruning is not active
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\working for files older than Sun Aug 28 08:58:35 BST 2016 (90 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\tmp for files older than Sat Nov 19 08:58:35 GMT 2016 (7 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\log for files older than Thu Oct 27 08:58:35 BST 2016 (30 days)
LibPruneFileSystem: pruning C:\Users\alan\project\talend\Dev\FRAMEWORK\MyJob\stats for files older than Thu Oct 27 08:58:35 BST 2016 (30 days)
LibPruneFileSystem: 1.6: ENDED: Sat Nov 26 08:58:40 GMT 2016
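
The cut-off dates in this output appear to be the Job's start time minus the retention period, in calendar days. The following Java sketch shows one way that such age-based pruning could be written. It is an illustration only: the class PruneSketch and its methods are hypothetical, the Europe/London time zone is inferred from the GMT and BST labels in the log, and the Framework's own LibPruneFileSystem Job may work differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration, not the Framework's LibPruneFileSystem Job.
final class PruneSketch {
    // Cut-off is the run start minus N calendar days; local time is preserved.
    static ZonedDateTime cutoff(ZonedDateTime runStart, int retentionDays) {
        return runStart.minusDays(retentionDays);
    }

    // Deletes regular files last modified before the cut-off; returns the count.
    static int pruneFiles(Path dir, Instant cutoff) {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)
                        && Files.getLastModifiedTime(entry).toInstant().isBefore(cutoff)) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Start time from the console output; the zone is an assumption.
        ZonedDateTime start =
                ZonedDateTime.of(2016, 11, 26, 8, 58, 35, 0, ZoneId.of("Europe/London"));
        for (int days : new int[] {90, 30, 7}) {
            System.out.println(days + " days: files older than " + cutoff(start, days));
        }
    }
}
```

The main method only prints the cut-off dates and deletes nothing. You would pass cutoff(...).toInstant() to pruneFiles to prune a real directory.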

Files and Directories

Let’s now look at the files and directories that have been created, including the output of the actual task that we asked the Job to perform.

Remember, these are all default directory paths. You can change these to suit your own requirements.

MyJob, File System Output

The project directory is, by default, stored under the user’s home directory.

For detailed information on the directories that have been created, read this article on Framework Context Groups.

For now, we’ll just look at the directories that contain files.

context Directory

The context directory contains the Context file that was generated when we first ran this new Job. Look in this directory and you will see the file: –

Default.MyJob.context

If you open this file you will see all of the Context values that are managed by the Framework, and that you are able to modify. You will see that they are all commented-out and showing the default values. You are able to un-comment these, and set your own values.
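
The effect of un-commenting a value can be illustrated with plain Java properties. Please treat this as a sketch under assumptions: the layout of the Context file is not shown in this tutorial, so the key=value format with # comments, the key name exampleRetentionDays, and the class ContextOverrideSketch are all hypothetical. It is not the Framework's Context Loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration, not the Framework's Context Loader.
final class ContextOverrideSketch {
    // Loads file contents on top of the defaults; "#" lines are ignored.
    static Properties load(String fileContents, Properties defaults) {
        Properties values = new Properties(defaults);
        try {
            values.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("exampleRetentionDays", "30"); // hypothetical key

        String untouched = "#exampleRetentionDays=30\n"; // still commented-out
        String edited = "exampleRetentionDays=14\n";      // un-commented and changed

        // The commented-out line is ignored, so the default applies.
        System.out.println(load(untouched, defaults).getProperty("exampleRetentionDays"));
        // The un-commented line overrides the default.
        System.out.println(load(edited, defaults).getProperty("exampleRetentionDays"));
    }
}
```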

You do not need to always configure Context at Job-level. This should be the exception. For more information, read this article.

output Directory

The output directory contains the product of our work. It contains a single spreadsheet, as shown below.

reporting Directory

The reporting directory now holds a single time-stamped directory that contains the execution-specific reporting data that was automatically captured by the Framework. You may choose to place your own execution-specific reporting files within this same time-stamped directory.

MyJob.20161126.085835
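
The directory name appears to be the Job name followed by the start date and time. If you want to place your own files in a directory that is named in the same way, a minimal Java sketch of that naming pattern follows. The pattern is inferred from the name shown above, and the class ReportingDirNameSketch is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the naming pattern inferred from the log.
final class ReportingDirNameSketch {
    // Date as yyyyMMdd, a literal dot, then time as HHmmss.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss");

    // Builds a name such as "MyJob.20161126.085835".
    static String name(String jobName, LocalDateTime start) {
        return jobName + "." + start.format(STAMP);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2016, 11, 26, 8, 58, 35);
        System.out.println(name("MyJob", start)); // prints MyJob.20161126.085835
    }
}
```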

As with the Archive Directory, the Reporting Directory will contain execution-specific sub-directories. Although it is recommended that you retain this structure, you may disable this feature if you choose. For more information, read this article on Framework Context Groups.

Job Execution Report

The execution-specific Reporting Directory now holds a single Spreadsheet that contains the monitored statistics that were collected by the Framework.

MyJob.20161126.085813.reporting.xls
MyJob, Job Execution Reporting

You’ll find more information on Job Execution Reporting in this article. For now, we can clearly see that the number of records that were written to the Spreadsheet has been recorded.

If you look in the lib directory, you’ll also see the template Job Execution Reporting specification that was automatically generated for this Job.

MyJob.JobReporting.xls

Conclusion

We’ve now walked through a fairly simple Job that performs a real task. This should show you how easy it is to implement your own tasks within the Framework, and that the Framework does not get in your way or impose unnecessary restrictions.

Try modifying the Job MyJobTask, for example, sorting the data before writing the Spreadsheet. Experimenting with this Job will help you to understand how you can interact with the features of the Framework.