Organising Your Projects

Introduction

Now that you’ve installed the Framework and worked through the tutorial My First Framework Job, we’ll look at how you might organise the structure of your real-world projects.

How you structure your projects will depend on how your Organisation uses Talend, and how many discrete projects you have. In this example, we’ll be as flexible as possible, whilst still keeping things simple.

We’ll often talk about Context, Context Groups, Context Files and Context Values. These are core to building a successful project, and it all boils down to sensibly maintaining your runtime parameters. There’s nothing worse than having hundreds of Jobs that all manage this differently, including hard-coded parameters that you thought would never change. If you’re already working this way, you’ll know how painful it can be, especially when you need to modify a Job that was written two years ago by someone who has now left the organisation. You promote changes to Production with your fingers crossed.

For the purpose of this tutorial, we’ll refer to Context, more generically, as the Environment – the environment where you will run your Jobs – Development, Test, UAT or Production. For the most part, Environment is synonymous with Context.

It is worth considering, while reading this tutorial, that you may be running each of your Environments on different Server hardware. The Framework is agnostic to your specific hardware set-up, which is outside the scope of this tutorial.

Create Your First Talend Project

In this example, we’re going to create a Data Migration project for an Organisation named Acme. In this fictitious project, we’re going to migrate data from their MySQL based CRM system, to Salesforce.

Feel free to substitute your own Organisation and Project Name. Make the Project Name as meaningful as possible. It is never too early to start thinking about your naming standards.

Basic Project Creation

If you haven’t already done so, read the article Getting Started. This article gives you a quick-step guide to creating a basic Talend Project, for use by the Framework.

A word on default vs. “Default”: Talend provides, out-of-the-box, a single Context named “Default”, and it is the default Context. The Talend Framework, whilst retaining the Context named “Default”, introduces some additional Contexts, with the Context named “Dev” becoming the default. The Talend Framework uses the Context named “Default” for Framework Testing and templating, as well as for providing Context values that are valid across the Organisation, Environments, Projects and Jobs, as discussed within this tutorial.

Create a new Talend Project named MyCRMToSalesforce.

Configure your project’s properties, as described in the article Project Configuration.

The article Installation Testing recommends that, when running the Job named Template, you run the Job in the Context named “Default”, rather than the default Context named “Dev”. Following this recommendation will help you to work through this tutorial.

Import and test the Talend Framework. You should already be familiar with these steps, if you have read the article Getting Started.

Review Context Groups

For this tutorial, we will modify the Base Directory (baseDir), to identify our Organisation Name. This is simply to allow us to have all of our work in an easily identifiable directory structure. For convenience, we’ll keep this under our #HOME directory. Your organisation may have its own directory structure conventions, so feel free to modify this path, accordingly.

Not all File System directories need to reside within the Base Directory. You may assign these as you wish. You may, for example, have a specific location where you want to put your log files, and another for your archived files. The changes that we are making here, are simply indicative of the types of changes that you may make.

Edit the File System Context Group, within the Talend Studio Repository, and make the following change (substituting your own Organisation Name, as you choose).

baseDir= #HOME/Acme/project/talend/#CONTEXTSTR/#PROJECTNAME/#JOBNAME

As you’re modifying a Framework Context Group, you must remember to re-apply your changes, should you install a later version of the Talend Framework. You should never need to modify any values within the File System Group other than baseDir or statsDir. There is an alternative to this approach, which we’ll discuss in a subsequent article.
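To make the effect of the new baseDir value easier to picture, here is a minimal sketch of how its placeholders might expand. It is not the Framework's own code: the function name and the home directory /home/acme are hypothetical, and upper-casing the Project Name is an assumption based only on the example paths shown in this tutorial.

```python
# Illustrative only: mimics how the baseDir placeholders appear to expand,
# judging by the example paths in this tutorial. Not Framework code.

BASE_DIR_TEMPLATE = "#HOME/Acme/project/talend/#CONTEXTSTR/#PROJECTNAME/#JOBNAME"


def expand_base_dir(template, home, context, project, job):
    """Substitute the four placeholders used in the baseDir value."""
    values = {
        "#HOME": home,
        "#CONTEXTSTR": context,
        # Assumption: the tutorial's paths show the Project Name upper-cased.
        "#PROJECTNAME": project.upper(),
        "#JOBNAME": job,
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


# Hypothetical home directory; substitute your own.
print(expand_base_dir(BASE_DIR_TEMPLATE, "/home/acme", "Dev",
                      "MyCRMToSalesforce", "Template"))
# prints /home/acme/Acme/project/talend/Dev/MYCRMTOSALESFORCE/Template
```

Running the same call with the Context set to “Default” gives the directory in which the template files are kept.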

Run the Template Job in both the “Default” and “Dev” Contexts. Take the opportunity to look at the File System directories (under #HOME/Acme), and the files that have been created.

By running the Template Job in the “Default” Context, and maintaining this Job, you are also creating and maintaining the template files on your File System.

Template Context Files

A number of template Context Files have been provided under the Repository folder: –

Repository->Documentation->Framework->Context

We will now create our initial Context Files that will support development for our Organisation-wide, Project, and Job environments.

When we first ran the Template Job using the “Default” Context, a default Context File was created as: –

#HOME/Acme/project/talend/Default/MYCRMTOSALESFORCE/Template/context/Default.Template.context

Save the following file, overwriting this auto-generated file.

Documentation->Framework->Context->Default.Template.context

Now load this newly created file into a text editor. You will see that there are three autoload instructions. For more information on auto-loading Context Files, read the article on the Context Loader.

Amend the three autoload instructions in the same way that you previously modified baseDir, i.e. by adding an Organisation Name to the path. Add your own comments to this file. This file will, ultimately, form the basis for your own default Context Files, each time you create a new Job.

autoload="#HOME/Acme/project/talend/Default/context/Default.Framework.context"
autoload="#HOME/Acme/project/talend/Default/context/Default.FileSystem.context"
autoload="#HOME/Acme/project/talend/Default/context/Default.SendMail.context"
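If you want to check your edits before running the Job, the following sketch lists the files that the autoload instructions point to. It assumes only what is visible above, namely that each instruction is a line of the form autoload="path". The function name and the home directory are hypothetical, and the real Context Loader may accept other syntax that this pattern does not cover.

```python
import re

# Assumption: an instruction is a single line of the form autoload="path".
AUTOLOAD_RE = re.compile(r'^\s*autoload\s*=\s*"([^"]+)"\s*$')


def autoload_paths(text, home):
    """Return the autoload targets in a Context File, with #HOME expanded."""
    paths = []
    for line in text.splitlines():
        match = AUTOLOAD_RE.match(line)
        if match:
            paths.append(match.group(1).replace("#HOME", home))
    return paths


sample = '''autoload="#HOME/Acme/project/talend/Default/context/Default.Framework.context"
autoload="#HOME/Acme/project/talend/Default/context/Default.FileSystem.context"
autoload="#HOME/Acme/project/talend/Default/context/Default.SendMail.context"
'''

# Hypothetical home directory; substitute your own.
for path in autoload_paths(sample, "/home/acme"):
    print(path)
```

Each printed path should contain your Organisation Name; a path without it suggests that an instruction was missed.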

You have now created your first template Context File, so we’ll save this to the Repository.

Create the following folder, for your own custom Context Files.

Documentation->Context->Custom

Now add your newly created Context File to this folder.

#HOME/Acme/project/talend/Default/MYCRMTOSALESFORCE/Template/context/Default.Template.context

We’ll now use this new custom Context File for our Template Job that will run in the “Dev” Context. Save this new Repository Context File to the following location, overwriting the auto-generated file that already exists: –

#HOME/Acme/project/talend/Dev/MYCRMTOSALESFORCE/Template/context/Dev.Template.context

The default Context Files in the Repository are all prefixed with the “Default” Context. You will be saving this file into the “Dev” Context, so you will need to change the prefix accordingly.
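The renaming described here can be sketched as a small path transformation. This is an illustration rather than part of the Framework: the function name is hypothetical, and it swaps every directory component that exactly matches the source Context name, which is adequate for the paths used in this tutorial but may be too broad for other layouts.

```python
from pathlib import PurePosixPath


def retarget_context_file(path, source_context, target_context):
    """Swap the Context directory and the file-name prefix of a Context File path."""
    p = PurePosixPath(path)
    # Replace any directory component that is exactly the source Context name.
    parts = [target_context if part == source_context else part
             for part in p.parts[:-1]]
    name = p.name
    # Replace the "Default." style prefix on the file name itself.
    if name.startswith(source_context + "."):
        name = target_context + name[len(source_context):]
    return str(PurePosixPath(*parts, name))


default_file = ("#HOME/Acme/project/talend/Default/MYCRMTOSALESFORCE/"
                "Template/context/Default.Template.context")
print(retarget_context_file(default_file, "Default", "Dev"))
# prints #HOME/Acme/project/talend/Dev/MYCRMTOSALESFORCE/Template/context/Dev.Template.context
```

Only the location and name of the file change. The autoload instructions inside it continue to point at the shared files under the “Default” Context.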

Now create the directory identified by the three autoload instructions, for example: –

#HOME/Acme/project/talend/Default/context

Note that this directory named “context” is immediately below the directory for the Context named “Default”. This indicates that Context Files placed within this directory are available for use by Jobs in all Projects and across all Environments. Any Job should be able to fall back on the parameters that have been specified within these Context Files, with impunity. Should this rule become untrue for any parameter, then that parameter must be promoted to a higher-priority level.
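The idea of falling back on shared values, and of promoting a parameter when the shared value no longer fits, can be pictured as a layered lookup. The sketch below is an assumption about the general principle only, not a description of how the Context Loader is implemented, and the parameter names smtpHost and logLevel are hypothetical.

```python
from collections import ChainMap

# Hypothetical values held in the shared, Organisation-wide Context Files.
shared_defaults = {"smtpHost": "mail.example.com", "logLevel": "INFO"}

# Hypothetical values promoted to a higher-priority (Job-level) Context File.
job_values = {"logLevel": "DEBUG"}

# Lookups try the higher-priority layer first, then fall back to the shared one.
effective = ChainMap(job_values, shared_defaults)

print(effective["logLevel"])  # prints DEBUG (the promoted value wins)
print(effective["smtpHost"])  # prints mail.example.com (shared fall-back)
```

A parameter that is valid everywhere stays in the shared layer. One that differs for a particular Project, Environment or Job is defined again at that level.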

Save the following files into this new directory: –

Documentation->Framework->Context->Default.Framework.context
Documentation->Framework->Context->Default.FileSystem.context
Documentation->Framework->Context->Default.SendMail.context
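Before running the Template Job, you may wish to confirm that all three files have reached the shared directory. The following is a minimal sketch of such a check. The function name is hypothetical, and the demonstration uses a temporary directory in place of your real #HOME.

```python
import tempfile
from pathlib import Path

# The three shared Context Files named in this tutorial.
SHARED_FILES = [
    "Default.Framework.context",
    "Default.FileSystem.context",
    "Default.SendMail.context",
]


def missing_shared_files(context_dir):
    """Return the names of shared Context Files not found in context_dir."""
    directory = Path(context_dir)
    return [name for name in SHARED_FILES if not (directory / name).is_file()]


# Demonstration against a throw-away directory standing in for #HOME.
with tempfile.TemporaryDirectory() as home:
    context_dir = Path(home) / "Acme" / "project" / "talend" / "Default" / "context"
    context_dir.mkdir(parents=True)
    (context_dir / "Default.Framework.context").write_text("# placeholder\n")
    print(missing_shared_files(context_dir))
    # prints ['Default.FileSystem.context', 'Default.SendMail.context']
```

An empty list means that every file named by the autoload instructions is in place.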

You can now test the changes made, by running the Template Job in the “Dev” Context.

Running the Template Job is always a good way to test the sanity of your installation and changes.

Conclusion

We’ve now completed our initial Project set-up, which will stand us in good stead for building multiple Projects, with any number of Jobs, that have simple Configuration Management and a consistent architecture.

In this tutorial, we have: –

  • A Template Job configured to run in both the “Default” and “Dev” Contexts.
  • The knowledge to promote a Job through Test, UAT & Production.
  • Context Files that may be shared across the Organisation, Environments and Projects.
  • The knowledge to promote parameters (Context Values) to higher-priority levels.
  • Our first custom Context File, stored in the Repository, for use by all new Jobs.