File System Pruning

As of version 1.4, the Talend Framework supports automatic pruning of certain directories that are managed on the File System.

Pruning is the physical deletion of directories and files that are older than a specified number of days.
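The age test behind pruning can be sketched in Java as follows. This is a minimal sketch, not the Framework's implementation. The class and method names are hypothetical, and age is measured against a caller-supplied "now". The table below lists -1 as a default for some directories; this sketch assumes that a negative retention value means "never prune", which this page does not confirm.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the "older than N days" rule; not a Framework class.
final class RetentionCheck {

    private RetentionCheck() {
    }

    /**
     * Returns true if modified lies more than retentionDays days before now.
     * ASSUMPTION: a negative retentionDays (such as -1) disables pruning in this sketch.
     */
    static boolean isOlderThan(Instant modified, int retentionDays, Instant now) {
        if (retentionDays < 0) {
            return false; // assumed meaning of a negative retention value
        }
        Instant cutoff = now.minus(Duration.ofDays(retentionDays));
        return modified.isBefore(cutoff);
    }
}
```

For example, with tmpDirRetentionDays at its default of 7, a file last modified eight days ago falls before the cutoff and would be pruned. A file modified six days ago would be kept.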

Pruning is controlled by a number of parameters that are defined in the Framework Context Groups, as detailed in the table below.

All Framework Jobs have the ability to prune the File System.

Configuration

Directory     Retention Parameter          Default (Days)
archiveDir    archiveDirRetentionDays      -1
logDir        logDirRetentionDays          90
statsDir      logDirRetentionDays          90
outputDir     outputDirRetentionDays       -1
reportingDir  reportingDirRetentionDays    90
tmpDir        tmpDirRetentionDays          7
workingDir    workingDirRetentionDays      90

Both logDir and statsDir are controlled by the parameter logDirRetentionDays.
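The directory-to-parameter mapping can be captured as a small lookup, which makes the shared logDir/statsDir parameter explicit. In this sketch the directory names, parameter names and default values are taken from the table. The class, the method and the contextValues map are hypothetical; the map stands in for values you set in the Framework Context Groups.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup mirroring the Configuration table; not part of the Framework API.
final class RetentionDefaults {

    private static final Map<String, String> PARAMETER_FOR_DIR = new HashMap<>();
    private static final Map<String, Integer> DEFAULT_DAYS = new HashMap<>();

    static {
        PARAMETER_FOR_DIR.put("archiveDir", "archiveDirRetentionDays");
        PARAMETER_FOR_DIR.put("logDir", "logDirRetentionDays");
        PARAMETER_FOR_DIR.put("statsDir", "logDirRetentionDays"); // statsDir shares logDir's parameter
        PARAMETER_FOR_DIR.put("outputDir", "outputDirRetentionDays");
        PARAMETER_FOR_DIR.put("reportingDir", "reportingDirRetentionDays");
        PARAMETER_FOR_DIR.put("tmpDir", "tmpDirRetentionDays");
        PARAMETER_FOR_DIR.put("workingDir", "workingDirRetentionDays");

        DEFAULT_DAYS.put("archiveDirRetentionDays", -1);
        DEFAULT_DAYS.put("logDirRetentionDays", 90);
        DEFAULT_DAYS.put("outputDirRetentionDays", -1);
        DEFAULT_DAYS.put("reportingDirRetentionDays", 90);
        DEFAULT_DAYS.put("tmpDirRetentionDays", 7);
        DEFAULT_DAYS.put("workingDirRetentionDays", 90);
    }

    private RetentionDefaults() {
    }

    /** Returns the retention for a directory: the context value if set, else the table default. */
    static int retentionDaysFor(String dir, Map<String, Integer> contextValues) {
        String parameter = PARAMETER_FOR_DIR.get(dir);
        if (parameter == null) {
            throw new IllegalArgumentException("Not a pruned directory: " + dir);
        }
        return contextValues.getOrDefault(parameter, DEFAULT_DAYS.get(parameter));
    }
}
```

Because statsDir resolves to logDirRetentionDays, setting that one parameter changes the retention for both directories.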

Localised Directories

Both the Archive Directory (archiveDir) and the Reporting Directory (reportingDir) are, by default, localised (see localiseArchiveDir and localiseReportingDir for more information).

Localisation means that any files written by your Job are always written to a sub-directory whose name identifies the Job and the date and time at which it was executed, for example:

reporting/MyJob.20160129.1600
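The naming pattern in this example can be reproduced with java.time. This sketch infers the format from the single example above: Job name, then a yyyyMMdd date, then a 24-hour HHmm time, separated by dots. The Framework's exact format is not confirmed here (for instance, whether seconds are ever included), and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper reproducing the localised sub-directory name shown above.
final class LocalisedDirs {

    // Format inferred from "MyJob.20160129.1600": yyyyMMdd date, then 24-hour HHmm time.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd.HHmm");

    private LocalisedDirs() {
    }

    /** Returns jobName followed by ".yyyyMMdd.HHmm", e.g. MyJob.20160129.1600. */
    static String subDirName(String jobName, LocalDateTime executedAt) {
        return jobName + "." + STAMP.format(executedAt);
    }

    /** Resolves the localised sub-directory inside a base directory such as reporting. */
    static Path localise(Path baseDir, String jobName, LocalDateTime executedAt) {
        return baseDir.resolve(subDirName(jobName, executedAt));
    }
}
```

Calling localise(Paths.get("reporting"), "MyJob", LocalDateTime.of(2016, 1, 29, 16, 0)) yields reporting/MyJob.20160129.1600, using the platform's path separator.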

Pruning Localised Directories

When localised directories are pruned, the entire directory and all of its content will be deleted, regardless of the age (modification timestamp) of the directory content.
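The following minimal sketch prunes a localised directory. It deletes each Job sub-directory in one piece, deepest entries first, without examining the timestamps of the files inside. Which timestamp decides that a sub-directory is old enough is an assumption: this sketch uses the sub-directory's own modification time. The cutoff (now minus the retention days) is passed in, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of pruning a localised directory such as reporting.
final class LocalisedPruner {

    private LocalisedPruner() {
    }

    /**
     * Deletes every immediate sub-directory of baseDir whose own modification time is before
     * cutoff, including all of its content. Returns the sub-directories that were deleted.
     * ASSUMPTION: the sub-directory's own timestamp decides its age; its content is not checked.
     */
    static List<Path> prune(Path baseDir, Instant cutoff) throws IOException {
        List<Path> subDirs;
        try (Stream<Path> children = Files.list(baseDir)) {
            subDirs = children.filter(Files::isDirectory).collect(Collectors.toList());
        }
        List<Path> pruned = new ArrayList<>();
        for (Path dir : subDirs) {
            if (Files.getLastModifiedTime(dir).toInstant().isBefore(cutoff)) {
                deleteTree(dir);
                pruned.add(dir);
            }
        }
        return pruned;
    }

    /** Deletes a directory tree; reverse path order visits children before their parents. */
    private static void deleteTree(Path root) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(root)) {
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.deleteIfExists(entry);
        }
    }
}
```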

Pruning non-Localised Directories

When pruning non-localised directories, individual files are deleted based on their age (modification timestamp).

No sub-directories will be deleted from non-localised directories, so you may create your own sub-directories within these structures without risk of them being deleted; however, files within these sub-directories may be deleted.
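A matching sketch for non-localised directories walks the whole tree but deletes only regular files, so any sub-directories you create survive even when the files inside them are pruned. As before, the cutoff is passed in and the class and method names are hypothetical; this illustrates the behaviour described above rather than the Framework's own code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of pruning a non-localised directory such as log.
final class NonLocalisedPruner {

    private NonLocalisedPruner() {
    }

    /**
     * Deletes regular files anywhere under baseDir whose modification time is before cutoff.
     * Directories, including ones you created yourself, are never deleted.
     */
    static List<Path> prune(Path baseDir, Instant cutoff) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(baseDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> pruned = new ArrayList<>();
        for (Path file : files) {
            if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff) && Files.deleteIfExists(file)) {
                pruned.add(file);
            }
        }
        return pruned;
    }
}
```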

Pruning Competition

By default, both the archive directory and reporting directory are localised, and each Job will have its own output, working and temporary directories. This means that there will be no competition between Jobs attempting to prune each other’s files. If you do alter the default configuration and this situation arises, the pruning of some files may be reported by multiple Jobs; however, this should have no adverse effect on the execution of your Job.

By default, all Jobs share the Log Directory, so several Jobs may attempt to prune the same files.
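When several Jobs prune a shared directory, a file can disappear between the moment one Job lists it and the moment that Job tries to delete it. The sketch below shows one way to make that harmless in Java: a file that is already gone is treated as nothing left to do rather than as an error. The class and method are hypothetical, and the sketch illustrates the principle only; it is not confirmed to be how the Framework handles this case.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Instant;

// Hypothetical sketch of tolerating competing deletions in a shared directory.
final class CompetingPrune {

    private CompetingPrune() {
    }

    /**
     * Deletes file if it still exists and was last modified before cutoff.
     * Returns true only if this call removed it; a file removed first by another Job yields false.
     */
    static boolean pruneIfOlder(Path file, Instant cutoff) throws IOException {
        try {
            if (!Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                return false; // too recent to prune
            }
            return Files.deleteIfExists(file);
        } catch (NoSuchFileException alreadyGone) {
            return false; // another Job got there first; not an error
        }
    }
}
```

A second call on the same file returns false instead of throwing.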

Where pruned directories are shared between Jobs, for example the Log Directory, you may choose to independently prune these directories using the Job PruneFileSystem, rather than allowing your individual Jobs to perform this activity.

Job Designs->Framework->Utility->PruneFileSystem

Pruning Report

A report of all directories and files that were scheduled for pruning is written to a pruning report file, for example:

reporting/PruneFileSystem.20160129.1600/PruneFileSystem.20160129.1600.files.scheduled.for.pruning.csv
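Writing such a report can be sketched as follows. The file name follows the example above: the localised Job sub-directory, then the same name followed by .files.scheduled.for.pruning.csv. The single-column layout with a "path" header, the quoting, and the class and method names are assumptions for illustration, since this page does not describe the CSV's columns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of writing a pruning report; the CSV layout is assumed.
final class PruningReport {

    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd.HHmm");

    private PruningReport() {
    }

    /** Builds reportingDir/Job.stamp/Job.stamp.files.scheduled.for.pruning.csv. */
    static Path reportFile(Path reportingDir, String jobName, LocalDateTime executedAt) {
        String localised = jobName + "." + STAMP.format(executedAt);
        return reportingDir.resolve(localised).resolve(localised + ".files.scheduled.for.pruning.csv");
    }

    /** Writes one quoted path per line under an assumed "path" header and returns the report file. */
    static Path write(Path reportingDir, String jobName, LocalDateTime executedAt, List<Path> scheduled)
            throws IOException {
        Path report = reportFile(reportingDir, jobName, executedAt);
        Files.createDirectories(report.getParent());
        List<String> lines = new ArrayList<>();
        lines.add("path"); // assumed header
        for (Path p : scheduled) {
            // Standard CSV quoting: wrap in double quotes and double any embedded quotes.
            lines.add("\"" + p.toString().replace("\"", "\"\"") + "\"");
        }
        Files.write(report, lines, StandardCharsets.UTF_8);
        return report;
    }
}
```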