Monday, August 26, 2013

FileEvent - Example 5 - Complex Pattern Matching

Introduction

So far the handling of patterns in the previous FileEvent examples has been useful, but probably unremarkable. As of the recently released version 1.0.1 this functionality has been extended only slightly, but with a big impact.

It is often useful to determine the destination from parts of the filename rather than from the filename as a whole. For example, consider the following naming scheme:

<environment>.<instance>.<filename>

For example:

P.00.mytestfile
T.01.anotherfile
T.02.afile
D.00.testfile1
D.01.testfile1

In the above examples the “P” represents “production”, “T” represents “test” and “D” represents “development”. Since many organisations have multiple parallel development streams the second component of the filename indicates the instance of that environment.

Improved Macro Handling

Assuming that all incoming files land in the same directory, you might want FileEvent to move files into the appropriate directories for the environment based on the filename. With the facilities introduced so far this would be difficult, but it is now possible to put partial pattern matches into variables using the following syntax:

%{varname,pattern}

We can use this facility to break down the source filenames into different parts, for example:

<file_pattern>%{myenv_type,[PTD]}.%{myenv_inst,\d+}.%{fname,.*}.dat</file_pattern> 

We can then use that information in the target destination counter (to ensure unique counters for all files in all environments):

<target_counter>%{myenv_type}_%{myenv_inst}_%{fname}</target_counter>

And of course we use the same information to determine the suitable target directory:

<xfer_destination>/tmp/mydestdir/%{myenv_type}/%{myenv_inst}/%{fname}</xfer_destination>
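To make the idea concrete, here is a minimal Python sketch of how a pattern containing %{varname,pattern} macros could be turned into a regular expression and then used to fill in a destination. This is not FileEvent's own code (FileEvent is a Perl script), and the function names compile_pattern, expand and route are made up for illustration. It assumes the rest of the pattern is treated as a regular expression anchored with ^ and $, as the "list" output in the earlier examples suggests.

```python
import re

# Matches a capture macro of the form %{varname,pattern}.
CAPTURE_MACRO = re.compile(r"%\{(\w+),([^}]*)\}")
# Matches a plain variable reference of the form %{varname}.
VARIABLE_REF = re.compile(r"%\{(\w+)\}")


def compile_pattern(file_pattern):
    """Turn each %{name,regex} macro into a named group and anchor the result."""
    body = CAPTURE_MACRO.sub(
        lambda m: "(?P<%s>%s)" % (m.group(1), m.group(2)), file_pattern
    )
    return re.compile("^" + body + "$")


def expand(template, variables):
    """Replace each %{name} in the template with the captured value."""
    return VARIABLE_REF.sub(lambda m: variables[m.group(1)], template)


def route(filename, file_pattern, destination):
    """Return the expanded destination, or None if the filename does not match."""
    match = compile_pattern(file_pattern).match(filename)
    if match is None:
        return None
    return expand(destination, match.groupdict())


pattern = r"%{myenv_type,[PTD]}.%{myenv_inst,\d+}.%{fname,.*}.dat"
target = "/tmp/mydestdir/%{myenv_type}/%{myenv_inst}/%{fname}"
print(route("T.02.filename3.dat", pattern, target))
```

With the sample pattern, a file called T.02.filename3.dat is routed to /tmp/mydestdir/T/02/filename3, while a name that does not start with P, T or D is simply not matched.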

Tying this all together gives a sample configuration file:

<?xml version="1.0" standalone="yes"?> 
<FileEvent> 
  <settings> 
    <runhosts>bongo</runhosts> 
    <runusers>venture</runusers> 
    <local>*;maxfiles=10</local> 
    <db>%{ENV:FILE_EVENT_DB}</db> 
    <counters_db>%{ENV:FILE_EVENT_COUNTERS_DB}</counters_db> 
  </settings> 
  <event> 
    <send_which>newest</send_which> 
    <min_send_count>1</min_send_count> 
    <max_send_count>%{maxfiles}</max_send_count> 
    <min_age>0</min_age> 
    <description>xfer test transfer</description> 
    <directory>%{ENV:TESTFILES}</directory> 
    <file_pattern>%{myenv_type,[PTD]}.%{myenv_inst,\d+}.%{fname,.*}.dat</file_pattern> 
    <xfer_job_type>mv</xfer_job_type> 
    <target_counter>%{myenv_type}_%{myenv_inst}_%{fname}</target_counter> 
    <create_counter>true</create_counter> 
    <xfer_destination>/tmp/mydestdir/%{myenv_type}/%{myenv_inst}/%{fname}</xfer_destination> 
    <post_archive>false</post_archive> 
  </event> 
</FileEvent> 

Notice we are also making use of two additional global settings:


  • runhosts – provides a list of hosts where this can be run. Useful if the file is stored on shared storage and only particular hosts should make use of it.


  • runusers – a list of usernames that should be able to run the configuration.


Once this is run it works just as you might expect, verbose output being:

2013/08/25 23:08:14.113 0000000-I Events to load from configuratione file: 1 
2013/08/25 23:08:14.113 0000001-I Early directory pattern change for event #0: 
2013/08/25 23:08:14.113 0000002-I %{ENV:TESTFILES} => /home/venture/projects/SOURCE/fileevent/testing 
2013/08/25 23:08:14.119 0000003-I Event #0 [xfer test transfer] processing. 
2013/08/25 23:08:14.123 0000004-I Counters rename: /tmp/mydestdir/T/02/filename3 => /tmp/mydestdir/T/02/filename3.000003 + /tmp/mydestdir/T/02/filename3.000003.done 
2013/08/25 23:08:14.329 0000005-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/T.02.filename3.dat'. 
2013/08/25 23:08:14.332 0000006-I Counters rename: /tmp/mydestdir/T/01/file2 => /tmp/mydestdir/T/01/file2.000003 + /tmp/mydestdir/T/01/file2.000003.done 
2013/08/25 23:08:14.438 0000007-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/T.01.file2.dat'. 
2013/08/25 23:08:14.441 0000008-I Counters rename: /tmp/mydestdir/P/00/file1 => /tmp/mydestdir/P/00/file1.000003 + /tmp/mydestdir/P/00/file1.000003.done 
2013/08/25 23:08:14.539 0000009-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/P.00.file1.dat'. 
2013/08/25 23:08:14.543 0000010-I Counters rename: /tmp/mydestdir/D/01/myfile => /tmp/mydestdir/D/01/myfile.000003 + /tmp/mydestdir/D/01/myfile.000003.done 
2013/08/25 23:08:14.762 0000011-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/D.01.myfile.dat'. 

Notice that these file names do not contain dates. Date handling is optional, and if no date is present the modification time is used, though that is not really made use of in this example.

Sunday, August 18, 2013

FileEvent – Example 4 – Another multiple files count example

Introduction

The previous example showed how more than one matching file could be sent in a single call. The converse of this “max_send_count” option is also available – called “min_send_count”. This is used to indicate that no files should be sent unless at least the specified number of suitable files could be found.

Consider the following example configuration:

<?xml version="1.0" standalone="yes"?>
<FileEvent>
<settings>
  <db>/tmp/testing.db</db>
</settings>
  <event>
    <min_send_count>8</min_send_count>
    <max_send_count>10</max_send_count>
    <description>scp example4 transfer</description>
    <file_pattern>file1_%{4year}%{2month}%{2day}.txt</file_pattern>
    <directory>/home/venture/projects/SOURCE/fileevent/testing/example4</directory>
    <xfer_job_type>scp</xfer_job_type>
    <xfer_destination>test@lubuntu1:/tmp/%{f}</xfer_destination>
    <post_archive>true</post_archive>
  </event>
</FileEvent>

In this configuration the “min_send_count” has been set to 8 files, so if only 5 files exist then running it will actually transfer no files!

$ fileevent.pl --config example4.xml --verbose --action=process
2013/08/15 00:06:20.384 0000000-I Events to load from configuratione file: 1
2013/08/15 00:06:20.389 0000001-I Event #0 [scp example4 transfer] processing.
2013/08/15 00:06:20.390 0000002-E Maximum wait time for event #0 passed.
2013/08/15 00:06:20.390 0000003-E Only 5 files found, minimum of 8 required.
2013/08/15 00:06:20.391 0000004-E Errors encountered: 1

In such scenarios the return code is “1” indicating an error has occurred since no files could be sent.
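The interaction between the two counts can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not FileEvent's implementation. The function name select_files is hypothetical, and the default of 1 for both counts is an assumption based on the single-file default behaviour of the earlier examples.

```python
def select_files(candidates, min_send_count=1, max_send_count=1):
    """Pick the files to send from a list that is already in send order.

    Returns (files, error). If fewer than min_send_count candidates exist,
    nothing is sent and an error message is returned instead.
    """
    if len(candidates) < min_send_count:
        error = "Only %d files found, minimum of %d required." % (
            len(candidates),
            min_send_count,
        )
        return [], error
    # Never send more than max_send_count files in one call.
    return candidates[:max_send_count], None


available = ["f5.txt", "f4.txt", "f3.txt", "f2.txt", "f1.txt"]
print(select_files(available, min_send_count=8, max_send_count=10))
```

With 5 files available and a minimum of 8, the sketch returns an empty list plus the error text, which mirrors the run shown above.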

Improving Alerting

This is a good example of a scenario where you want to be alerted more directly rather than just see the process fail. This is also possible with FileEvent. For example, consider the following configuration, in which one additional line (alert_when_missing) has been added:


<?xml version="1.0" standalone="yes"?>
<FileEvent>
<settings>
  <db>/tmp/testing.db</db>
</settings>
  <event>
    <alert_when_missing>venture@localhost</alert_when_missing>
    <min_send_count>8</min_send_count>
    <max_send_count>10</max_send_count>
    <description>scp example4 transfer</description>
    <file_pattern>file1_%{4year}%{2month}%{2day}.txt</file_pattern>
    <directory>/home/venture/projects/SOURCE/fileevent/testing/example4</directory>
    <xfer_job_type>scp</xfer_job_type>
    <xfer_destination>test@lubuntu1:/tmp/%{f}</xfer_destination>
    <post_archive>true</post_archive>
  </event>
</FileEvent>

Try running the process with just 5 files available again now:

$ fileevent.pl --config example4-2.xml --verbose --action=process
2013/08/15 00:23:24.445 0000000-I Events to load from configuratione file: 1
2013/08/15 00:23:24.450 0000001-I Event #0 [scp example4 transfer] processing.
2013/08/15 00:23:24.470 0000002-I Email sent to venture@localhost (missing files).
2013/08/15 00:23:24.470 0000003-E Maximum wait time for event #0 passed.
2013/08/15 00:23:24.470 0000004-E Only 5 files found, minimum of 8 required.
2013/08/15 00:23:24.470 0000005-E Errors encountered: 1


Notice that an email notification has been sent now indicating the cause of the problem - which is often more useful than just a failure code!
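As a rough idea of what such a notification involves, the Python sketch below builds (but does not send) an email for the missing-files case. The subject line, the function name build_missing_alert and the body layout are all assumptions made for illustration, since the actual email FileEvent sends is not shown here.

```python
from email.message import EmailMessage


def build_missing_alert(recipient, description, found, required):
    """Build an alert email for an event that found too few files."""
    message = EmailMessage()
    message["To"] = recipient
    # The subject format is invented for this sketch.
    message["Subject"] = "FileEvent: missing files for '%s'" % description
    message.set_content(
        "Only %d files found, minimum of %d required.\n" % (found, required)
    )
    return message


alert = build_missing_alert("venture@localhost", "scp example4 transfer", 5, 8)
print(alert)
```

Handing the message to a mail server (for example with smtplib) is left out so that the sketch runs without any mail setup.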


Wednesday, August 14, 2013

FileEvent Example 3 - Multiple File Support

Introduction

This builds on the previous two posts about the FileEvent tool and extends the example to handle multiple files in a single run.

Working with lots of files

Ignore the files sent previously and start again from a clean configuration, this time with a directory containing the following files:

-rw-rw---- 1 venture venture 488 Aug 12 22:30 example2.xml
-rw-rw-r-- 1 venture venture 25944 Aug 12 22:52 file1_20120723.txt
-rw-rw-r-- 1 venture venture 25944 Aug 12 22:53 file1_20130201.txt
-rw-rw-r-- 1 venture venture 27125 Aug 12 22:28 file1_20130819.txt
-rw-rw-r-- 1 venture venture 27125 Aug 12 22:28 file1_20130821.txt
-rw-rw-r-- 1 venture venture 27125 Aug 12 22:28 file1_20130822.txt

What if you want to process all matching files in a directory and don't want to call “fileevent.pl” multiple times until no files are sent? This is easy – a simple addition to the configuration file:

<?xml version="1.0" standalone="yes"?>
<FileEvent>
 <settings>
  <db>/tmp/testing.db</db>
 </settings>
 <event>
  <max_send_count>10</max_send_count>
  <description>scp example1 transfer</description>
  <file_pattern>file1_%{4year}%{2month}%{2day}.txt</file_pattern>
  <directory>/home/venture/projects/SOURCE/fileevent/testing/example3</directory>
  <xfer_job_type>scp</xfer_job_type>
  <xfer_destination>test@lubuntu1:/tmp/%{f}</xfer_destination>
  <post_archive>true</post_archive>
 </event>
</FileEvent>

The “max_send_count” setting means send between 1 and this number of files, depending on what matches. Running the utility in “list” mode gives:

$ fileevent.pl --config example3.xml --action=list

Event #0 matched files:
Directory: /home/venture/projects/SOURCE/fileevent/testing/example3
Pattern : ^file1_%{4year}%{2month}%{2day}.txt$
Actual cut-down list to send now:
file1_20130822.txt
file1_20130821.txt
file1_20130819.txt
file1_20130201.txt
file1_20120723.txt

And running it in process mode (with verbose output):

$ fileevent.pl --config example3.xml --action=process --verbose

2013/08/12 23:05:24.196 0000000-I Events to load from configuratione file: 1
2013/08/12 23:05:24.217 0000001-I Event #0 [scp example1 transfer] processing.
2013/08/12 23:05:24.402 0000002-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130822.txt'.
2013/08/12 23:05:24.405 0000003-I Directory to archive to [/home/venture/projects/SOURCE/fileevent/testing/example3/archive] does not exist - attempting create,
2013/08/12 23:05:24.405 0000004-I Archive directory created successfully.
2013/08/12 23:05:24.406 0000005-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130822.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example3/archive/file1_20130822.txt.000000.gz'.
2013/08/12 23:05:24.596 0000006-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130821.txt'.
2013/08/12 23:05:24.600 0000007-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130821.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example3/archive/file1_20130821.txt.000000.gz'.
2013/08/12 23:05:24.841 0000008-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130819.txt'.
2013/08/12 23:05:24.845 0000009-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130819.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example3/archive/file1_20130819.txt.000000.gz'.
2013/08/12 23:05:25.036 0000010-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130201.txt'.
2013/08/12 23:05:25.047 0000011-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20130201.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example3/archive/file1_20130201.txt.000000.gz'.
2013/08/12 23:05:25.238 0000012-I Successfully sent '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20120723.txt'.
2013/08/12 23:05:25.242 0000013-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example3/file1_20120723.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example3/archive/file1_20120723.txt.000000.gz'.


So all 5 files have been sent in a single call, newest first, in the order given by the dates parsed from the file names.

Monday, August 12, 2013

FileEvent Example 2 - Basic Archiving Support

Introduction

FileEvent is a utility designed to allow event-driven handling of files when they come into existence based on simple rules stored in XML configuration files. This is the second example of how FileEvent might be used and extends the facilities shown in the initial example.

Why Archive?

In the previous example it was shown that FileEvent is stateful, in that (by default) it will not process the same file twice. This means that when multiple files are matched in a directory and some of them have already been sent, it will “skip” the files in question. For example, recapping the output of the previously shown command:

2013/08/12 22:25:23.443 0000013-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130822.txt' already sent - removing from list to send or fetch.
2013/08/12 22:25:23.443 0000014-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130821.txt' already sent - removing from list to send or fetch.
2013/08/12 22:25:23.443 0000015-D Actual cut-down list to send now:
2013/08/12 22:25:23.443 0000015-D file1_20130819.txt
2013/08/12 22:25:23.443 0000016-D send_file of '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130819.txt' called.
2013/08/12 22:25:23.444 0000017-D xfer_destination raw: test@lubuntu1:/tmp/%{f}
2013/08/12 22:25:23.444 0000018-D xfer_destination cooked: test@lubuntu1:/tmp/file1_20130819.txt


This is all well and good, but scanning the files and reordering them on every run is additional processing which is best avoided, and hence we come to another feature of FileEvent: archiving.

FileEvent Archiving

Archiving will move any successfully sent files to an archive directory and compress them too, saving space. Just a single additional line is usually enough, for example:

<?xml version="1.0" standalone="yes"?>
<FileEvent>
<settings>
  <db>/tmp/testing.db</db>
</settings>
<event>
  <description>scp example1 transfer</description>
  <file_pattern>file1_%{4year}%{2month}%{2day}.txt</file_pattern>
  <directory>/home/venture/projects/SOURCE/fileevent/testing/example2</directory>
  <xfer_job_type>scp</xfer_job_type>
  <xfer_destination>test@lubuntu1:/tmp/%{f}</xfer_destination>
  <post_archive>true</post_archive>
</event>
</FileEvent>

Notice the post_archive line. Now look at what happens when a file is sent in this instance (a subset of the output, since we are running with --debug):

$ fileevent.pl --action=process --debug --config example2.xml

2013/08/12 22:47:38.575 0000014-D send_file of '/home/venture/projects/SOURCE/fileevent/testing/example2/file1_20130822.txt' called.
2013/08/12 22:47:38.575 0000015-D xfer_destination raw: test@lubuntu1:/tmp/%{f}
2013/08/12 22:47:38.576 0000016-D xfer_destination cooked: test@lubuntu1:/tmp/file1_20130822.txt
2013/08/12 22:47:38.761 0000017-D Archiving file in directory '/home/venture/projects/SOURCE/fileevent/testing/example2/archive'.
2013/08/12 22:47:38.761 0000018-I Directory to archive to [/home/venture/projects/SOURCE/fileevent/testing/example2/archive] does not exist - attempting create,
2013/08/12 22:47:38.761 0000019-I Archive directory created successfully.
2013/08/12 22:47:38.763 0000020-I File archived '/home/venture/projects/SOURCE/fileevent/testing/example2/file1_20130822.txt' as '/home/venture/projects/SOURCE/fileevent/testing/example2/archive/file1_20130822.txt.000000.gz'.

Notice that it automatically created a directory called archive and once the file had been sent to the remote destination it moved and compressed the file to that new directory.

Also note the filename FileEvent uses: it has added a “.000000” before the “.gz” compression extension. This ensures that even if you process a file with the same name again (which must be explicitly requested), it will still be given a unique name in the archive directory, so that any file previously sent and residing there is not overwritten.
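The naming idea can be sketched in Python as follows. This is an illustration only, not FileEvent's code: the function name archive_file is hypothetical, and the assumption that the six-digit counter simply increases until an unused name is found is a guess at how uniqueness could be achieved.

```python
import gzip
import os
import shutil


def archive_file(path, archive_dir):
    """Compress a file into archive_dir under a unique name, then remove it."""
    os.makedirs(archive_dir, exist_ok=True)
    base = os.path.basename(path)
    counter = 0
    # Assumed scheme: try .000000, .000001, ... until the name is unused.
    while True:
        target = os.path.join(archive_dir, "%s.%06d.gz" % (base, counter))
        if not os.path.exists(target):
            break
        counter += 1
    with open(path, "rb") as source, gzip.open(target, "wb") as compressed:
        shutil.copyfileobj(source, compressed)
    os.remove(path)
    return target
```

Archiving file1_20130822.txt twice under this scheme would give file1_20130822.txt.000000.gz followed by file1_20130822.txt.000001.gz, so the earlier copy is left untouched.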

As before, one file is processed per run until all files have been processed.

Saturday, August 10, 2013

FileEvent Example 1 (Simplest use of FileEvent)


Introduction


FileEvent is one of those tools that you really need to see in action rather than just reading about it; the purpose is hard to explain, but easy to comprehend once presented with a few examples.

Firstly consider files with the following naming scheme:

file1_YYYYMMDD.txt

where:

  • YYYY – 4 Digit year
  • MM – 2 Digit month
  • DD – 2 Digit day

The simplest way of getting FileEvent to notice such files is a configuration such as the following:

<?xml version="1.0" standalone="yes"?>
<FileEvent>
  <settings>
    <db>/tmp/testing.db</db>
  </settings>
  <event>
    <description>scp example1 transfer</description>
    <file_pattern>file1_%{4year}%{2month}%{2day}.txt</file_pattern>
    <directory>/home/venture/projects/SOURCE/fileevent/testing/example1</directory>
    <xfer_job_type>scp</xfer_job_type>
    <xfer_destination>test@lubuntu1:/tmp/%{f}</xfer_destination>
  </event>
</FileEvent>

The key points to notice here are:


  • The “directory” is the directory where the files exist, and the “file_pattern” is the name of the files to look for.


  • The “file_pattern” can make use of “%{value}” strings to represent patterns to search for – in this case parts of a date – such as “%{4year}”.


Consider a directory with the following files in it:

-rw-rw-r-- 1 venture venture 27125 Aug 10 15:07 file1_20130819.txt
-rw-rw-r-- 1 venture venture 27125 Aug 10 15:07 file1_20130821.txt
-rw-rw-r-- 1 venture venture 27125 Aug 10 15:07 file1_20130822.txt

Running FileEvent in “list” mode will just list the files it will process, for example:

$ fileevent.pl --config example1.xml --action=list 
Event #0 matched files:
  Directory: /home/venture/projects/SOURCE/fileevent/testing/example1
  Pattern  :  ^file1_%{4year}%{2month}%{2day}.txt$
Actual cut-down list to send now:
  file1_20130822.txt

By default it will only process a single file for the matching pattern, and also by default it will only match the “newest” file.

Notice that the newest file is determined not by the modification time of the file, but by the date taken from the date patterns in the file name, if one is available.
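A small Python sketch of ordering by the date embedded in the name might look like this. It is not how FileEvent itself is written. The names DATE_MACROS, compile_date_pattern and newest_first are hypothetical, only the three date macros used in this example are handled, and the rest of the pattern is assumed to be an anchored regular expression as shown in the "list" output.

```python
import re
from datetime import date

# Assumed regex equivalents for the date macros used in this example.
DATE_MACROS = {
    "%{4year}": r"(?P<year>\d{4})",
    "%{2month}": r"(?P<month>\d{2})",
    "%{2day}": r"(?P<day>\d{2})",
}


def compile_date_pattern(file_pattern):
    """Replace the date macros with named groups and anchor the pattern."""
    regex = file_pattern
    for macro, group in DATE_MACROS.items():
        regex = regex.replace(macro, group)
    return re.compile("^" + regex + "$")


def newest_first(filenames, file_pattern):
    """Return the matching filenames sorted by embedded date, newest first."""
    compiled = compile_date_pattern(file_pattern)
    dated = []
    for name in filenames:
        match = compiled.match(name)
        if match:
            stamp = date(
                int(match.group("year")),
                int(match.group("month")),
                int(match.group("day")),
            )
            dated.append((stamp, name))
    dated.sort(reverse=True)
    return [name for _, name in dated]


names = ["file1_20130819.txt", "file1_20130822.txt", "file1_20130821.txt"]
print(newest_first(names, "file1_%{4year}%{2month}%{2day}.txt"))
```

All three files above share the same modification time, yet the sketch still puts file1_20130822.txt first because only the date in the name is compared.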

So to actually send the file:

$ fileevent.pl --config example1.xml --action=process

That will send the first file (though without verbosity not much will have been printed).

Notice that the file will still exist in the directory. Now run the same command again:

$ fileevent.pl --config example1.xml --action=process
2013/08/10 17:10:44.951 0000013-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130822.txt' already sent - removing from list to send or fetch.

This is one of the features of FileEvent: it will (by default) not send the same file if it has already been sent successfully, so repeating the command has skipped the first file and sent the next one (based on file age again).
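The stateful behaviour can be illustrated with a short Python sketch that records sent files in a small database. The real format of the FileEvent "db" file is not described in these posts, so the use of SQLite here, the table layout, and the function names open_state, unsent and mark_sent are all assumptions made purely for illustration.

```python
import sqlite3


def open_state(db_path):
    """Open (or create) a hypothetical state database of sent files."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sent (path TEXT PRIMARY KEY)")
    return conn


def unsent(conn, paths):
    """Return the paths that have not been recorded as sent, keeping order."""
    sent = {row[0] for row in conn.execute("SELECT path FROM sent")}
    return [path for path in paths if path not in sent]


def mark_sent(conn, path):
    """Record a path as successfully sent."""
    conn.execute("INSERT OR IGNORE INTO sent (path) VALUES (?)", (path,))
    conn.commit()


conn = open_state(":memory:")
matched = ["file1_20130822.txt", "file1_20130821.txt", "file1_20130819.txt"]
for run in range(4):
    remaining = unsent(conn, matched)
    if remaining:
        mark_sent(conn, remaining[0])  # one file per run, newest first
    print(run, remaining[:1])
```

Each pass through the loop stands in for one call of the utility: the first three passes pick one new file each, and the fourth finds nothing left to send.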

Run it again and it will skip the two files this time:

$ fileevent.pl --config example1.xml --action=process --verbose
2013/08/10 17:14:45.021 0000000-I Events to load from configuratione file: 1
2013/08/10 17:14:45.025 0000001-I Event #0 [scp example1 transfer] processing.
2013/08/10 17:14:45.026 0000002-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130822.txt' already sent - removing from list to send or fetch.
2013/08/10 17:14:45.026 0000003-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130821.txt' already sent - removing from list to send or fetch.

So all three files have been sent at this point. Running it again will send no files:

$ fileevent.pl --config example1.xml --action=process
2013/08/10 17:15:25.681 0000000-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130822.txt' already sent - removing from list to send or fetch.
2013/08/10 17:15:25.681 0000001-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130821.txt' already sent - removing from list to send or fetch.
2013/08/10 17:15:25.682 0000002-W File '/home/venture/projects/SOURCE/fileevent/testing/example1/file1_20130819.txt' already sent - removing from list to send or fetch.
2013/08/10 17:15:25.682 0000003-E Maximum wait time for event #0 passed.
2013/08/10 17:15:25.682 0000004-W ** No matching files found. **
2013/08/10 17:15:25.682 0000004-W  

Conclusion

This has been a very simple example of FileEvent showing just some of the basic features. It should be obvious now that the pattern matching is easy to use, the configuration files are easy to understand and the utility is stateful (in as much as it will not repeatedly send the same thing every time it is called).

FileEvent Available

I've now made FileEvent available on Google Code too - see https://code.google.com/p/fileevent/. This is a utility that is very useful - but hard to explain. Hence I'm going to post some usage examples on the blog over the next few days to give you some ideas.

The code is available from freecode.com - via this link.