<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MediaBandit Ltd Blog</title>
	<atom:link href="http://www.mediabandit.co.uk/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.mediabandit.co.uk/blog</link>
	<description>random bits from us to you</description>
	<lastBuildDate>Fri, 10 May 2013 09:57:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>May 2013: MediaBandit Launches Consumer Trends Blog</title>
		<link>http://www.mediabandit.co.uk/blog/278_may-2013-mediabandit-launches-consumer-trends-blog</link>
		<comments>http://www.mediabandit.co.uk/blog/278_may-2013-mediabandit-launches-consumer-trends-blog#comments</comments>
		<pubDate>Fri, 10 May 2013 09:57:16 +0000</pubDate>
		<dc:creator>mbcontent</dc:creator>
				<category><![CDATA[Media Bandit Projects]]></category>
		<category><![CDATA[BLOG]]></category>
		<category><![CDATA[BLOGSHOPBUY]]></category>
		<category><![CDATA[BUY]]></category>
		<category><![CDATA[product launch]]></category>
		<category><![CDATA[SHOP]]></category>
		<category><![CDATA[SHOPPING WEBSITE]]></category>
		<category><![CDATA[TRENDS]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=278</guid>
		<description><![CDATA[Tapping into the internet shopping generation, MediaBandit has launched a trend-based blog called BlogShopBuy to boost traffic amongst its sites.
Readers can follow frequent updates about commerce related happenings on the purchaser side of things. The blog provides editorials or pictorials on real time developments in a variety of markets, as well as articles on products [...]]]></description>
			<content:encoded><![CDATA[<p>Tapping into the internet shopping generation, MediaBandit has launched a trend-based blog called <a href="http://www.blogshopbuy.com">BlogShopBuy</a> to boost traffic amongst its sites.</p>
<p>Readers can follow frequent updates about commerce-related happenings on the purchaser side of things. The blog provides editorials or pictorials on real-time developments in a variety of markets, as well as articles on products that are having a revival, moving up through history and back onto our shelves.</p>
<p>The site, which has a clean look yet edgy feel, hopes to create a community where everyone is in the loop about what’s actually cool in an over-saturated consumer world. Additionally, the website&#8217;s features will facilitate user interaction, content sharing, and retailer links, so that buying capability is just a click away.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/278_may-2013-mediabandit-launches-consumer-trends-blog/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>April 2013: YouViewDevices.Com Launches</title>
		<link>http://www.mediabandit.co.uk/blog/268_april-2013-youviewdevices-com-launches</link>
		<comments>http://www.mediabandit.co.uk/blog/268_april-2013-youviewdevices-com-launches#comments</comments>
		<pubDate>Fri, 05 Apr 2013 11:41:31 +0000</pubDate>
		<dc:creator>mbcontent</dc:creator>
				<category><![CDATA[Media Bandit Projects]]></category>
		<category><![CDATA[Freeview Box]]></category>
		<category><![CDATA[Internet TV]]></category>
		<category><![CDATA[price comparison]]></category>
		<category><![CDATA[product launch]]></category>
		<category><![CDATA[YouView Device]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=268</guid>
		<description><![CDATA[A new milestone for MediaBandit, which has lift-off on its new site YouView Devices. ]]></description>
			<content:encoded><![CDATA[<p>Another milestone for MediaBandit, which has lift-off on its new site <a title="YouView Devices | Learn about YouView | Freeview and Internet TV" href="http://www.youviewdevices.com" target="_blank">YouView Devices</a>. Houston would be proud. Today’s web launch is aimed at the new age market looking to explore the realms of Internet TV.</p>
<p>Featuring all the regular benefits of the boxes already on the market, the YouView device is a TV service that allows people to watch programmes on Freeview, just as BT and TalkTalk do.</p>
<p>So what’s so special? It also has the added benefit of allowing viewers to browse through thousands and thousands of online TV shows via their broadband connection, meaning no more hooking up your laptop to your TV through a dodgy HDMI cable.</p>
<p>YouView Devices introduces the device with a <a title="What channels are available on YouView Device" href="http://www.youviewdevices.com/channels" target="_blank">YouView Channel List</a>, YouView Device Features, and <a title="What is YouView about | The Pros and Cons | Why buy this device" href="http://www.youviewdevices.com/youview-pros-and-cons.html" target="_blank">YouView Pros and Cons</a>. The newly released site also presents a selection of stockists; with no subscription required, it allows visitors to compare prices and easily navigate their way to a suitable retailer to start experiencing the power of Internet TV.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/268_april-2013-youviewdevices-com-launches/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shopblaster Releases Today March 2013</title>
		<link>http://www.mediabandit.co.uk/blog/262_new-shopping-engine-released</link>
		<comments>http://www.mediabandit.co.uk/blog/262_new-shopping-engine-released#comments</comments>
		<pubDate>Thu, 14 Mar 2013 12:34:23 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[Media Bandit Projects]]></category>
		<category><![CDATA[price comparison]]></category>
		<category><![CDATA[product launch]]></category>
		<category><![CDATA[release]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=262</guid>
		<description><![CDATA[Today marks a new era for Media Bandit.  We have released a new version of our price comparison engine. The new site is called Shopblaster and is a UK site for people looking to buy and compare electronic goods from top retailers like John Lewis
What&#8217;s the URL
Why not Compare Prices on Electronics Today with [...]]]></description>
			<content:encoded><![CDATA[<p>Today marks a new era for Media Bandit. We have released a new version of our price comparison engine. The new site is called Shopblaster and is a UK site for people looking to buy and compare electronic goods from top retailers like John Lewis.</p>
<h2>What&#8217;s the URL?</h2>
<p>Why not <a href='http://www.shopblaster.co.uk/?utm_source=mediabandit_blog' target='_blank'>Compare Prices on Electronics</a> today with <a href='http://www.shopblaster.co.uk/?utm_source=mediabandit_blog' target='_blank'>Shopblaster.co.uk</a>?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/262_new-shopping-engine-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ford bluetooth ios update</title>
		<link>http://www.mediabandit.co.uk/blog/256_foor-bluetooth-ios-update</link>
		<comments>http://www.mediabandit.co.uk/blog/256_foor-bluetooth-ios-update#comments</comments>
		<pubDate>Wed, 09 Nov 2011 17:50:50 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bluetooth]]></category>
		<category><![CDATA[ford]]></category>
		<category><![CDATA[ios5]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[update]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/256_foor-bluetooth-ios-update</guid>
		<description><![CDATA[I had a problem with bluetooth connection in my ford after updating my apple  iphone 4 to apple  iOS5 apparently it can be fixed with the update available here, http://www.fordownersclub.com/forums/topic/16107-bluetooth-phone-compatibility/
It takes 20 mins and has only just started so I will let you know how successful it was for me later!
its completed and It works [...]]]></description>
			<content:encoded><![CDATA[<p>I had a problem with the Bluetooth connection in my Ford after updating my Apple iPhone 4 to iOS 5. Apparently it can be fixed with the update available here: http://www.fordownersclub.com/forums/topic/16107-bluetooth-phone-compatibility/</p>
<p>It takes 20 minutes and has only just started, so I will let you know later how successful it was for me.</p>
<p>It has completed and it works! Happy Days!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/256_foor-bluetooth-ios-update/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting up Cron with Zend Framework 1.11</title>
		<link>http://www.mediabandit.co.uk/blog/248_setting-up-cron-with-zend-framework-1-11</link>
		<comments>http://www.mediabandit.co.uk/blog/248_setting-up-cron-with-zend-framework-1-11#comments</comments>
		<pubDate>Wed, 19 Oct 2011 12:08:25 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[zend]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=248</guid>
		<description><![CDATA[Ever wanted to run Zend Framework from your command line? Here is a simple way to achieve just that using a router, a single point of entry and controllers.]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re here then, like me, you are probably struggling to figure out how to set up cron jobs that use the good parts of the Zend Framework while skipping view rendering, because we do not want any layout processing for a script.</p>
<p>To achieve this we need the following:</p>
<p>1. A new Entry point<br />
2. Setup default module<br />
3. A base Controller for all scripts<br />
4. A script Module with controllers</p>
<p>My setup is pretty standard: I have modules within my application, and all of my library files are within Library/App.</p>
<p><span id="more-248"></span></p>
<h2>A new Entry Point</h2>
<p>Zend implements a single point of entry for all web applications, index.php. This sets up the application and calls the bootstrap. Our script entry point needs to work the same way as index.php, except that it should only call the bootstrap functions it needs.</p>
<p>A good way of sharing the common setup is to create a separate file within public/ called common_include.php. This will probably look something like:</p>
<pre>&lt;?php
// "LOCALE" is a placeholder: replace it with your own timezone identifier
date_default_timezone_set("LOCALE");

// Define path to application directory
defined('APPLICATION_PATH')
    || define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));

define("SITE_ROOT", realpath(dirname(__FILE__)) . "/../");
// Define application environment
defined('APPLICATION_ENV')
    || define('APPLICATION_ENV', (getenv('APPLICATION_ENV') ? getenv('APPLICATION_ENV') : 'production'));

// Ensure library/ is on include_path
// Make sure we have Models too
set_include_path(implode(PATH_SEPARATOR, array(
	realpath(APPLICATION_PATH),
    realpath(APPLICATION_PATH . '/../library'),
    realpath(APPLICATION_PATH . '/../library/Frameworks'),
    realpath(APPLICATION_PATH . '/models'),
    get_include_path(),
)));

/** Zend_Application */
require_once 'Zend/Application.php';

// Create the application; index.php and script.php bootstrap and run it
$application = new Zend_Application(
    APPLICATION_ENV,
    APPLICATION_PATH . '/configs/application.ini'
);</pre>
<p>Then, within your index.php, do the following:</p>
<pre>&lt;?php
include_once("common_include.php");

$application-&gt;bootstrap()
            -&gt;run();</pre>
<p>Your script.php (the entry point for scripts) will be slightly different. It might look something like:</p>
<pre>&lt;?php
include_once("common_include.php");

$application-&gt;bootstrap(array("Constants",........))
            -&gt;run();</pre>
<p>where each entry of the array corresponds to an _init function in the bootstrap.</p>
<h2>Setup a default module</h2>
<p>We need to force all controllers and actions to look within a particular module.  I do this within a bootstrap function called _initCli() which looks like:</p>
<pre>function _initCli() {
	$front 	= Zend_Controller_Front::getInstance();
	if (PHP_SAPI == 'cli') {
	    $front-&gt;registerPlugin(
	        new Zend_Controller_Plugin_ErrorHandler(
	            array(
		      'module'     =&gt; 'script',
		      'controller' =&gt; 'error',
		      'action'     =&gt; 'index'
		    )
		)
	    );
	    $front-&gt;setDefaultModule("script");
	} else {
	    $front-&gt;setDefaultModule("default");
	}
}</pre>
<p>This will be called from both index.php and script.php, but when running as a script it sets the default module to script and registers the error controller. (I&#8217;m sure there is a better way of doing this &#8211; separate bootstrap files?)</p>
<h2>Setup new Router</h2>
<p>Our solution for doing scripts is to allow us to do something like this:<br />
<code><br />
script.php controller=mail action=send<br />
</code><br />
This needs to be mapped to call the mail controller and its send action. Within that action we do what we need to do &#8211; in this example, send any new mail.</p>
<p>Here is the router I use; I&#8217;ll try to explain how it works.</p>
<pre>class App_Router_Cli extends Zend_Controller_Router_Abstract {

    // $dispatcher is the request object that the front controller passes in
    public function route (Zend_Controller_Request_Abstract $dispatcher) {

        $getopt     = new Zend_Console_Getopt (array ());
        $arguments  = $getopt-&gt;getRemainingArgs();

        $controller = "";
        $action     = "";
        $params     = array();

        if (!$arguments) {
            echo "No command given.\n";
            exit(1);
        }

        foreach ($arguments as $command) {

            // Split on the first "=" only, so values may contain "="
            $details = explode("=", $command, 2);

            if (count($details) != 2) {
                echo "Invalid command.\n";
                exit(1);
            }

            if ($details[0] == "controller") {
                $controller = $details[1];
            } else if ($details[0] == "action") {
                $action = $details[1];
            } else {
                // Everything else is passed to the action as a parameter
                $params[$details[0]] = $details[1];
            }
        }

        if ($action == "" || $controller == "") {
            die("
                Missing Controller and Action Arguments
                ==
                You should have:
                        php script.php controller=[controllername] action=[action]
            ");
        }

        $dispatcher-&gt;setControllerName($controller);
        $dispatcher-&gt;setActionName($action);
        $dispatcher-&gt;setParams($params);

        return $dispatcher;
    }

    public function assemble ($userParams, $name = null, $reset = false, $encode = true) {
        throw new Exception("Assemble isn't implemented " . print_r($userParams, true));
    }
}</pre>
<p>Note: I found this on someone else&#8217;s blog and have tailored it to my needs. As you can see, it looks for the controller and action arguments on the command line and sets them on the $dispatcher object accordingly. It then passes everything else through as params, so we can access them within the controller as needed.</p>
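<p>The key=value splitting that the router performs can be tried out without PHP. The following is only a rough shell sketch of the same idea; parse_cli_args is a hypothetical helper written for this illustration and is not part of Zend Framework.</p>

```shell
# Hypothetical helper that mirrors the parsing done by the router.
parse_cli_args() {
    controller=""; action=""; params=""
    for arg in "$@"; do
        key=${arg%%=*}      # text before the first "="
        value=${arg#*=}     # text after the first "="
        case "$key" in
            controller) controller=$value ;;
            action)     action=$value ;;
            *)          params="$params $key=$value" ;;
        esac
    done
    if [ -z "$controller" ] || [ -z "$action" ]; then
        echo "Missing controller or action argument" >&2
        return 1
    fi
    echo "controller=$controller action=$action params:$params"
}

parse_cli_args controller=mail action=send limit=10
```

<p>The limit=10 argument is made up; it stands in for any extra parameter that the router would hand on to the action.</p>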
<p>This router is called from within my bootstrap:</p>
<pre>	protected function _initRouter () {
	    if (PHP_SAPI == 'cli') {
	        $this-&gt;bootstrap ('frontcontroller');
	        $front = $this-&gt;getResource('frontcontroller');
	        $front-&gt;setRouter (new App_Router_Cli());
	        $front-&gt;setRequest (new Zend_Controller_Request_Simple ());
	    }
	}</pre>
<p>Notice that I only set this up when we&#8217;re running as a script.</p>
<p>So at this stage when we run:</p>
<p><code><br />
php -f script.php controller=mail action=send<br />
</code></p>
<p>Zend should try to load MailController-&gt;sendAction() within the script module.</p>
<h2>Base Controller</h2>
<p>We need to update the MailController to extend a base controller that turns off view rendering.</p>
<pre>class App_Controller_Script_Abstract extends Zend_Controller_Action {
	function init() {
		parent::init();
		$this-&gt;_helper-&gt;viewRenderer-&gt;setNoRender(TRUE);
	}
}</pre>
<p>This turns off view rendering, meaning we do not need any view scripts.</p>
<h2>Hey Presto</h2>
<p>After all of those changes you should be able to run your controllers and actions from within your command line session.</p>
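<p>To tie this back to cron, the sketch below builds a crontab line that would run the mail example every 15 minutes. The install path /var/www/myapp/public and the schedule are assumptions made for illustration; they are not part of the setup described above.</p>

```shell
# Hypothetical location of the public/ directory; adjust to your project.
APP_PUBLIC="/var/www/myapp/public"

# Five schedule fields (every 15 minutes) followed by the command to run.
CRON_LINE="*/15 * * * * php -f $APP_PUBLIC/script.php controller=mail action=send"

# Print the line so it can be reviewed and then pasted in via: crontab -e
echo "$CRON_LINE"
```

<p>This assumes that php is on the PATH that cron uses; if it is not, the full path to the PHP binary would be needed in the line.</p>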
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/248_setting-up-cron-with-zend-framework-1-11/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AWK by example</title>
		<link>http://www.mediabandit.co.uk/blog/246_awk-by-example-2</link>
		<comments>http://www.mediabandit.co.uk/blog/246_awk-by-example-2#comments</comments>
		<pubDate>Mon, 18 Jul 2011 16:46:58 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=246</guid>
		<description><![CDATA[I have 100% ripped this as a resource for myself. It was originally written by Javier Palacios Bermejo.
About the author:
Javier is involved in a Ph. D. in Astronomy at a Spanish university where he administrates a workstation cluster. The daily work in his department is done on Unix machines. After some initial problems and trials [...]]]></description>
			<content:encoded><![CDATA[<p>I have 100% ripped this as a resource for myself. It was originally written by Javier Palacios Bermejo.</p>
<p>About the author:</p>
<p>Javier is working on a PhD in Astronomy at a Spanish university, where he administers a workstation cluster. The daily work in his department is done on Unix machines. After some initial problems and trials, Slackware Linux was chosen. Linux turned out to be much better than some other proprietary Unix systems.</p>
<p>Content:</p>
<p>    Introduction to awk<br />
    A problem<br />
    &#8230; and a solution<br />
    Deeper AWK<br />
    Working over matched lines<br />
    Awk as a programming language<br />
    Including libraries<br />
    Conclusions<br />
    Additional information</p>
<p>Examples with awk: A short introduction</p>
<p>Abstract:</p>
<p>This article gives some insight into the tricks that you can do with AWK. It is not a tutorial, but it provides real-life examples to use.</p>
<p>Originally, the idea to write this text came to me after reading a couple of articles published in LinuxFocus that were written by Guido Socher. One of them, about find and related commands, showed me that I was not the only one who used the command line. Pretty GUIs don&#8217;t tell you how the things are really done (that&#8217;s the way that Windows went years ago). The other article was about regular expressions. Although regular expressions are only slightly touched in this article, you need to know them to get the maximum from awk and other commands like sed and grep.</p>
<p>The key question is whether this awk command is really useful. The answer is definitely yes! It could be useful for a normal user to process text files, re-format them, etc. For a system administrator AWK is a very important utility. Just look through /var/yp/Makefile or at the initialization scripts. AWK is used everywhere.</p>
<h2>Introduction to awk</h2>
<p>I first heard about AWK so long ago that I have almost forgotten it. I had a colleague who needed to work with some really big outputs from a small Cray. The manual page for awk on the Cray was small, but he said that AWK looked very much like the thing he needed, although he did not yet understand how to use it.<br />
A long time later, AWK came back into my life. A colleague of mine used AWK to extract the first column from a file with the command:</p>
<p>awk '{print $1}' file</p>
<p>Easy, isn&#8217;t it? This simple task does not need complex programming in C. One line of AWK does it.</p>
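<p>To try the column extraction without needing a file, the same one-liner can read from a pipe. This is a small sketch; the sample rows and the first_column helper name are made up for illustration.</p>

```shell
# Print the first whitespace-separated field of every input line.
first_column() {
    awk '{print $1}'
}

# Made-up sample data: three rows with three columns each.
printf 'alpha 1 x\nbeta 2 y\ngamma 3 z\n' | first_column
```

<p>Changing $1 to $2 or $3 selects another column, and $0 stands for the whole line.</p>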
<p>Once we have learned the lesson on how to extract a column we can do things such as renaming files (append .new to &#8220;files_list&#8221;):</p>
<p>ls files_list | awk '{print "mv "$1" "$1".new"}' | sh</p>
<p>&#8230; and more:</p>
<p>    Renaming within the name:<br />
    ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh<br />
    (although in some cases it will fail, as in file_old_and_old)</p>
<p>    Remove only files:<br />
    ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh<br />
    or with awk alone:<br />
    ls -l|awk '$1!~/^drwx/{print $9}'|xargs rm<br />
    Be careful when trying this out in your home directory. We remove files!</p>
<p>    Remove only directories:<br />
    ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh<br />
    or<br />
    ls -p | grep /$ | awk '{print "rm -r "$1}'<br />
    or with awk alone:<br />
    ls -l|awk '$1~/^d.*x/{print $9}'|xargs rm -r<br />
    Be careful when trying this out in your home directory. We remove things!</p>
<p>    Killing processes by name (in this example we kill the process called netscape):<br />
    kill `ps auxww | grep netscape | egrep -v grep | awk '{print $2}'`<br />
    or with awk alone:<br />
    ps auxww | awk '$0~/netscape/&amp;&amp;$0!~/awk/{print $2}' |xargs kill<br />
    It has to be adjusted to fit the ps command on whatever unix system you are on. Basically it is: &#8220;If the process is called netscape and it is not called &#8216;grep netscape&#8217; (or awk) then print the pid&#8221;.</p>
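<p>Because these pipelines end in sh, xargs rm or kill, it may be worth looking at what awk generates before anything is executed. A minimal sketch of that dry run for the rename example follows; preview_renames is a hypothetical helper name and the file names are made up.</p>

```shell
# Build the mv commands but only print them; nothing is executed here.
preview_renames() {
    awk '{print "mv " $1 " " $1 ".new"}'
}

# Made-up file names stand in for the output of ls.
printf 'a.txt\nb.txt\n' | preview_renames
```

<p>Once the printed commands look right, piping the same output into sh runs them. File names that contain spaces would break this approach, since awk splits each line into fields on whitespace.</p>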
<p>As you can see, AWK really helps when the same calculations are repeated over and over &#8230; and apart from that it is much more fun to write an AWK program than doing almost the same thing 20 times manually.</p>
<p>awk is a little programming language, with a syntax close to C in many aspects. It is an interpreted language and the awk interpreter processes the instructions.</p>
<p>About the syntax of the awk command interpreter itself:</p>
<pre># gawk --help
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
        gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
        -m[fr] val
        -W compat               --compat
        -W copyleft             --copyleft
        -W copyright            --copyright
        -W help                 --help
        -W lint                 --lint
        -W lint-old             --lint-old
        -W posix                --posix
        -W re-interval          --re-interval
        -W source=program-text  --source=program-text
        -W traditional          --traditional
        -W usage                --usage
        -W version              --version</pre>
<p>Instead of simply quoting (&#8216;) the programs in the command line, we can, as you can see above, write the instructions into a file, and call it with the option -f. With command line defined variables using -v var=val we can add some flexibility to the programs.</p>
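<p>As a small sketch of the -v option, the function below passes a shell value into the awk program as a variable. The name threshold, the above_threshold helper and the sample data are all assumptions made for this example.</p>

```shell
# Print the name in column 1 when the number in column 2 exceeds the limit.
# "threshold" is a hypothetical variable name set from the first argument.
above_threshold() {
    awk -v threshold="$1" '$2 > threshold {print $1}'
}

# Made-up usage figures for three disks.
printf 'disk1 40\ndisk2 95\ndisk3 72\n' | above_threshold 70
```

<p>The same program text could be saved in a file and run with -f instead of being quoted on the command line.</p>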
<p>Awk is, roughly speaking, a language designed for handling tables, that is, information that can be grouped into fields and records. The advantage here is that the record definition (and the field definition) is flexible.</p>
<p>Awk is powerful. It&#8217;s designed to work with one-line records, but that restriction can be relaxed. To see some of these aspects, we are going to look at some illustrative (and real) examples.</p>
<p>    Printing tables in a slightly prettier way<br />
    Maybe you have had to print some ASCII table obtained from somewhere, for example a list of hostnames, ethernet addresses and IP numbers. When those tables are really big, reading becomes difficult, and we feel that we need this list printed with LaTeX or, at least, in a better format. If the table is simple then it&#8217;s not too difficult:</p>
<pre>    BEGIN {
      printf "LaTeX preamble"
      printf "\\begin{tabular}{|c|c|...|c|}"
      }

    { printf $1" &amp; "
      printf $2" &amp; "
      .
      .
      .
      printf $n" \\\\ "
      printf "\\hline"
      }

    END {
      print "\\end{document}"
      }</pre>
<p>    Certainly, this is not a generic program, but we&#8217;re just starting &#8230;<br />
    (The double backslashes (\\) are necessary because the backslash is the escape character inside awk strings)</p>
<p>    Slicing output files<br />
    SIMBAD is an astronomical objects database that, among other things, provides the positions of stars on the sky plane. Once in the past I needed to perform searches to draw charts around some objects. The interface allowed the results to be saved in text files, and I had two approaches: 1) create one file for each object, or 2) feed it the whole input list, getting a single big output log file with the query results. As I decided to go for the second approach, I used awk for slicing the big output log. Obviously, I needed to take advantage of some characteristics of the output.<br />
        Each request produces a header line with a format like<br />
        ====&gt; name : nlines &lt;====<br />
        The first field tells us when a new object begins, and the fourth how many entries the object contains.<br />
        The character used in the output lists to mark different columns was '|'. This requires two additional lines of code to filter the output and get only the fields I was interested in.</p>
<pre>    ( $1 == "====&gt;" ) {
      NomObj = $2
      TotObj = $4
      if ( TotObj &gt; 0 ) {
        FS = "|"
        for ( cont=0 ; cont&lt;TotObj ; cont++ ) {
            getline
            print $2 $4 $5 $3  &gt;&gt; NomObj
            }
        FS = " "
        }
      }</pre>
<p>    Actually, the object name was not returned, and it was slightly more complicated, but this is supposed to be an illustrative example.</p>
<p>    Playing with the mail spool</p>
<pre>    BEGIN {
      BEGIN_MSG  = "From"
      BEGIN_BDY  = "Precedence:"
      MAIN_KEY   = "Subject:"
      VALIDATION = "[MONTH REPORT]"

      HEAD = "NO"; BODY = "NO"; PRINT="NO"
      OUT_FILE = "Month_Reports"
      }

      {

      if ( $1 == BEGIN_MSG ) {
        HEAD = "YES"; BODY = "NO"; PRINT="NO"
        }

      if ( $1 == MAIN_KEY ) {
        if ( $2 == VALIDATION ) {
          PRINT = "YES"
          $1 = ""; $2 = ""
          print "\n\n"$0"\n" &gt; OUT_FILE
          }
        }

      if ( $1 == BEGIN_BDY ) {
        getline
        if ( $0 == "" ) {
          HEAD = "NO"; BODY = "YES"
        } else {
          HEAD = "NO"; BODY = "NO"; PRINT="NO"
          }
        }

      if ( BODY == "YES" &amp;&amp; PRINT == "YES" ) {
        print $0 &gt;&gt; OUT_FILE
        }
      }</pre>
<p>    Maybe we are administrating a mailing list and, from time to time, some special messages are submitted to the list (for example, monthly reports) with some specific format (subject as &#8216;[MONTH REPORT] month , dept&#8217;). Suddenly, at the end of the year, we decide to put all these messages together, setting aside the others.<br />
    This can be done by processing the mail spool with the awk program above.</p>
<p>    Getting each report written to an individual file takes three extra lines of code.<br />
    NOTE: This example assumes that the mail spool is structured as I think it is. This program works for my mail.</p>
<p>I&#8217;ve used awk for many other tasks (automatic generation of web pages with information from simple databases) and I know enough about awk programming to be sure that a lot of things can be done.<br />
Just let your imagination fly.</p>
<h2>A problem</h2>
<p>One problem is that awk needs perfectly tabular information with no holes; for example, awk does not work well with fixed-width columns. This is not a problem if we create the awk input ourselves: choose something uncommon to separate the fields, later set FS to match, and we are done! If we already have the input, this could be a little more problematic. For example, a table like this:</p>
<pre>1234  HD 13324  22:40:54 ....
1235  HD122235  22:43:12 ....</pre>
<p>This is difficult to handle with awk. Unfortunately, this is quite common. If only one column has this characteristic, we can solve the problem (if anybody knows how to manage more than one column in a generic case, please let me know!).<br />
I had to face one of these tables, similar to the one described above. The second column was a name and it included a variable number of spaces. As usually happens, I had to sort it using the last column.</p>
<h2>&#8230; and a solution</h2>
<p>I realized that the column I wanted to sort by was the last one, and awk knows how many fields there are in the current record. Therefore, it was enough to access the last one (sometimes $4, and sometimes $5, but always $NF). At the end of the day, the desired result was obtained:</p>
<p>awk '{ printf $NF;$NF = "" ;printf " "$0"\n" }' | sort</p>
<p>This just shifts the last column to the first position so that you can sort on it. Obviously, this method is easily applied to the third field from the end, or to the field that follows a control field which always has the same value.<br />
Just use your ideas and imagination.</p>
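<p>Here is a sketch of the same trick applied to the two sample rows from the table above, fed in reverse order so that the effect of the sort is visible. It is a slight variation on the original command: it uses print instead of printf and trims the trailing space left behind by the emptied field. The helper name sort_by_last_field is made up.</p>

```shell
# Move the last field to the front of each line, then sort on it.
sort_by_last_field() {
    awk '{
        last = $NF          # remember the last field
        $NF = ""            # blank it out, which rebuilds the line
        sub(/ +$/, "")      # trim the trailing separator
        print last, $0
    }' | sort
}

# The two rows from the example table, deliberately out of order.
printf '1235  HD122235  22:43:12\n1234  HD 13324  22:40:54\n' | sort_by_last_field
```

<p>Note that rebuilding the line also collapses the runs of spaces between fields into single spaces.</p>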
<h2>Deeper AWK</h2>
<h2>Working over matched lines</h2>
<p>Up to now, nearly all the examples have processed every line of the input file. But, as the manual page also states, it is possible to process only some of the input lines. One just precedes the group of commands with the condition the line should meet. The matching condition can be very flexible, varying from a simple regular expression to a check on the contents of some field, with the possibility of grouping conditions with the proper logical operators.</p>
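<p>The three kinds of condition mentioned above can be sketched in a single program. The input rows (name, numeric id, shell) and the match_demo helper name are made up for this illustration.</p>

```shell
# Each rule runs only for the lines that meet its condition.
match_demo() {
    awk '
        /bash$/                        { print "regex:", $1 }
        $2 >= 1000                     { print "field:", $1 }
        $2 == 0 || $3 == "/bin/false"  { print "combined:", $1 }
    '
}

printf 'root 0 /bin/sh\nalice 1001 /bin/bash\nbob 1002 /bin/false\n' | match_demo
```

<p>A line can match more than one rule, in which case every matching rule runs for it, in the order the rules are written.</p>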
<h2>Awk as a programming language</h2>
<p>Like any other programming language, awk implements all the necessary flow control structures, as well as a set of operators and predefined functions to deal with numbers and strings.</p>
<p>It&#8217;s possible, of course, to include user defined functions with the keyword function. Apart from the common scalar variables, awk is also able to manage variable sized arrays.</p>
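<p>As a brief sketch of both features, the program below counts how often each word appears using an array indexed by the word itself, and uses a user-defined function to draw a small bar for each count. The names count_words and bar and the sample input are made up.</p>

```shell
# Count occurrences of the first field and print a bar per count.
count_words() {
    awk '
        # Extra parameters in the list (here: s) act as local variables.
        function bar(n,    s) {
            s = ""
            while (n > 0) { s = s "#"; n-- }
            return s
        }
        { count[$1]++ }
        END { for (w in count) print w, count[w], bar(count[w]) }
    ' | sort
}

printf 'ssh\nhttp\nssh\nssh\nhttp\nftp\n' | count_words
```

<p>The order in which awk walks through an array is not defined, which is why the output is piped through sort.</p>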
<h2>Including libraries</h2>
<p>As in any programming language, there are some very common functions, and it becomes uncomfortable to cut and paste pieces of code. That&#8217;s the reason why libraries exist. With the GNU version of awk, it is possible to include them within the awk program. This is, however, only a glimpse of what is possible and is outside the scope of this article.</p>
<h2>Conclusions</h2>
<p>Certainly, awk might not be as powerful as many other tools designed with similar goals. But it has the big advantage that small programs, fully tailored to our needs, can be written in a really short time.</p>
<p>AWK is very appropriate for the purposes for which it was built: reading data line by line and acting upon the strings and patterns in the lines.</p>
<p>Files like /etc/passwd turn out to be ideal for reformatting and processing with AWK. AWK is invaluable for such tasks.</p>
<p>Of course AWK is not alone. Perl is a strong competitor but still it is worthwhile to know some AWK tricks.</p>
<p>Additional information</p>
<p>Very basic commands of this kind are not very well documented, but you can find something when looking around.</p>
<p>    awk syntax is not the same on every Unix system, but there is a way to learn what it looks like on our particular system:<br />
    man awk<br />
    O&#8217;Reilly has published a book: Sed &#038; Awk (Nutshell handbook) by Dale Dougherty.<br />
    Looking at Amazon, we find more titles such as Effective Awk Programming: A User&#8217;s Guide, oriented to gawk, and half a dozen titles more. </p>
<p>Usually, all books on Unix mention this command, but only some of them treat it in detail. The best we can do is to browse any book we can get our hands on. You never know where useful information can be found.<br />
Webpages maintained by the LinuxFocus Editor team<br />
© Javier Palacios Bermejo<br />
LinuxFocus 1999 	Translation information:<br />
es 	-> 	&#8211; 	Javier Palacios Bermejo<br />
es 	-> 	en 	Javier Palacios Bermejo,Ruben Sotillo, Manuel Rodriguez</p>
<p>1999-10-14, generated by lfparser version 0.7</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/246_awk-by-example-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Changing default Eclipse Workspace</title>
		<link>http://www.mediabandit.co.uk/blog/245_changing-default-eclipse-workspace</link>
		<comments>http://www.mediabandit.co.uk/blog/245_changing-default-eclipse-workspace#comments</comments>
		<pubDate>Fri, 08 Apr 2011 14:59:31 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/245_changing-default-eclipse-workspace</guid>
		<description><![CDATA[1) Launch Eclipse
2) Minimize Eclipse
3) Launch a new instance of Eclipse
Eclipse will now complain that the workspace is already in use and ask you to choose a new one by bringing up a workspace dialog. 
Happy Days, job done.
]]></description>
			<content:encoded><![CDATA[<p>1) Launch Eclipse<br />
2) Minimize Eclipse<br />
3) Launch a new instance of Eclipse</p>
<p>Eclipse will now complain that the workspace is already in use and ask you to choose a new one by bringing up a workspace dialog. </p>
<p>Happy Days, job done.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/245_changing-default-eclipse-workspace/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NotePad++  Load langs.xml failed Fix</title>
		<link>http://www.mediabandit.co.uk/blog/241_notepad-load-langs-xml-failed-fix</link>
		<comments>http://www.mediabandit.co.uk/blog/241_notepad-load-langs-xml-failed-fix#comments</comments>
		<pubDate>Wed, 23 Mar 2011 16:10:13 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=241</guid>
		<description><![CDATA[Whenever I opened NotePad++ I used to get the following error message appear: Load langs.xml failed!
Langs.xml fix
It is really easy: go to the Notepad++ installation folder and delete or rename langs.xml. In that same folder there is a file called langs.model.xml; make a copy of it and rename that copy to langs.xml.
More [...]]]></description>
			<content:encoded><![CDATA[<p>Whenever I opened NotePad++ I used to get the following error message appear: Load langs.xml failed!</p>
<p><strong>Langs.xml fix</strong><br />
It is really easy: go to the Notepad++ installation folder and delete or rename langs.xml. In that same folder there is a file called langs.model.xml; make a copy of it and rename that copy to langs.xml.</p>
<p><strong>More info if you care</strong><br />
Somehow your langs.xml has become corrupt. If I were a betting man I would guarantee that the file is in the installation folder. </p>
<p><strong>Summary</strong><br />
This solution fixed the &#8216;Load langs.xml failed&#8217; problem for me and so hopefully it will work for you too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/241_notepad-load-langs-xml-failed-fix/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting Mobile Browsers in Zend 1.11</title>
		<link>http://www.mediabandit.co.uk/blog/227_detecting-mobile-browsers-in-zend-1-11</link>
		<comments>http://www.mediabandit.co.uk/blog/227_detecting-mobile-browsers-in-zend-1-11#comments</comments>
		<pubDate>Sat, 19 Mar 2011 14:39:42 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[zend]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=227</guid>
		<description><![CDATA[Tutorial on using WURFL with zend, how we can detect a mobile device in a zend application and how to change a viewscript suffix from within a plugin.]]></description>
			<content:encoded><![CDATA[<p>This blog post will explain, with a semblance of detail, how to detect whether your <em>zend application</em> is being viewed on a <em>mobile device</em>.  There are a number of tutorials out there that describe various ways of detecting the mobile browser and making it available within your application. I tried just about all of them, but the only way that worked for my needs was using <a href='http://framework.zend.com/manual/en/zend.http.user-agent.html' target='_blank'>Zend_Http_UserAgent</a> and <a href=''>WURFL 1.1</a>.</p>
<p>This post will explain the following:</p>
<p>1. How to setup your application to use WURFL<br />
2. How to configure Zend with WURFL<br />
3. How to actually use WURFL with Zend</p>
<p><strong>Health Warning</strong>: I&#8217;m not the most experienced Zend developer; in fact, I&#8217;ve only recently started using it, so what I&#8217;m explaining might not be the best solution, but it should at least work. I&#8217;m trying to write this so that you can simply copy and paste and everything will be OK.</p>
<p>Now that&#8217;s out of the way, prepare to be amazed.  Or if you&#8217;re like me, frustrated.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/227_detecting-mobile-browsers-in-zend-1-11/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Awk by example</title>
		<link>http://www.mediabandit.co.uk/blog/222_awk-by-example</link>
		<comments>http://www.mediabandit.co.uk/blog/222_awk-by-example#comments</comments>
		<pubDate>Sat, 12 Mar 2011 15:03:55 +0000</pubDate>
		<dc:creator>MediaBandit Ltd</dc:creator>
				<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://www.mediabandit.co.uk/blog/?p=222</guid>
		<description><![CDATA[Here is an awesome article on AWK, which despite the awful name is pretty cool and could be a very powerful weapon in your tool arsenal when you know how.  I can say that it&#8217;s awesome without feeling like a knobber because the original awk tutorial was written by someone that really knows [...]]]></description>
			<content:encoded><![CDATA[<p>Here is an awesome article on AWK, which despite the awful name is pretty cool and could be a very powerful weapon in your tool arsenal when you know how.  I can say that it&#8217;s awesome without feeling like a knobber because the original awk tutorial was written by someone who really knows their stuff: Mr Daniel Robbins, who is only the President/CEO of Gentoo.  I&#8217;ve put a copy of the AWK examples here for my reference and for anyone else who is keen to learn more about AWK. I might have changed bits here and there to make it more applicable to our use, but for the most part it is pretty true to the original.<br />
<span id="more-222"></span><br />
So let&#8217;s go ahead and write our first awk program to see how it works. At the command line, enter the following command:</p>
<p><code>$ awk '{ print }' /etc/passwd </code></p>
<p>You should see the contents of your /etc/passwd file appear before your eyes. Now, for an explanation of what awk did. When we called awk, we specified /etc/passwd as our input file. When we executed awk, it evaluated the print command for each line in /etc/passwd, in order. All output is sent to stdout, and we get a result identical to catting /etc/passwd. Now, for an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C. Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed.</p>
<p>Here is another awk example that does exactly the same thing:</p>
<p><code>$ awk '{ print $0 }' /etc/passwd </code></p>
<p>In awk, the $0 variable represents the entire current line, so print and print $0 do exactly the same thing. If you&#8217;d like, you can create an awk program that will output data totally unrelated to the input data. Here&#8217;s an example:</p>
<p><code>$ awk '{ print "" }' /etc/passwd </code></p>
<p>Whenever you pass the "" string to the print command, it prints a blank line. If you test this script, you&#8217;ll find that awk outputs one blank line for every line in your /etc/passwd file. Again, this is because awk executes your script for every line in the input file. Here&#8217;s another example:</p>
<p><code>$ awk '{ print "hiya" }' /etc/passwd </code></p>
<p>Running this script will fill your screen with hiya&#8217;s. <img src='http://www.mediabandit.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>Multiple fields</strong></p>
<p>Awk is really good at handling text that has been broken into multiple logical fields, and allows you to effortlessly reference each individual field from inside your awk script. The following script will print out a list of all user accounts on your system:</p>
<p><code>$ awk -F":" '{ print $1 }' /etc/passwd </code></p>
<p>Above, when we called awk, we used the -F option to specify ":" as the field separator. When awk processes the print $1 command, it will print out the first field that appears on each line in the input file. Here&#8217;s another example:</p>
<p><code>$ awk -F":" '{ print $1 $3 }' /etc/passwd </code></p>
<p>Here&#8217;s an excerpt of the output from this script:</p>
<p><code>halt7<br />
operator11<br />
root0<br />
shutdown6<br />
sync5<br />
bin1<br />
....etc. </code></p>
<p>As you can see, awk prints out the first and third fields of the /etc/passwd file, which happen to be the username and uid fields respectively. Now, while the script did work, it&#8217;s not perfect &#8212; there aren&#8217;t any spaces between the two output fields! If you&#8217;re used to programming in bash or python, you may have expected the print $1 $3 command to insert a space between the two fields. However, when two strings appear next to each other in an awk program, awk concatenates them without adding an intermediate space. The following command will insert a space between both fields:</p>
<p><code>$ awk -F":" '{ print $1 " " $3 }' /etc/passwd </code></p>
<p>When you call print this way, it&#8217;ll concatenate $1, " ", and $3, creating readable output. Of course, we can also insert some text labels if needed:</p>
<p><code>$ awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd </code></p>
<p>This will cause the output to be:</p>
<p><code>username: halt     uid:7<br />
username: operator uid:11<br />
username: root     uid:0<br />
username: shutdown uid:6<br />
username: sync     uid:5<br />
username: bin      uid:1<br />
....etc. </code></p>
<p><strong>External scripts</strong></p>
<p>Passing your scripts to awk as a command line argument can be very handy for small one-liners, but when it comes to complex, multi-line programs, you&#8217;ll definitely want to compose your script in an external file. Awk can then be told to source this script file by passing it the -f option:</p>
<p><code>$ awk -f myscript.awk myfile.in </code></p>
<p>Putting your scripts in their own text files also allows you to take advantage of additional awk features. For example, this multi-line script does the same thing as one of our earlier one-liners, printing out the first field of each line in /etc/passwd:</p>
<p><code>BEGIN {<br />
        FS=":"<br />
}<br />
{ print $1 }<br />
</code></p>
<p>The difference between these two methods has to do with how we set the field separator. In this script, the field separator is specified within the code itself (by setting the FS variable), while our previous example set FS by passing the -F":" option to awk on the command line. It&#8217;s generally best to set the field separator inside the script itself, simply because it means you have one less command line argument to remember to type. We&#8217;ll cover the FS variable in more detail later in this article.</p>
<p><strong>The BEGIN and END blocks</strong></p>
<p>Normally, awk executes each block of your script&#8217;s code once for each input line. However, there are many programming situations where you may need to execute initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. We used a BEGIN block in the previous example. Because the BEGIN block is evaluated before awk starts processing the input file, it&#8217;s an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you&#8217;ll reference later in the program.</p>
<p>Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or print summaries that should appear at the end of the output stream.</p>
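<p>Here is a short sketch that uses both special blocks together. To keep the result independent of your system, it reads a made-up passwd-style sample instead of the real /etc/passwd; the data and the function name list_users are hypothetical:</p>

```shell
# Hypothetical passwd-style sample, so the result does not depend on your system.
accounts='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/false
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

list_users() {
    awk '
    BEGIN { FS = ":"; print "Users:" }   # runs once, before any input
          { print "  " $1; n++ }         # runs for every input line
    END   { print n " accounts" }        # runs once, after the last line
    '
}

printf '%s\n' "$accounts" | list_users
```

<p>The heading comes from BEGIN, the indented names from the main block, and the final count from END.</p>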
<p><strong>Regular expressions and blocks</strong></p>
<p>Awk allows the use of regular expressions to selectively execute an individual block of code, depending on whether or not the regular expression matches the current line. Here&#8217;s an example script that outputs only those lines that contain the character sequence foo:</p>
<p><code>/foo/ { print } </code></p>
<p>Of course, you can use more complicated regular expressions. Here&#8217;s a script that will print only lines that contain a floating point number:</p>
<p><code>/[0-9]+\.[0-9]*/ { print }</code> </p>
<p><strong>Expressions and blocks</strong></p>
<p>There are many other ways to selectively execute a block of code. We can place any kind of boolean expression before a code block to control when a particular block is executed. Awk will execute a code block only if the preceding boolean expression evaluates to true. The following example script will output the third field of all lines that have a first field equal to fred. If the first field of the current line is not equal to fred, awk will continue processing the file and will not execute the print statement for the current line:</p>
<p><code>$1 == "fred" { print $3 }</code></p>
<p>Awk offers a full selection of comparison operators, including the usual "==", "&lt;", "&gt;", "&lt;=", "&gt;=", and "!=". In addition, awk provides the "~" and "!~" operators, which mean "matches" and "does not match". They&#8217;re used by specifying a variable on the left side of the operator, and a regular expression on the right side. Here&#8217;s an example that will print only the third field on the line if the fifth field on the same line contains the character sequence root:</p>
<p><code>$5 ~ /root/ { print $3 } </code></p>
<p><strong>Conditional statements</strong></p>
<p>Awk also offers very nice C-like if statements. If you&#8217;d like, you could rewrite the previous script using an if statement:</p>
<p><code>{<br />
  if ( $5 ~ /root/ ) {<br />
          print $3<br />
  }<br />
}<br />
</code></p>
<p>Both scripts function identically. In the first example, the boolean expression is placed outside the block, while in the second example, the block is executed for every input line, and we selectively perform the print command by using an if statement. Both methods are available, and you can choose the one that best meshes with the other parts of your script.</p>
<p>Here&#8217;s a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:</p>
<p><code>{<br />
  if ( $1 == "foo" ) {<br />
           if ( $2 == "foo" ) {<br />
                    print "uno"<br />
           } else {<br />
                    print "one"<br />
           }<br />
  } else if ($1 == "bar" ) {<br />
           print "two"<br />
  } else {<br />
           print "three"<br />
  }<br />
} </code></p>
<p>Using if statements, we can also transform this code:</p>
<p><code>! /matchme/ { print $1 $3 $4 }</code></p>
<p>to this:</p>
<p><code>{<br />
  if ( $0 !~ /matchme/ ) {<br />
          print $1 $3 $4<br />
  }<br />
}<br />
</code></p>
<p>Both scripts will output only those lines that don&#8217;t contain a matchme character sequence. Again, you can choose the method that works best for your code. They both do the same thing.</p>
<p>Awk also allows the use of the boolean operators "||" (for "logical or") and "&#038;&#038;" (for "logical and") to build more complex boolean expressions:</p>
<p><code>( $1 == "foo" ) &#038;&#038; ( $2 == "bar" ) { print } </code></p>
<p>This example will print only those lines where field one equals foo and field two equals bar.</p>
<p><strong>Numeric variables!</strong></p>
<p>So far, we&#8217;ve either printed strings, the entire line, or specific fields. However, awk also allows us to perform both integer and floating point math. Using mathematical expressions, it&#8217;s very easy to write a script that counts the number of blank lines in a file. Here&#8217;s one that does just that:</p>
<p><code>BEGIN { x=0 }<br />
/^$/  { x=x+1 }<br />
END   { print "I found " x " blank lines. :)" } </code></p>
<p>In the BEGIN block, we initialize our integer variable x to zero. Then, each time awk encounters a blank line, awk will execute the x=x+1 statement, incrementing x. After all the lines have been processed, the END block will execute, and awk will print out a final summary, specifying the number of blank lines it found.</p>
<p><strong>Stringy variables</strong></p>
<p>One of the neat things about awk variables is that they are &#8220;simple and stringy.&#8221; I consider awk variables &#8220;stringy&#8221; because all awk variables are stored internally as strings. At the same time, awk variables are &#8220;simple&#8221; because you can perform mathematical operations on a variable, and as long as it contains a valid numeric string, awk automatically takes care of the string-to-number conversion steps. To see what I mean, check out this example:</p>
<p><code>x="1.01"<br />
# We just set x to contain the *string* "1.01"<br />
x=x+1<br />
# We just added one to a *string*<br />
print x<br />
# Incidentally, these are comments :)<br />
</code></p>
<p>Awk will output:</p>
<p><code>2.01</code></p>
<p>Interesting! Although we assigned the string value 1.01 to the variable x, we were still able to add one to it. We wouldn&#8217;t be able to do this in bash or python. First of all, bash doesn&#8217;t support floating point arithmetic. And, while bash has "stringy" variables, they aren&#8217;t "simple"; to perform any mathematical operations, bash requires that we enclose our math in an ugly $(( )) construct. If we were using python, we would have to explicitly convert our 1.01 string to a floating point value before performing any arithmetic on it. While this isn&#8217;t difficult, it&#8217;s still an additional step. With awk, it&#8217;s all automatic, and that makes our code nice and clean. If we wanted to square and add one to the first field in each input line, we would use this script:</p>
<p><code>{ print ($1^2)+1 } </code></p>
<p>If you do a little experimenting, you&#8217;ll find that if a particular variable doesn&#8217;t contain a valid number, awk will treat that variable as a numerical zero when it evaluates your mathematical expression.</p>
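<p>A quick way to see this for yourself, sketched with a made-up three-line input whose second line is not a number (the function name square_plus_one is invented):</p>

```shell
# Hypothetical input: the second line is not a number.
values='3
abc
1.5'

square_plus_one() {
    awk '{ print ($1^2)+1 }'
}

printf '%s\n' "$values" | square_plus_one
```

<p>The line containing abc is treated as zero, so the script prints 1 for it; the other two lines give 10 and 3.25.</p>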
<p><strong>Lots of operators</strong></p>
<p>Another nice thing about awk is its full complement of mathematical operators. In addition to standard addition, subtraction, multiplication, and division, awk allows us to use the previously demonstrated exponent operator &#8220;^&#8221;, the modulo (remainder) operator &#8220;%&#8221;, and a bunch of other handy assignment operators borrowed from C.</p>
<p>These include pre- and post-increment/decrement ( i++, --foo ), add/sub/mult/div assign operators ( a+=3, b*=2, c/=2.2, d-=6.2 ). But that&#8217;s not all: we also get handy modulo/exponent assign ops as well ( a^=2, b%=4 ).</p>
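<p>The following sketch runs a few of these operators inside a BEGIN block, so no input file is needed. The variable names and the function name show_operators are arbitrary:</p>

```shell
show_operators() {
    awk 'BEGIN {
        a = 5;  a += 3;  print "a:", a            # 5 + 3
        b = 5;  b *= 2;  print "b:", b            # 5 * 2
        c = 7;  c %= 4;  print "c:", c            # remainder of 7 / 4
        d = 3;  d ^= 2;  print "d:", d            # 3 squared
        i = 1;  j = i++; print "i:", i, "j:", j   # post-increment: j gets the old value
    }'
}

show_operators
```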
<p><strong>Field separators</strong></p>
<p>Awk has its own complement of special variables. Some of them allow you to fine-tune how awk functions, while others can be read to glean valuable information about the input. We&#8217;ve already touched on one of these special variables, FS. As mentioned earlier, this variable allows you to set the character sequence that awk expects to find between fields. When we were using /etc/passwd as input, FS was set to &#8220;:&#8221;. While this did the trick, FS allows us even more flexibility.</p>
<p>The FS value is not limited to a single character; it can also be set to a regular expression, specifying a character pattern of any length. If you&#8217;re processing fields separated by one or more tabs, you&#8217;ll want to set FS like so:</p>
<p><code>FS="\t+" </code></p>
<p>Above, we use the special &#8220;+&#8221; regular expression character, which means &#8220;one or more of the previous character&#8221;.</p>
<p>If your fields are separated by whitespace (one or more spaces or tabs), you may be tempted to set FS to the following regular expression:</p>
<p><code>FS="[[:space:]]+" </code></p>
<p>While this assignment will do the trick, it&#8217;s not necessary. Why? Because by default, FS is set to a single space character, which awk interprets to mean &#8220;one or more spaces or tabs.&#8221; In this particular example, the default FS setting was exactly what you wanted in the first place!</p>
<p>Complex regular expressions are no problem. Even if your fields are separated by the word "foo" followed by three digits, the following regular expression will allow your data to be parsed properly:</p>
<p><code>FS="foo[0-9][0-9][0-9]" </code></p>
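<p>To check that such a separator behaves as described, here is a minimal sketch with an invented one-line input; the data and the function name split_on_foo are hypothetical:</p>

```shell
# Hypothetical line: fields separated by "foo" plus three digits.
line='alphafoo123betafoo456gamma'

split_on_foo() {
    awk 'BEGIN { FS = "foo[0-9][0-9][0-9]" } { print NF ": " $1 ", " $2 ", " $3 }'
}

printf '%s\n' "$line" | split_on_foo
```

<p>Awk finds three fields, and the separators themselves do not appear in any of them.</p>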
<p><strong>Number of fields</strong></p>
<p>The next two variables we&#8217;re going to cover are not normally intended to be written to, but are normally read and used to gain useful information about the input. The first is the NF variable, also called the &#8220;number of fields&#8221; variable. Awk will automatically set this variable to the number of fields in the current record. You can use the NF variable to display only certain input lines:</p>
<p><code>NF == 3 { print "this particular record has three fields: " $0 } </code></p>
<p>Of course, you can also use the NF variable in conditional statements, as follows:</p>
<p><code>{<br />
  if ( NF > 2 ) {<br />
          print $1 " " $2 ":" $3<br />
  }<br />
}<br />
</code></p>
<p><strong>Record number</strong></p>
<p>The record number (NR) is another handy variable. It will always contain the number of the current record (awk counts the first record as record number 1). Up until now, we&#8217;ve been dealing with input files that contain one record per line. For these situations, NR will also tell you the current line number. However, when we start to process multi-line records later in the series, this will no longer be the case, so be careful! NR can be used like the NF variable to print only certain lines of the input:<br />
<code><br />
(NR < 10 ) || (NR > 100) { print "We are on record number 1-9 or 101+" } </code></p>
<p>Another example:</p>
<p><code>{<br />
  #skip header<br />
  if ( NR > 10 ) {<br />
          print "ok, now for the real information!"<br />
  }<br />
}<br />
</code></p>
<p>Awk provides additional variables that can be used for a variety of purposes. We&#8217;ll cover more of these variables in later articles. We&#8217;ve come to the end of our initial exploration of awk. As the series continues, I&#8217;ll demonstrate more advanced awk functionality, and we&#8217;ll end the series with a real-world awk application. In the meantime, if you&#8217;re eager to learn more, check out the resources listed below.</p>
<p><strong>Multi-line records</strong></p>
<p>Awk is an excellent tool for reading in and processing structured data, such as the system&#8217;s /etc/passwd file. /etc/passwd is the UNIX user database, and is a colon-delimited text file, containing a lot of important information, including all existing user accounts and user IDs, among other things. In my previous article, I showed you how awk could easily parse this file. All we had to do was to set the FS (field separator) variable to &#8220;:&#8221;.</p>
<p>By setting the FS variable correctly, awk can be configured to parse almost any kind of structured data, as long as there is one record per line. However, just setting FS won&#8217;t do us any good if we want to parse a record that exists over multiple lines. In these situations, we also need to modify the RS record separator variable. The RS variable tells awk when the current record ends and a new record begins.</p>
<p>As an example, let&#8217;s look at how we&#8217;d handle the task of processing an address list of Federal Witness Protection Program participants:</p>
<p><code>Jimmy the Weasel<br />
100 Pleasant Drive<br />
San Francisco, CA 12345<br />
Big Tony<br />
200 Incognito Ave.<br />
Suburbia, WA 67890</code></p>
<p>Ideally, we&#8217;d like awk to recognize each 3-line address as an individual record, rather than as three separate records. It would make our code a lot simpler if awk would recognize the first line of the address as the first field ($1), the street address as the second field ($2), and the city, state, and zip code as field $3. The following code will do just what we want:</p>
<p><code>BEGIN {<br />
	FS="\n"<br />
	RS=""<br />
}</code></p>
<p>Above, setting FS to &#8220;\n&#8221; tells awk that each field appears on its own line. By setting RS to &#8220;&#8221;, we also tell awk that each address record is separated by a blank line. Once awk knows how the input is formatted, it can do all the parsing work for us, and the rest of the script is simple. Let&#8217;s look at a complete script that will parse this address list and print out each address record on a single line, separating each field with a comma.<br />
<code><br />
BEGIN {<br />
	FS="\n"<br />
	RS=""<br />
}<br />
{<br />
	print $1 ", " $2 ", " $3<br />
}</code></p>
<p>If this script is saved as address.awk, and the address data is stored in a file called address.txt, you can execute this script by typing &#8220;awk -f address.awk address.txt&#8221;. This code produces the following output:<br />
<code><br />
Jimmy the Weasel, 100 Pleasant Drive, San Francisco, CA 12345<br />
Big Tony, 200 Incognito Ave., Suburbia, WA 67890</code></p>
<p><strong>OFS and ORS</strong></p>
<p>In address.awk&#8217;s print statement, you can see that awk concatenates (joins) strings that are placed next to each other on a line. We used this feature to insert a comma and a space (", ") between the three address fields that appeared on the line. While this method works, it&#8217;s a bit ugly looking. Rather than inserting literal ", " strings between our fields, we can have awk do it for us by setting a special awk variable called OFS. Take a look at this code snippet.</p>
<p><code>print "Hello", "there", "Jim!"</code></p>
<p>The commas on this line are not part of the actual literal strings. Instead, they tell awk that &#8220;Hello&#8221;, &#8220;there&#8221;, and &#8220;Jim!&#8221; are separate fields, and that the OFS variable should be printed between each string. By default, awk produces the following output:</p>
<p><code>Hello there Jim!</code></p>
<p>This shows us that by default, OFS is set to " ", a single space. However, we can easily redefine OFS so that awk will insert our favorite field separator. Here&#8217;s a revised version of our original address.awk program that uses OFS to output those intermediate ", " strings:</p>
<p><code>BEGIN {<br />
	FS="\n"<br />
	RS=""<br />
	OFS=", "<br />
}<br />
{<br />
	print $1, $2, $3<br />
}<br />
</code></p>
<p>Awk also has a special variable called ORS, the "output record separator". By setting ORS, which defaults to a newline ("\n"), we can control the character that&#8217;s automatically printed at the end of a print statement. The default ORS value causes awk to output each new print statement on a new line. If we wanted to make the output double-spaced, we would set ORS to "\n\n". Or, if we wanted records to be separated by a single space (and no newline), we would set ORS to " ".</p>
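<p>As a small sketch of the last case, the snippet below joins three input lines into one by setting ORS to a single space. The function name join_with_spaces is invented, and the END block adds the final newline that ORS no longer supplies:</p>

```shell
join_with_spaces() {
    # ORS replaces the newline that print normally appends.
    awk 'BEGIN { ORS = " " } { print $1 } END { printf "\n" }'
}

printf 'one\ntwo\nthree\n' | join_with_spaces
```

<p>Note that ORS is printed after every record, including the last one, so the output line ends with a space.</p>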
<p><strong>Multi-line to tabbed</strong></p>
<p>Let&#8217;s say that we wrote a script that converted our address list to a single-line per record, tab-delimited format for import into a spreadsheet. After using a slightly modified version of address.awk, it would become clear that our program only works for three-line addresses. If awk encountered the following address, the fourth line would be thrown away and not printed:</p>
<p><code>Cousin Vinnie<br />
Vinnie's Auto Shop<br />
300 City Alley<br />
Sosueme, OR 76543</code></p>
<p>To handle situations like this, it would be good if our code took the number of fields per record into account, printing each one in order. Right now, the code only prints the first three fields of the address. Here&#8217;s some code that does what we want:</p>
<p><code>BEGIN {<br />
    FS="\n"<br />
    RS=""<br />
    ORS=""<br />
}<br />
{<br />
        x=1<br />
        while ( x < NF ) {<br />
                print $x "\t"<br />
                x++<br />
        }<br />
        print $NF "\n"<br />
}<br />
</code></p>
<p>First, we set the field separator FS to "\n" and the record separator RS to "" so that awk parses the multi-line addresses correctly, as before. Then, we set the output record separator ORS to "", which will cause the print statement to not output a newline at the end of each call. This means that if we want any text to start on a new line, we need to explicitly write print "\n".</p>
<p>In the main code block, we create a variable called x that holds the number of the current field that we're processing. Initially, it's set to 1. Then, we use a while loop (an awk looping construct identical to that found in the C language) to iterate through all but the last field, printing the field and a tab character. Finally, we print the last field and a literal newline; again, since ORS is set to "", print won't output newlines for us. Program output looks like this, which is exactly what we wanted:</p>
<p>Our intended output. Not pretty, but tab delimited for easy import into a spreadsheet</p>
<p><code>Jimmy the Weasel        100 Pleasant Drive      San Francisco, CA 12345<br />
Big Tony        200 Incognito Ave.      Suburbia, WA 67890<br />
Cousin Vinnie   Vinnie's Auto Shop      300 City Alley  Sosueme, OR 76543</code></p>
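<p>For comparison, here's a sketch of a shorter route to the same tab-delimited output. It isn't part of the script above; it assumes standard (POSIX) awk behavior, where assigning to any field makes awk rebuild $0 with the output field separator OFS placed between the fields:</p>
<p><code>BEGIN {<br />
	FS="\n"<br />
	RS=""<br />
	OFS="\t"<br />
}<br />
<br />
{<br />
	# assigning a field to itself forces awk to rebuild $0 using OFS<br />
	$1=$1<br />
	print<br />
}</code></p>
<p>Because ORS is left at its default here, print supplies the trailing newline itself. The explicit loop above is still worth knowing, since it gives you a place to treat individual fields differently.</p>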
<p><strong>Looping constructs</strong></p>
<p>We've already seen awk's while loop construct, which is identical to its C counterpart. Awk also has a "do...while" loop that evaluates the condition at the end of the code block, rather than at the beginning like a standard while loop. It's similar to "repeat...until" loops that can be found in other languages. Here's an example:</p>
<p>do...while example</p>
<p><code>{<br />
	count=1<br />
	do {<br />
		print "I get printed at least once no matter what"<br />
	} while ( count != 1 )<br />
}</code></p>
<p>Because the condition is evaluated after the code block, a "do...while" loop, unlike a normal while loop, will always execute at least once. On the other hand, a normal while loop will never execute if its condition is false when the loop is first encountered.</p>
<p><strong>For loops</strong></p>
<p>Awk allows you to create for loops, which, like while loops, are identical to their C counterparts:</p>
<p><code>for ( initial assignment; comparison; increment ) {<br />
	code block<br />
}<br />
</code></p>
<p>Here's a quick example:</p>
<p><code>for ( x = 1; x <= 4; x++ ) {<br />
	print "iteration",x<br />
}<br />
</code></p>
<p>This snippet will print:<br />
<code><br />
iteration 1<br />
iteration 2<br />
iteration 3<br />
iteration 4</code></p>
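<p>To tie this back to our address list, here's a sketch of how a for loop could stand in for the while loop in the tab-delimited conversion script from earlier. The variable name line is ours, chosen purely for illustration:</p>
<p><code>BEGIN {<br />
	FS="\n"<br />
	RS=""<br />
}<br />
<br />
{<br />
	# start with the first field, then append a tab and each remaining field<br />
	line=$1<br />
	for ( x=2; x &lt;= NF; x++ ) {<br />
		line=line "\t" $x<br />
	}<br />
	print line<br />
}</code></p>
<p>Since the whole record is assembled in a string before printing, this version can leave ORS alone and let print add the newline.</p>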
<p><strong>Break and continue</strong></p>
<p>Again, just like C, awk provides break and continue statements. These statements provide better control over awk's various looping constructs. Here's a code snippet that desperately needs a break statement:</p>
<p><code>while (1) {<br />
	print "forever and ever..."<br />
}<br />
</code></p>
<p>Because 1 is always true, this while loop runs forever. Here's a loop that only executes ten times:</p>
<p><code>x=1<br />
while(1) {<br />
	print "iteration",x<br />
	if ( x == 10 ) {<br />
		break<br />
	}<br />
	x++<br />
}</code></p>
<p>Here, the break statement is used to "break out" of the innermost loop. "break" causes the loop to immediately terminate and execution to continue at the line after the loop's code block.</p>
<p>The continue statement complements break, and works like this:</p>
<p><code>x=1<br />
while (1) {<br />
	if ( x == 4 ) {<br />
		x++<br />
		continue<br />
	}<br />
	print "iteration",x<br />
	if ( x > 20 ) {<br />
		break<br />
	}<br />
	x++<br />
}</code></p>
<p>This code will print "iteration 1" through "iteration 21", except for "iteration 4". If x equals 4, x is incremented and the continue statement is called, which immediately causes awk to start the next loop iteration without executing the rest of the code block. The continue statement works for every kind of awk iterative loop, just as break does. When used in the body of a for loop, continue still runs the loop's increment step, so the loop control variable is incremented automatically. Here's an equivalent for loop:</p>
<p><code>for ( x=1; x<=21; x++ ) {<br />
	if ( x == 4 ) {<br />
		continue<br />
	}<br />
	print "iteration",x<br />
}</code></p>
<p>It wasn't necessary to increment x just before calling continue as it was in our while loop, since the for loop increments x automatically.</p>
<p><strong>Arrays</strong></p>
<p>You'll be pleased to know that awk has arrays. However, under awk, it's customary to start array indices at 1, rather than 0:<br />
<code><br />
myarray[1]="jim"<br />
myarray[2]=456</code></p>
<p>When awk encounters the first assignment, myarray is created and the element myarray[1] is set to "jim". After the second assignment is evaluated, the array has two elements.</p>
<p><strong>Iterating over arrays</strong></p>
<p>Once an array is defined, awk offers a handy mechanism to iterate over its elements, as follows:</p>
<p><code>for ( x in myarray ) {<br />
	print myarray[x]<br />
}<br />
</code></p>
<p>This code will print out every element in the array myarray. When you use this special "in" form of a for loop, awk will assign every existing index of myarray to x (the loop control variable) in turn, executing the loop's code block once after each assignment. While this is a very handy awk feature, it does have one drawback -- when awk cycles through the array indices, it doesn't follow any particular order. That means that there's no way for us to know whether the output of the above code will be:</p>
<p><code>jim<br />
456</code></p>
<p>or</p>
<p><code>456<br />
jim</code></p>
<p>To loosely paraphrase Forrest Gump, iterating over the contents of an array is like a box of chocolates -- you never know what you're going to get. This has something to do with the "stringiness" of awk arrays, which we'll now take a look at.</p>
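<p>When the order does matter and the indices happen to be consecutive integers, one way around the problem is to skip the "in" form and count through the indices yourself. Here's a minimal sketch; the variable n is our own bookkeeping, not something awk maintains for you:</p>
<p><code>BEGIN {<br />
	myarray[1]="jim"<br />
	myarray[2]=456<br />
	# n is the number of elements, tracked by hand<br />
	n=2<br />
	for ( x=1; x &lt;= n; x++ ) {<br />
		print myarray[x]<br />
	}<br />
}</code></p>
<p>This version always prints "jim" first and 456 second, at the cost of having to know how many elements there are.</p>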
<p><strong>Array index stringiness</strong></p>
<p>In my previous article, I showed you that awk actually stores numeric values in a string format. While awk performs the necessary conversions to make this work, it does open the door for some odd-looking code:<br />
<code><br />
a="1"<br />
b="2"<br />
c=a+b+3</code></p>
<p>After this code executes, c is equal to 6. Since awk is "stringy", adding strings "1" and "2" is functionally no different than adding the numbers 1 and 2. In both cases, awk will successfully perform the math. Awk's "stringy" nature is pretty intriguing -- you may wonder what happens if we use string indexes for arrays. For instance, take the following code:<br />
<code><br />
myarr["1"]="Mr. Whipple"<br />
print myarr["1"]<br />
</code></p>
<p>As you might expect, this code will print "Mr. Whipple". But how about if we drop the quotes around the second "1" index?</p>
<p><code>myarr["1"]="Mr. Whipple"<br />
print myarr[1]</code></p>
<p>Guessing the result of this code snippet is a bit more difficult. Does awk consider myarr["1"] and myarr[1] to be two separate elements of the array, or do they refer to the same element? The answer is that they refer to the same element, and awk will print "Mr. Whipple", just as in the first code snippet. Although it may seem strange, behind the scenes awk has been using string indexes for its arrays all this time!</p>
<p>After learning this strange fact, some of us may be tempted to execute some wacky code that looks like this:</p>
<p><code>myarr["name"]="Mr. Whipple"<br />
print myarr["name"]<br />
</code></p>
<p>Not only does this code not raise an error, but it's functionally identical to our previous examples, and will print "Mr. Whipple" just as before! As you can see, awk doesn't limit us to using pure integer indexes; we can use string indexes if we want to, without creating any problems. Whenever we use non-integer array indices like myarr["name"], we're using associative arrays. Technically, awk isn't doing anything different behind the scenes when we use a string index (since even if you use an "integer" index, awk still treats it as a string). However, you should still call 'em associative arrays -- it sounds cool and will impress your boss. The stringy index thing will be our little secret. <img src='http://www.mediabandit.co.uk/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p><strong>Array tools</strong></p>
<p>When it comes to arrays, awk gives us a lot of flexibility. We can use string indexes, and we aren't required to have a continuous numeric sequence of indices (for example, we can define myarr[1] and myarr[1000], but leave all other elements undefined). While all this can be very helpful, in some circumstances it can create confusion. Fortunately, awk offers a couple of handy features to help make arrays more manageable.</p>
<p>First, we can delete array elements. If you want to delete element 1 of your array fooarray, type:</p>
<p><code>delete fooarray[1]</code></p>
<p>And, if you want to see if a particular array element exists, you can use the special "in" boolean operator as follows:</p>
<p><code>if ( 1 in fooarray ) {<br />
	print "Ayep!  It's there."<br />
} else {<br />
	print "Nope!  Can't find it."<br />
}<br />
</code></p>
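<p>Here's a small sketch that puts delete and "in" together on a sparse array. The array contents are made up for illustration:</p>
<p><code>BEGIN {<br />
	fooarray[1]="jim"<br />
	fooarray[1000]="bob"<br />
	delete fooarray[1]<br />
	if ( 1 in fooarray ) {<br />
		print "1 is still there"<br />
	} else {<br />
		print "1 is gone"<br />
	}<br />
	# count whatever elements remain<br />
	count=0<br />
	for ( x in fooarray ) {<br />
		count++<br />
	}<br />
	print count, "element left"<br />
}</code></p>
<p>This prints "1 is gone" followed by "1 element left". One reason to prefer the "in" test over something like comparing fooarray[1] to an empty string is that merely referencing a nonexistent element creates it, while "in" leaves the array untouched.</p>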
<p><strong>Formatting output</strong></p>
<p>While awk's print statement does do the job most of the time, sometimes more is needed. For those times, awk offers two good old friends called printf() and sprintf(). Yes, these functions, like so many other awk parts, are identical to their C counterparts. printf() will print a formatted string to stdout, while sprintf() returns a formatted string that can be assigned to a variable. If you're not familiar with printf() and sprintf(), an introductory C text will quickly get you up to speed on these two essential printing functions. You can view the printf() man page by typing "man 3 printf" on your Linux system.</p>
<p>Here's some sample awk sprintf() and printf() code. As you can see, everything looks almost identical to C.</p>
<p><code>x=1<br />
b="foo"<br />
printf("%s got a %d on the last test\n","Jim",83)<br />
myout=sprintf("%s-%d",b,x)<br />
print myout</code></p>
<p>This code will print:</p>
<p><code>Jim got a 83 on the last test<br />
foo-1</code></p>
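<p>Format strings really earn their keep once you add widths and precision. The following is a sketch only; the category names and amounts are invented, and the vertical bars are there just to make the column edges visible:</p>
<p><code>BEGIN {<br />
	# %-8s left-justifies a string in 8 columns; %8.2f right-justifies a number with 2 decimals<br />
	printf("%-8s|%8.2f|\n", "food", 30.25)<br />
	printf("%-8s|%8.2f|\n", "inco", 2001)<br />
	# %03d pads an integer with leading zeros to 3 digits<br />
	label=sprintf("%03d", 7)<br />
	print label<br />
}</code></p>
<p>The first two lines come out as aligned columns, and the last line prints 007.</p>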
<p><strong>String functions</strong></p>
<p>Awk has a plethora of string functions, and that's a good thing. In awk, you really need string functions, since you can't treat a string as an array of characters as you can in other languages like C, C++, and Python. For example, if you execute the following code:</p>
<p><code>mystring="How are you doing today?"<br />
print mystring[3]<br />
</code></p>
<p>You'll receive an error that looks something like this:</p>
<p><code>awk: string.gawk:59: fatal: attempt to use scalar as array</code></p>
<p>Oh, well. While not as convenient as Python's sequence types, awk's string functions get the job done. Let's take a look at them.</p>
<p>First, we have the basic length() function, which returns the length of a string. Here's how to use it:</p>
<p><code>print length(mystring)</code></p>
<p>This code will print the value:<br />
<code><br />
24</code></p>
<p>OK, let's keep going. The next string function is called index(), and will return the position of the first occurrence of a substring in another string, or 0 if the substring isn't found. Using mystring, we can call it this way:</p>
<p><code>print index(mystring,"you")</code></p>
<p>Awk prints:</p>
<p><code>9</code></p>
<p>We move on to two more easy functions, tolower() and toupper(). As you might guess, these functions will return the string with all characters converted to lowercase or uppercase respectively. Notice that tolower() and toupper() return the new string, and don't modify the original. This code:</p>
<p><code>print tolower(mystring)<br />
print toupper(mystring)<br />
print mystring<br />
</code></p>
<p>....will produce this output:</p>
<p><code>how are you doing today?<br />
HOW ARE YOU DOING TODAY?<br />
How are you doing today?</code></p>
<p>So far so good, but how exactly do we select a substring or even a single character from a string? That's where substr() comes in. Here's how to call substr():</p>
<p><code>mysub=substr(mystring,startpos,maxlen)</code></p>
<p>mystring should be either a string variable or a literal string from which you'd like to extract a substring. startpos should be set to the starting character position, and maxlen should contain the maximum length of the string you'd like to extract. Notice that I said maximum length; if length(mystring) is shorter than startpos+maxlen, your result will be truncated. substr() won't modify the original string, but returns the substring instead. Here's an example:</p>
<p><code>print substr(mystring,9,3)</code></p>
<p>Awk will print:</p>
<p><code>you<br />
</code></p>
<p>If you regularly program in a language that uses array indices to access parts of a string (and who doesn't), make a mental note that substr() is your awk substitute. You'll need to use it to extract single characters and substrings; because awk is a string-based language, you'll be using it often.</p>
<p>Now, we move on to some meatier functions, the first of which is called match(). match() is a lot like index(), except instead of searching for a substring like index() does, it searches for a regular expression. The match() function will return the starting position of the match, or zero if no match is found. In addition, match() will set two variables called RSTART and RLENGTH. RSTART contains the return value (the location of the first match), and RLENGTH specifies its span in characters (or -1 if no match was found). Using RSTART, RLENGTH, substr(), and a small loop, you can easily iterate through every match in your string. Here's an example match() call:</p>
<p><code>print match(mystring,/you/), RSTART, RLENGTH<br />
</code></p>
<p>Awk will print:</p>
<p><code>9 9 3</code></p>
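<p>The "small loop" mentioned above could look something like the following sketch. The variables rest and offset are our own names; the idea is to chop off everything up to the end of each match and search again in what remains. It assumes the regular expression never matches an empty string, since otherwise the loop would not advance:</p>
<p><code>BEGIN {<br />
	mystring="How are you doing today?"<br />
	rest=mystring<br />
	# offset = number of characters already chopped off the front<br />
	offset=0<br />
	while ( match(rest,/o/) ) {<br />
		print "match at position", offset+RSTART<br />
		offset+=RSTART+RLENGTH-1<br />
		rest=substr(rest,RSTART+RLENGTH)<br />
	}<br />
}</code></p>
<p>For our sample string, this reports matches at positions 2, 10, 14 and 20.</p>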
<p><strong>String substitution</strong></p>
<p>Now, we're going to look at a couple of string substitution functions, sub() and gsub(). These guys differ slightly from the functions we've looked at so far in that they actually modify the original string. Here's a template that shows how to call sub():</p>
<p><code>sub(regexp,replstring,mystring)</code></p>
<p>When you call sub(), it'll find the first sequence of characters in mystring that matches regexp, and it'll replace that sequence with replstring. sub() and gsub() have identical arguments; the only way they differ is that sub() will replace the first regexp match (if any), and gsub() will perform a global replace, swapping out all matches in the string. Here's an example sub() and gsub() call:</p>
<p><code>sub(/o/,"O",mystring)<br />
print mystring<br />
mystring="How are you doing today?"<br />
gsub(/o/,"O",mystring)<br />
print mystring</code></p>
<p>We had to reset mystring to its original value because the first sub() call modified mystring directly. When executed, this code will cause awk to output:</p>
<p><code>HOw are you doing today?<br />
HOw are yOu dOing tOday?</code></p>
<p>Of course, more complex regular expressions are possible. I'll leave it up to you to test out some complicated regexps.</p>
<p>We wrap up our string function coverage by introducing you to a function called split(). split()'s job is to "chop up" a string and place the various parts into an integer-indexed array. Here's an example split() call:</p>
<p><code>numelements=split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec",mymonths,",")</code></p>
<p>When calling split(), the first argument contains the literal string or string variable to be chopped. In the second argument, you should specify the name of the array that split() will stuff the chopped parts into. In the third argument, specify the separator that will be used to chop the string up. When split() returns, it'll return the number of string elements that were split. split() assigns each one to an array index starting with one, so the following code:<br />
<code><br />
print mymonths[1],mymonths[numelements]</code></p>
<p>....will print:</p>
<p><code>Jan Dec</code></p>
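<p>Because split() hands back the element count, it pairs nicely with a for loop. As a sketch, here's one way to turn the integer-indexed mymonths array into an associative lookup table; the array name monthnum is ours:</p>
<p><code>BEGIN {<br />
	n=split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec",mymonths,",")<br />
	# build a name-to-number table from the split pieces<br />
	for ( i=1; i &lt;= n; i++ ) {<br />
		monthnum[mymonths[i]]=i<br />
	}<br />
	print monthnum["Mar"], monthnum["Dec"]<br />
}</code></p>
<p>This prints "3 12".</p>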
<p><strong>Special string forms</strong></p>
<p>A quick note -- when calling length(), sub(), or gsub(), you can drop the last argument and awk will apply the function call to $0 (the entire current line). To print the length of each line in a file, use this awk script:</p>
<p><code>{<br />
	print length()<br />
}</code></p>
<p><strong>Financial fun</strong></p>
<p>A few weeks ago, I decided to write my own checkbook balancing program in awk. I decided that I'd like to have a simple tab-delimited text file into which I can enter my most recent deposits and withdrawals. The idea was to hand this data to an awk script that would automatically add up all the amounts and tell me my balance. Here's how I decided to record all my transactions into my "ASCII checkbook":</p>
<p><code>23 Aug 2000	food	-	-	Y	Jimmy's Buffet		30.25</code></p>
<p>Every field in this file is separated by one or more tabs. After the date (field 1, $1), there are two fields called "expense category" and "income category". When I'm entering an expense like on the above line, I put a four-letter nickname in the exp field, and a "-" (blank entry) in the inc field. This signifies that this particular item is a "food expense" <img src='http://www.mediabandit.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Here's what a deposit looks like:</p>
<p><code>23 Aug 2000	-	inco	-	Y	Boss Man		2001.00</code></p>
<p>In this case, I put a "-" (blank) in the exp category, and put "inco" in the inc category. "inco" is my nickname for generic (paycheck-style) income. Using category nicknames allows me to generate a breakdown of my income and expenditures by category. As for the rest of the record, the other fields are fairly self-explanatory. The cleared? field ("Y" or "N") records whether the transaction has been posted to my account; beyond that, there's a transaction description, and a positive dollar amount.</p>
<p>The algorithm used to compute the current balance isn't too hard. Awk simply needs to read in each line, one by one. If an expense category is listed but there is no income category (it's "-"), then this item is a debit. If an income category is listed, but no expense category ("-") is there, then the dollar amount is a credit. And, if there is both an expense and income category listed, then this amount is a "category transfer"; that is, the dollar amount will be subtracted from the expense category and added to the income category. Again, all these categories are virtual, but are very useful for tracking income and expenditures, as well as for budgeting.</p>
<p><strong>The code</strong></p>
<p>Time to look at the code. We'll start off with the first line, the BEGIN block and a function definition:</p>
<p>balance, part 1</p>
<p><code>#!/usr/bin/awk -f<br />
BEGIN {<br />
	FS="\t+"<br />
	months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"<br />
}<br />
<br />
function monthdigit(mymonth) {<br />
	return (index(months,mymonth)+3)/4<br />
}</code></p>
<p>Adding the first "#!..." line to any awk script will allow it to be directly executed from the shell, provided that you "chmod +x myscript" first. We point it straight at awk rather than going through env because Linux hands everything after the interpreter path to the interpreter as a single argument, so "env awk -f" would look for a program literally named "awk -f"; adjust the path if awk lives elsewhere on your system. The remaining lines define our BEGIN block, which gets executed before awk starts processing our checkbook file. We set FS (the field separator) to "\t+", which tells awk that the fields will be separated by one or more tabs. In addition, we define a string called months that's used by our monthdigit() function, which appears next.</p>
<p>The last three lines show you how to define your own awk function. The format is simple -- type "function", then the function name, and then the parameters separated by commas, inside parentheses. After this, a "{ }" code block contains the code that you'd like this function to execute. All functions can access global variables (like our months variable). In addition, awk provides a "return" statement that allows the function to return a value, and operates similarly to the "return" found in C, Python, and other languages. This particular function converts a month name in a 3-letter string format into its numeric equivalent. For example, this:</p>
<p><code>print monthdigit("Mar")<br />
</code></p>
<p>....will print this:</p>
<p><code>3</code></p>
<p>Now, let's move on to some more functions.</p>
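<p>If the arithmetic inside monthdigit() looks like magic, it helps to see the intermediate values. Each month name plus its trailing space occupies four characters in months, so index() returns 1, 5, 9 and so on; adding 3 and dividing by 4 maps those to 1, 2, 3. Here's a quick sketch to check this; "Xyz" is a deliberately bogus month name:</p>
<p><code>function monthdigit(mymonth) {<br />
	return (index(months,mymonth)+3)/4<br />
}<br />
<br />
BEGIN {<br />
	months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"<br />
	# print the raw index() result next to the computed month number<br />
	print index(months,"Jan"), monthdigit("Jan")<br />
	print index(months,"Feb"), monthdigit("Feb")<br />
	print index(months,"Dec"), monthdigit("Dec")<br />
	print index(months,"Xyz"), monthdigit("Xyz")<br />
}</code></p>
<p>This prints "1 1", "5 2", "45 12" and "0 0.75". That last line is worth noting: an unrecognized month yields 0.75 rather than an error, so a stricter script might want to check for it.</p>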
<p><strong>Financial functions</strong></p>
<p>Here are three more functions that perform the bookkeeping for us. Our main code block, which we'll see soon, will process each line of the checkbook file sequentially, calling one of these functions so that the appropriate transactions are recorded in an awk array. There are three basic kinds of transactions, credit (doincome), debit (doexpense) and transfer (dotransfer). You'll notice that all three functions accept one argument, called mybalance. mybalance is a placeholder for a two-dimensional array, which we'll pass in as an argument. Up until now, we haven't dealt with two-dimensional arrays; however, as you can see below, the syntax is quite simple. Just separate each dimension with a comma, and you're in business.</p>
<p>We'll record information into "mybalance" as follows. The first dimension of the array ranges from 0 to 12, and specifies the month, or zero for the entire year. Our second dimension is a four-letter category, like "food" or "inco"; this is the actual category we're dealing with. So, to find the entire year's balance for the food category, you'd look in mybalance[0,"food"]. To find June's income, you'd look in mybalance[6,"inco"].</p>
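<p>In keeping with awk's stringy nature, a two-dimensional index is really just one string: awk joins the parts together with the value of the built-in variable SUBSEP. The following sketch makes that visible by splitting a key back into its parts (the array name parts and the sample amount are ours):</p>
<p><code>BEGIN {<br />
	mybalance[6,"inco"]=2001.00<br />
	for ( key in mybalance ) {<br />
		# key is the single string "6" SUBSEP "inco"<br />
		split(key,parts,SUBSEP)<br />
		print parts[1], parts[2], mybalance[key]<br />
	}<br />
}</code></p>
<p>This prints "6 inco 2001".</p>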
<p>balance, part 2</p>
<p><code>function doincome(mybalance) {<br />
	mybalance[curmonth,$3] += amount<br />
	mybalance[0,$3] += amount<br />
}<br />
<br />
function doexpense(mybalance) {<br />
	mybalance[curmonth,$2] -= amount<br />
	mybalance[0,$2] -= amount<br />
}<br />
<br />
function dotransfer(mybalance) {<br />
	mybalance[0,$2] -= amount<br />
	mybalance[curmonth,$2] -= amount<br />
	mybalance[0,$3] += amount<br />
	mybalance[curmonth,$3] += amount<br />
}</code></p>
<p>When doincome() or any of the other functions are called, we record the transaction in two places -- mybalance[0,category] and mybalance[curmonth, category], the entire year's category balance and the current month's category balance, respectively. This allows us to easily generate either an annual or monthly breakdown of income/expenditures later on.</p>
<p>If you look at these functions, you'll notice that the array referenced by mybalance is passed in by reference. In addition, we also refer to several global variables: curmonth, which holds the numeric value of the month of the current record, $2 (the expense category), $3 (the income category), and amount ($7, the dollar amount). When doincome() and friends are called, all these variables have already been set correctly for the current record (line) being processed.</p>
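<p>To see what passing by reference means in practice, here's a tiny sketch that is separate from the checkbook script. The function bump() and the names tally and n are hypothetical. Arrays are passed by reference, so changes made inside the function stick; ordinary scalar values are passed by value, so they don't:</p>
<p><code>function bump(arr, num) {<br />
	# the change to arr is visible to the caller; the change to num is not<br />
	arr["count"]++<br />
	num++<br />
}<br />
<br />
BEGIN {<br />
	tally["count"]=0<br />
	n=5<br />
	bump(tally, n)<br />
	bump(tally, n)<br />
	print tally["count"], n<br />
}</code></p>
<p>This prints "2 5": the array element was incremented twice, while n is still 5.</p>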
<p><strong>The main block</strong></p>
<p>Here's the main code block that contains the code that parses each line of input data. Remember, because we have set FS correctly, we can refer to the first field as $1, the second field as $2, etc. When doincome() and friends are called, the functions can access the current values of curmonth, $2, $3 and amount from inside the function. Take a look at the code and meet me on the other side for an explanation.</p>
<p>balance, part 3</p>
<p><code>{<br />
	curmonth=monthdigit(substr($1,4,3))<br />
	amount=$7<br />
<br />
	#record all the categories encountered<br />
	if ( $2 != "-" )<br />
		globcat[$2]="yes"<br />
	if ( $3 != "-" )<br />
		globcat[$3]="yes"<br />
<br />
	#tally up the transaction properly<br />
	if ( $2 == "-" ) {<br />
		if ( $3 == "-" ) {<br />
			print "Error: inc and exp fields are both blank!"<br />
			exit 1<br />
		} else {<br />
			#this is income<br />
			doincome(balance)<br />
			if ( $5 == "Y" )<br />
				doincome(balance2)<br />
		}<br />
	} else if ( $3 == "-" ) {<br />
		#this is an expense<br />
		doexpense(balance)<br />
		if ( $5 == "Y" )<br />
			doexpense(balance2)<br />
	} else {<br />
		#this is a transfer<br />
		dotransfer(balance)<br />
		if ( $5 == "Y" )<br />
			dotransfer(balance2)<br />
	}<br />
}</code></p>
<p>In the main block, the first two lines set curmonth to an integer between 1 and 12, and set amount to field 7 (to make the code easier to understand). Then, we have four interesting lines, where we write values into an array called globcat. globcat, or the global categories array, is used to record all those categories encountered in the file -- "inco", "misc", "food", "util", etc. For example, if $2 == "inco", we set globcat["inco"] to "yes". Later on, we can iterate through our list of categories with a simple "for (x in globcat)" loop.</p>
<p>On the next twenty or so lines, we analyze fields $2 and $3, and record the transaction appropriately. If $2=="-" and $3!="-", we have some income, so we call doincome(). If the situation is reversed, we call doexpense(); and if both $2 and $3 contain categories, we call dotransfer(). Each time, we pass the "balance" array to these functions so that the appropriate data is recorded there.</p>
<p>You'll also notice several lines that say "if ( $5 == "Y" ), record that same transaction in balance2". What exactly are we doing here? You'll recall that $5 contains either a "Y" or a "N", and records whether the transaction has been posted to the account. Because we record the transaction to balance2 only if the transaction has been posted, balance2 will contain the actual account balance, while "balance" will contain all transactions, whether they have been posted or not. You can use balance2 to verify your data entry (since it should match with your current account balance according to your bank), and use "balance" to make sure that you don't overdraw your account (since it will take into account any checks you have written that have not yet been cashed).</p>
<p><strong>Generating the report</strong></p>
<p>After the main block repeatedly processes each input record, we now have a fairly comprehensive record of debits and credits broken down by category and by month. Now, all we need to do is define an END block that will generate a report, in this case a modest one:</p>
<p><code>END {<br />
	bal=0<br />
	bal2=0<br />
	for (x in globcat) {<br />
		bal=bal+balance[0,x]<br />
		bal2=bal2+balance2[0,x]<br />
    	}<br />
    	printf("Your available funds: %10.2f\n", bal)<br />
    	printf("Your account balance: %10.2f\n", bal2)<br />
}</code></p>
<p>This report prints out a summary that looks something like this:</p>
<p><code>Your available funds:    1174.22<br />
Your account balance:    2399.33</code></p>
<p>In our END block, we used the "for (x in globcat)" construct to iterate through every category, tallying up a master balance based on all the transactions recorded. We actually tally up two balances, one for available funds, and another for the account balance. To execute the program and process your own financial goodies that you've entered into a file called "mycheckbook.txt", put all the above code into a text file called "balance", run "chmod +x balance", and then type "./balance mycheckbook.txt". The balance script will then add up all your transactions and print out a two-line balance summary for you.</p>
<p><strong>Upgrades</strong></p>
<p>I use a more advanced version of this program to manage my personal and business finances. My version (which I couldn't include here due to space limitations) prints out a monthly breakdown of income and expenses, including annual totals, net income and a bunch of other stuff. Even better, it outputs the data in HTML format, so that I can view it in a Web browser <img src='http://www.mediabandit.co.uk/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  If you find this program useful, I encourage you to add these features to this script. You won't need to configure it to record any additional information; all the information you need is already in balance and balance2. Just upgrade the END block, and you're in business!</p>
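<p>As a starting point for such an upgrade, here's a rough sketch of an END block that prints a plain-text monthly breakdown by category. To be clear, this is not the more advanced version mentioned above, just one guess at a first step; it relies only on the balance and globcat arrays that the script already fills in:</p>
<p><code>END {<br />
	for ( m=1; m &lt;= 12; m++ ) {<br />
		for ( x in globcat ) {<br />
			# only report month/category pairs that actually saw a transaction<br />
			if ( (m,x) in balance ) {<br />
				printf("%2d  %-6s %10.2f\n", m, x, balance[m,x])<br />
			}<br />
		}<br />
	}<br />
}</code></p>
<p>Awk runs multiple END blocks in the order they appear, so this could simply be appended after the existing one. Remember that the categories within each month will come out in no particular order, since we're using the "in" form of the for loop.</p>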
<p>I hope you've enjoyed this tutorial. It was written by Daniel Robbins, not by us; we yoinked a local copy because it is so good. Full credit goes to Daniel Robbins, and the originals are at http://www.ibm.com/developerworks/library/l-awk1.html http://www.ibm.com/developerworks/library/l-awk2.html http://www.ibm.com/developerworks/library/l-awk3.html</p>
<p>Residing in Albuquerque, New Mexico, Daniel Robbins is the President/CEO of Gentoo Technologies, Inc., the creator of Gentoo Linux, an advanced Linux for the PC, and the Portage system, a next-generation ports system for Linux. He has also served as a contributing author for the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers in some fashion since the second grade, when he was first exposed to the Logo programming language as well as a potentially dangerous dose of Pac Man. This probably explains why he has since served as a Lead Graphic Artist at SONY Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and his new baby daughter, Hadassah. You can reach Daniel at drobbins@gentoo.org. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.mediabandit.co.uk/blog/222_awk-by-example/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
