How I reached Backup/Sync Nirvana

For quite a while (read “way too long”) I’ve relied on a simple batch script that uses xcopy to throw my files up onto my NAS box. This had some pretty significant drawbacks. One was that every time the script ran it wrongly determined the source files to be newer than those on the NAS box and would copy all 400GB up again, taking forever. Another serious issue was that files were never deleted from the NAS when they ceased to exist on the source PC, meaning the NAS slowly filled up as I moved, deleted and renamed files on my PC.

In Windows Vista and 7, XCopy has been superseded by Robocopy. Robocopy is a slightly better XCopy which supports deleting files on the target when they no longer exist on the source. Sadly, Robocopy still had serious issues understanding the file ages and still copied everything each time it was run.

Robocopy has some more advanced cousins, such as RobocopyGUI and RichCopy, but by this point my patience with Microsoft file copying solutions had run out and I decided to give some freeware a try.

First up was AllwaySync. This is a perfectly capable Windows tool with a decent UI, but the free version limits the amount of data it will sync each week. This meant that one week’s editing of particularly large files would mean other folders couldn’t be synced- a crippling issue as far as I was concerned. The paid version wasn’t too expensive but I decided to be cheap and press on for a free solution. For the record, I like supporting small developers and will often pay for pro versions, just not when the software does something I consider too trivial to warrant a paid solution!

I was pointed to SyncBack. It has a free version and, whilst not being the prettiest software in the world, it does a fantastic job of keeping folders synced up. It can automatically sign in to your protected SMB shares, schedule backups and has plenty of options to get it to do just what you want. I’m currently using it for my Windows backup and can’t fault it. Recommended for those times where Windows is pretty much the only choice.

There is something I prefer to SyncBack when using Linux though: rsync. rsync is a popular and widely used tool for both one-off synchronisations and regular backup jobs; it is used in industry for some hefty syncing and, when combined with a cron job, becomes the ultimate way of syncing large amounts of data short of a hardware or file system solution. I won’t go into setting it up now but there are plenty of tutorials (this one is especially good), and it is well worth looking into if you have a lot of stuff to sync/copy/move.

One last honorable mention is the rsync extension, rsnapshot. This couples rsync to some configurable magic around making and maintaining separate snapshots of your data at various times. If you have the space to keep multiple backups, you really can’t ask for a nicer solution than rsnapshot.


Quick Tip: Lines of Code in sub-folders

Just a quick one- I searched around for a bit longer than I wanted to work out how to count the number of lines in certain file types in a folder and its sub-folders.

The solution was:

find . -type f \( -iname "*.java" -o -iname "*.gradle" \) -exec wc -l {} \; | awk '{lines += $1 ; files += 1 ; print  }; END { print "Lines total is: ", lines ," in ", files ," files"}'

As you can see, it’s a find for all files with names matching the *.java and *.gradle patterns (these are shell-style globs rather than regexes, and -iname matches them case-insensitively), then a little awk magic to neaten up the output and display totals at the end.

Networking VirtualBox VMs in 7 Easy Steps

I love working on an Ubuntu virtual machine run on a Windows 7 host; I use this set up both at work and at home and will no doubt espouse the benefits of it on the blog at some point (until then check out this post by Ryan).

My software of choice is the free and excellent VirtualBox by Sun Oracle, which can be found here. Much of my work requires me to write deployment scripts which set things up on multiple servers. To test this I found it really useful to set up a network between all of my VMs, which isn’t as straightforward as it could be. Here’s how I got it going:

  1. Shut down all VMs, exit VirtualBox and open a command line
  2. Change directory into your VirtualBox folder (it should contain VBoxManage.exe)
  3. Execute the following commands, where vm01 and vm02 are the VirtualBox names of the VMs:

    VBoxManage modifyvm "vm01" --nic2 intnet

    VBoxManage modifyvm "vm01" --intnet2 intnet

    VBoxManage modifyvm "vm02" --nic2 intnet

    VBoxManage modifyvm "vm02" --intnet2 intnet

    This configures a 2nd NIC on each VM, connected to the “intnet” internal network.

  4. Launch each VM
  5. Edit VM01’s network config, assigning a static IP (eg and adding the default gateway as the same as the gateway on NIC1 (the original NIC)
  6. Edit VM02’s network config with the same details but with a different IP on the same group & subnet (eg
  7. Try to get the 2 machines to ping one another; they should be working fine now!

Spa2010: TDD at the System Scale

Spa2010 is a conference for people from all over the software community, run by (and named after) the BCS special interest group for Software Professional Advancement. The conference is known for its rich mix of topics and heavy bias towards hands-on sessions, and annually takes on 2 forms- the main 3 day event in London and MiniSpa, a free one day “Best Of Spa” compilation where the highest rated sessions are run again. This year I was lucky enough to attend the full conference thanks to some training budget I found, so I headed off to meet and hear some very interesting people run sessions on everything from testing to behaviours in civil architecture which could be applied in software engineering.

The first talk I attended was by Nat Pryce and Steve Freeman, co-authors of “Growing Object-Oriented Software, Guided by Tests”, who ran an interesting session on how to write and test (or should that be “test and write”- test first!) large scale systems. The session started with a great overview of the important lessons learnt and went on to some practical exercises based on the attendees hunting down testing issues. Here is a bit of what I learnt from it all:

Testing from the boundaries inwards, not centre-out

The biggest point made for me personally was that it is too easy to start by writing tests and implementations for our domain classes and settle for extensive unit test coverage without giving much thought to testing at larger scales.

Using only TDD at the unit scale means that we end up defining the “idealised” version of each unit- our domain classes define what we wish our domains looked like, our processing methods reflect how these perfect domains would be manipulated and so on until we hit the outer boundary of our program… where we end up with a rift between the ideal model we’ve built and the reality we need to integrate with.

The solution is simply not to start with our low level unit tests, but to broadly scope and write tests at the limits of our system first. This ensures we have some appreciation of the boundaries of our software and means we start working with the APIs we need to integrate with from the start. Any restrictions and needs placed on us by these APIs are considered from the beginning of development instead of as the last step when the pressure to deliver is on.
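To make that a bit more concrete, here is a tiny Python sketch of what a first, boundary-level test might look like. Everything in it is made up for illustration (the handle_order_message entry point, the JSON message shape and the field names are my assumptions, not anything shown in the session); the point is only that the test speaks the wire format the outside world uses rather than our idealised domain classes.

```python
import json


def handle_order_message(raw: str) -> str:
    """Hypothetical system entry point: wire format in, wire format out."""
    order = json.loads(raw)
    total = sum(line["qty"] * line["unit_price_pence"] for line in order["lines"])
    return json.dumps({"order_id": order["id"], "total_pence": total})


def test_order_total_is_reported_in_wire_format() -> None:
    # The test drives the system exactly as an external caller would.
    raw = json.dumps(
        {
            "id": "A1",
            "lines": [
                {"qty": 2, "unit_price_pence": 150},
                {"qty": 1, "unit_price_pence": 99},
            ],
        }
    )
    reply = json.loads(handle_order_message(raw))
    assert reply == {"order_id": "A1", "total_pence": 399}
```

In a real project this test would be written before most of the implementation exists, and the unit tests would then be driven out from whatever it forces us to build.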

Use Anti-Corruption Layers or Simplicators

Another important point made was that we can lessen the impact of imperfect APIs by abstracting them behind a “simplicator” or “anti-corruption” layer- basically a small, separate project which consumes the ugly/unwieldy APIs we need to integrate with and exposes a more ideal API for our system to consume.

This also has the effect of largely de-coupling the implementation of our system from the specific system we are building against. An advantage I saw in this is that bad design can be isolated and removed when projects are improved or replaced: the issues in project A need not affect project B when it integrates with A, and they are less likely to be forced into the design of A’s replacement when it needs to stay compatible with B.
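As a rough illustration of the idea, here is a minimal Python sketch. The LegacyCustomerApi class, its cryptic field names and the CustomerGateway wrapper are all hypothetical, invented purely to show the shape of such a layer; a real one would usually live in its own small project, as described above.

```python
from dataclasses import dataclass


class LegacyCustomerApi:
    """Hypothetical stand-in for an awkward API we have to integrate with."""

    def fetch(self, cust_no: str) -> dict:
        # Cryptic keys, a packed name field and a stringly-typed flag.
        return {"CUST_NO": cust_no, "NM": "SMITH;JANE", "ACTV": "Y"}


@dataclass(frozen=True)
class Customer:
    """The clean model the rest of our system gets to work with."""

    customer_id: str
    first_name: str
    last_name: str
    active: bool


class CustomerGateway:
    """The simplicator: the only code that knows the legacy format."""

    def __init__(self, api: LegacyCustomerApi) -> None:
        self._api = api

    def find(self, customer_id: str) -> Customer:
        raw = self._api.fetch(customer_id)
        last, first = raw["NM"].split(";")
        return Customer(
            customer_id=raw["CUST_NO"],
            first_name=first.title(),
            last_name=last.title(),
            active=raw["ACTV"] == "Y",
        )
```

If the legacy system is later replaced, only CustomerGateway should need to change; everything that consumes Customer stays as it is.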

Testing Asynchronous Systems

The session went on to some excellent examples of tests which flickered (passed or failed somewhat randomly) and explored the false-positives and false-negatives that can occur when testing an asynchronous system. The lessons learnt from this, in summary, were:

If some state changes in your asynchronous system, make damn sure that your tests are synchronised with it in some way: polling the state just doesn’t cut it. Sometimes the test will miss the change by checking the state before the change has happened. Adding a wait or thread sleep might work, but it makes your tests take longer to run than they should. And if the test polls after the state has changed and then been changed back again, it looks like a failure even though the correct behaviour was played out.
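Here is a minimal Python sketch of one way to synchronise a test with such a system, assuming the component can notify the test of every change. The Light class and its transitions queue are made up for illustration and are not taken from the session. Because each transition is recorded as an event, the test cannot miss the “on” state even though it is immediately switched off again, and it never needs a sleep.

```python
import queue
import threading


class Light:
    """Hypothetical asynchronous component: switches on, then off again."""

    def __init__(self) -> None:
        # Every state change is published here, in order.
        self.transitions: "queue.Queue[str]" = queue.Queue()

    def blink(self) -> None:
        def run() -> None:
            self.transitions.put("on")
            self.transitions.put("off")

        # The work happens on another thread, so the caller returns at once.
        threading.Thread(target=run).start()


def test_blink_turns_light_on_then_off() -> None:
    light = Light()
    light.blink()
    # Block until each notification arrives; the timeout only bounds how
    # long a genuinely failing test takes, it is not a fixed delay.
    assert light.transitions.get(timeout=1.0) == "on"
    assert light.transitions.get(timeout=1.0) == "off"
```

A test that simply read a “current state” flag after calling blink() could see “off” before or after the change and would have no way to tell the two apart.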

The session was crammed with useful tips and examples, and I strongly recommend going to it if you get a chance at another event; it certainly highlights a lot of testing and design issues which I am certain I’d have wandered into over the next couple of years. Many thanks to Nat and Steve for all the time and effort that went into it!

Programmers are not Typists

Programmers are not Typists

– Overheard at Spa2010

I love this quote- it is a nice succinct way to point out that programmers should not be considered semi-skilled or easily replaceable, and it also serves as a strong reminder to the programming community that much of our job lies away from the keyboard; good design and customer engagement are vital to being a really great software developer. The temptation to just hear the problem and then immediately start bashing out code must be fought.

Also, as pointed out on Twitter by @lassekoskela, it would have been better to say “Programmers are not tpyists!”