07 September 2011

Data Mining

As I mentioned in one of my previous posts, I am going to write not only about the development process but also about other useful things, such as tutorials and algorithms.

Today it is a link to the Statistical Data Mining Tutorials by Andrew Moore.

Why are data mining and machine learning useful? They are tools for anyone who deals with data and wants to discover knowledge that is not immediately visible or obvious, whether to forecast stock or simply to select the best ads to show to a visitor on your site.


31 August 2011

Requirements I

As discussed in my introductory posts, requirements engineering is one of the most important factors in a project's success (or failure). If we look at The Standish Group CHAOS reports (a recent copy costs $99; some older reports are available for free "for academic purposes"), we will see that over 44% of failures are due to poor requirements management (lack of user input, incomplete requirements, unrealistic expectations, changing specifications, etc.). Another 26% stem from related problems such as poor planning and lack of resources (how can you plan activities and manage your resources, time, and money if you don't know what really needs to be done? How do you create a budget and set the price the customer is going to pay?). Surprisingly, almost 8% of projects are terminated because the customer no longer needs the system. This, too, is a requirements problem: had the requirements been gathered properly at the beginning, many such cases would have been detected much earlier.

But what are requirements, really? The short answer is that requirements define what users, customers, suppliers, and business routines (let's call them stakeholders for now) need from the system for it to be useful and helpful to them. Be careful, however: requirements from different stakeholders may be contradictory, ambiguous, or incomplete!

Why do we need requirements in the first place? As mentioned before, we need them for planning and risk management. We also need them for proper acceptance testing, for managing (and justifying!) the trade-offs we make, and for managing change within the organization that is going to use the deployed system.

We cannot think solely about the software when we start collecting requirements. Let's imagine that we are building a very complicated web application that is going to deliver a vast amount of data to other systems around the globe. The application alone cannot perform the task: if the server and uplink are too slow, then even the best application will not be able to send the data as quickly as needed. Similarly, if the third-party systems expect a different format than the one we send, they will not be able to use the data. As we can see, the correct result (the requirement) can be achieved only if we think about the whole system: software, hardware, and people, along with the interactions between them.

A very important thing to remember is that requirements management is not a single step in development (as the waterfall model often suggests); requirements are involved in all phases of the project: planning, development, testing, etc. Requirements also help track the progress of the project, not only for developers but also for management. They help everyone understand (at a fairly high level) what is being developed, and in many cases they can help reuse components across projects, coordinate activities, or share resources.

Another thing to consider is that business requirements (usually what our stakeholders talk about) are not the same as engineering requirements (what we need to develop a system). In general, there are at least three layers of requirements analysis. First, the stakeholders' view (their expectations and objectives) is translated into a set of stakeholder requirements, defining what the stakeholders want to achieve by having the system. In this step we focus on the problem domain, usually trying to avoid any particular solution. Next, we translate these requirements into a set of system requirements, defining (at a high level, without specific details) how the system will meet the stakeholders' needs. Then we design the system, effectively translating the system requirements into the architecture of the system. At this step we define how this specific design will meet the stakeholders' needs. The latter two steps are expressed in the solution domain.

For example, let's consider a stakeholder who needs a personal safety system for an excursion vessel. They might express their need by saying that they need a means of ensuring personal safety should the vessel sink, yet the item their customers have to wear while on board should not impair their comfort or ability to move freely. At the system requirements level, we will consider all possible solutions, such as vests, life rings, or a Personal Anti-Gravity Module (assuming it had already been invented), and pick the one that best suits the stakeholder's needs. We will discuss how this solution is going to fulfil those needs, but we will not go too deep into design details. At the design level, we will think about this specific solution and design an instance of it: the specific dimensions, material, shape, and colour of the vest (or life ring, or PAGM).

Moreover, in many cases we need to translate high-level requirements (goals or expectations) into a set of low-level requirements. The relations are often quite complicated (many-to-many), hence we need to understand the relationships between the different layers of information (this is called traceability).
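To make the many-to-many relationship concrete, here is a minimal Python sketch of a traceability matrix, assuming requirements are identified by short IDs. The IDs (SR-1, SYS-1, etc.) are hypothetical and loosely follow the excursion-vessel example; a real project would more likely keep these links in a requirements management tool than in code.

```python
from collections import defaultdict


class TraceMatrix:
    """Many-to-many links between requirements on two adjacent layers."""

    def __init__(self) -> None:
        self.down = defaultdict(set)  # higher-level ID -> lower-level IDs
        self.up = defaultdict(set)    # lower-level ID -> higher-level IDs

    def link(self, high: str, low: str) -> None:
        """Record that `low` (partly) addresses `high`."""
        self.down[high].add(low)
        self.up[low].add(high)

    def refined_by(self, high: str) -> list[str]:
        """Forward trace: which lower-level requirements address `high`?"""
        return sorted(self.down.get(high, set()))

    def derived_from(self, low: str) -> list[str]:
        """Backward trace: which higher-level needs justify `low`?"""
        return sorted(self.up.get(low, set()))

    def untraced(self, high_ids: list[str]) -> list[str]:
        """Higher-level requirements that nothing on the lower layer addresses yet."""
        return [h for h in high_ids if not self.down.get(h)]


# Hypothetical IDs based on the excursion-vessel example:
# SR-1  stakeholder: personal safety should the vessel sink
# SR-2  stakeholder: comfort and freedom of movement on board
# SR-3  stakeholder: not analysed yet
# SYS-1 system: a wearable buoyancy aid for every passenger
# SYS-2 system: life rings available on deck
tm = TraceMatrix()
tm.link("SR-1", "SYS-1")
tm.link("SR-1", "SYS-2")
tm.link("SR-2", "SYS-1")  # SYS-1 is constrained by two stakeholder needs

print(tm.refined_by("SR-1"))                  # ['SYS-1', 'SYS-2']
print(tm.derived_from("SYS-1"))               # ['SR-1', 'SR-2']
print(tm.untraced(["SR-1", "SR-2", "SR-3"]))  # ['SR-3']
```

Backward tracing answers "why does this exist?", which is what you need when a stakeholder changes their mind; the untraced check flags needs that nobody has addressed yet.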

It is often the case that stakeholders not only specify their needs in terms of a particular solution (usually one of the more IT-literate people has read about a cool new thing called 'peer-to-peer', and they want it even though they do not know what it means), but also focus on very low-level issues, such as the colour and shape of buttons, rather than clearly stating their needs and the goals to be achieved. In such cases, requirements engineering should lead to a justification of the particular demands (maybe a simple client-server approach would suffice?); otherwise we will not be able to find the optimal solution. A clear distinction between the solution and problem domains also needs to be made. Otherwise, the discussion will be taken over by developers, since the only description will be in the solution domain, resulting in a lack of deep understanding of the goals to be achieved and the functionality to include in the system.

To sum up, here is an example of things going slightly wrong because of an unclear specification. A few years ago I was leading the development of an auction system. The initial requirement was to allow bidding on items until a given deadline. After some time the requirement changed: if a bid was placed within the last 30 seconds, the deadline was extended by an additional 30 seconds (effectively, the more people bid, the longer the auction). Connected with that was a demand for a real-time clock on the bidding page showing the time remaining until the end of the auction. Everything was OK at this point. Later, however, the requirement changed again to allow extension times of 30, 15, and 5 seconds, and this is where the problem became visible. The customer's typical round-trip time to the server was over 600 ms; moreover, the server load was enormous due to the number of bidders (plus the overhead of the clock updates). The customer wanted the clock to be precise: he had 4 laptops on one desk, and he wanted all the clocks to count EXACTLY the same, agreeing to the millisecond. This requirement was impossible to satisfy, simply because such synchronization cannot be achieved over such a connection (see the article Probabilistic Clock Synchronization by F. Cristian if you are interested). The best we could do (along with a mathematical proof!) was about 200 ms: a small but visible lag between the terminals. So we had everything: unrealistic expectations, changing requirements, incomplete requirements, and, in the initial part of the project, a lack of customer input... The project was deployed successfully (with a small delay), but the customer went out of business soon afterwards due to a complete lack of marketing (and a lack of budget, as they had not made any budget predictions...).
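Two parts of this story can be sketched in a few lines of Python. The first is the deadline-extension rule, read literally as "a bid in the final window adds the window length to the deadline"; that is an interpretation, since the post does not say whether the extension counts from the bid or from the old deadline. The second is the core of Cristian's approach to reading a remote clock: the client asks the server for its time, measures the round-trip time (RTT), and assumes the reply was produced half-way through it, so the estimate is only good to about ±RTT/2. Repeating the request and keeping the fastest round trip tightens that bound, which is the probabilistic idea in Cristian's paper; with round trips around 600 ms, millisecond agreement between browsers is out of reach. Timestamps are plain floats in seconds, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Auction:
    deadline: float       # server time (seconds) at which bidding closes
    window: float = 30.0  # hypothetical: the "last 30 seconds" rule, in seconds

    def place_bid(self, server_now: float) -> bool:
        """Accept a bid before the deadline; a bid inside the final window
        extends the deadline by `window` seconds."""
        if server_now >= self.deadline:
            return False                  # auction already closed
        if self.deadline - server_now <= self.window:
            self.deadline += self.window  # literal reading of the rule
        return True


def cristian_offset(samples: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Estimate (server clock - client clock) from (t_send, server_time, t_recv)
    samples, with t_send and t_recv measured on the client clock.

    Uses the sample with the smallest round trip and returns (offset, bound):
    the true offset lies within offset +/- bound (ignoring the minimum
    one-way network delay, which would tighten the bound further).
    """
    t_send, server_time, t_recv = min(samples, key=lambda s: s[2] - s[0])
    rtt = t_recv - t_send
    # Assume the server read its clock half-way through the round trip.
    offset = server_time + rtt / 2 - t_recv
    return offset, rtt / 2


a = Auction(deadline=100.0)
a.place_bid(50.0)   # accepted, deadline stays 100.0
a.place_bid(80.0)   # accepted inside the window, deadline becomes 130.0
a.place_bid(131.0)  # rejected, auction closed

# Two requests with round trips of 0.6 s and 0.4 s: the faster one is used.
offset, bound = cristian_offset([(0.0, 10.3, 0.6), (1.0, 11.25, 1.4)])
# offset is about 10.05 s, bound about 0.2 s
```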

Summarising: this post was again pretty theoretical. The next one will discuss the practical skill of writing down requirements. We will also discuss some approaches to modelling, as modelling is often very useful in supporting requirements management (and many other activities as well).


14 August 2011

Backup and version control

I said my next post would be about requirements specification; however, I also said I would try to discuss issues I encounter. Here is one: an HDD failure in my laptop. Of course I have backups (3, to be precise: one on my main PC, one on an external HDD, and a slightly older one on DVDs). This is not my first HDD failure; I have had quite a few already (yes, your HDDs will fail, the question is when). Without the backups... I don't even want to think about that.

So we all understand the need for backups (I hope!); if you have not backed up today, do it now. An important part of my backup was an old SVN repository. A few weeks ago I had a discussion with a friend about version control systems, so I thought: "SVN makes me mad sometimes, so why not take a look at other solutions? I need to rebuild my workspace anyway, so I can replace SVN at the same time." And I had used BitKeeper before... so I know version control is not just a fancy way of doing backups.

Don't get me wrong: SVN is not evil; it just does not match the needs of the research and experimental work I'm doing. SVN might be the better choice in some cases; it all depends on needs, workflows, procedures, etc.

After discarding all proprietary (and dead) systems, only a few choices were left: Bazaar, Codeville, CVS, SVN, darcs, git, GNU arch, Mercurial, and monotone. CVS and SVN were out of the question, as they were what I wanted to get rid of. GNU arch is still maintained but no longer extended. The rest seem more or less the same on paper (the same access protocols, distributed architecture, active development, etc.). I decided to go for git simply because a lot of people use it (and my friend told me I should ;]).

I wanted a distributed model, as I work on several equally important PCs, I share my work with a lot of equally important people, and I develop experimental things that I want to commit but do not want to propagate to other repositories. I also need easy branching (SVN gives me that) with easy merging (something SVN is not so good at, in my opinion), at least for my research projects, as they are very *experimental* in nature and require a lot of branching (e.g. 3 versions of the same article: internal report, conference paper, and thesis chapter). Git lets me work on different PCs, with a separate repository on each of them, and merge them when needed. A nice explanation of the differences between the centralized and distributed models of version control can be found in "Intro to Distributed Version Control (Illustrated)".

Instead of explaining why I thought switching from SVN to a distributed model would be better, I will just point you to a recording of a lecture Linus Torvalds gave at Google ;] Some of the problems Linus presents are slightly outdated (the video is from 2007), as SVN does things better now than it did 4 years ago. It is still worth watching.


The conversion of the SVN repository to git was easy, much easier than I thought it would be: check out from SVN, remove all .svn folders, initialize a git repository (on the first revision only; git add otherwise), and commit. Resolving problems with files disappearing from version to version was much easier than it was with SVN. It took me about 1 day to convert all my stuff (about 20 GB of resources related to various projects from the last few years). I decided to use rsync to back up my photos, though; I think version control is not needed there, as Lightroom has nice tools for versioning and non-destructive editing. I also decided to drop the history of some projects, so I converted only their last revision to git.
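The "remove all .svn folders" step is easy to script. Below is a minimal Python sketch, assuming the SVN working copy has already been checked out to a local directory. Older SVN clients keep a .svn folder in every directory of a working copy, newer ones (1.7 and later) only one at the top; walking the whole tree covers both. Using svn export instead of svn checkout would produce a tree without the metadata in the first place.

```python
import os
import shutil


def strip_svn_metadata(root: str) -> int:
    """Delete every .svn directory under `root`; return how many were removed."""
    removed = 0
    for dirpath, dirnames, _filenames in os.walk(root):  # top-down by default
        if ".svn" in dirnames:
            shutil.rmtree(os.path.join(dirpath, ".svn"))
            dirnames.remove(".svn")  # prune: do not descend into the deleted folder
            removed += 1
    return removed


# Hypothetical usage: strip_svn_metadata(os.path.expanduser("~/svn-checkout"))
```

After that, running git init (first revision only), git add . and git commit in the same directory finishes the job as described above.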

As there are a lot of tutorials and manuals on git, I will not even try to build a comprehensive guide here. Instead, I will briefly show how to start. Let's assume I have a directory ~/WORKSPACE on my PC. Creating a repository takes just one command:
#~/WORKSPACE> git init

So eaaaasy. Now just add the files:
#~/WORKSPACE> git add .
#~/WORKSPACE> git commit

and my directory is already under version control.

Then, on the second PC, I just do:
#~/> git clone PROTOCOL://ADDR_TO_FIRST_PC/repository ./WORKSPACE

and now I have the stuff on the second PC. After doing some work, I just do:
#~/WORKSPACE> git add .
#~/WORKSPACE> git commit

And the changes are committed. They are not on the first PC yet, as this is a distributed model!!! We can push them there:
#~/WORKSPACE> git push origin

If you get an error like this:
Counting objects: 8, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 674 bytes, done.
Total 5 (delta 3), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error: 
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error: 
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To REPOSITORY/
 ! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to ' REPOSITORY/'

then on the first PC do:
#~/WORKSPACE> git config --bool core.bare true

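The git push operation should work now. Note, however, that this turns the first repository into a bare one: git no longer treats the files in ~/WORKSPACE on that PC as a working tree, and pushes update only the repository itself. A common alternative is to push to a separate bare repository created with git clone --bare. So far so good, but this is not where the true power lies.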

To create a patch containing the last commit, just do:
#~/WORKSPACE> git diff HEAD^ HEAD > patch.diff

and to synchronize with origin (by downloading changes rather than uploading them):
#~/WORKSPACE> git pull origin

What more can be done? Everything you would expect: adding, removing, untracking files and, of course, branching! The true power, however, lies in the distribution.

Instead of a comprehensive guide, here are a couple of links:
Git FAQ at git.wiki.kernel.org
Tutorials on GitHub including one about how to ensure some files are ignored by git add
Git tutorial and git documentation on www.kernel.org.


13 August 2011

Why does software fail?

In my previous post I described a project that was about to fail. In this post I will try to discuss what it means for software development to be successful, and what the reasons (both immediate and less apparent) are behind some spectacular software failures.

To start with, let's ask a question: "what does it mean that development was successful?" Firstly, we have to consider purely economic criteria, i.e. cost in terms of time and money. Simply speaking, we need the software to be completed within a prescribed timeframe and within an accepted budget. Both are equally important, as our customers do not want to pay too much, and they do not want to wait for years either.

Secondly, the software delivered has to meet the needs of its users. This one seems obvious: if the software does not do what it needs to do, then it is useless. However, in many cases it is hard to tell what software has to do (we will discuss this issue soon). For now, just assume that software has to perform certain functions required by the users (so it has the correct scope) and it has to execute them correctly (so the quality is right).

Let's set up a suitable background for further discussion by presenting a few spectacular examples of software failures. We will start with one that is often cited: the Ariane 5 explosion (or self-destruction, to be precise) on 4 June 1996. The immediate cause of the crash was a software error. The failure report (available as a PDF: Ariane 5 Flight 501 Failure, Report by the Inquiry Board) states: "A data conversion from 64-bit floating point value to 16-bit signed integer value to be stored in a variable representing horizontal bias caused a processor trap (operand error)". The conversion raised an exception because the value held in the 64-bit operand was outside the range a 16-bit signed integer can represent (-32,768 to 32,767). As the software was originally written for Ariane 4, this variable was not protected by a range check before the conversion, on the assumption that its values were "physically limited or that there was a large margin of error". The check was left out to meet a requirement of keeping the workload of the on-board computer sufficiently low. The funny thing is that the piece of code that caused the explosion was not needed by Ariane 5 at all; it was there because a whole Ariane 4 subsystem had been reused...
The result was a spectacular fireworks show worth $350,000,000, funded by the French company Arianespace.
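The failure mode is easy to reproduce in miniature. The Python sketch below is not the original Ada code; it uses the standard struct module to force a value into a 16-bit signed field, which, much like the unprotected conversion in the report, raises an error when the value is out of range. The protected variant shows one possible safeguard (clamping); whether clamping, flagging, or something else is the right response is a design decision the requirements would have to settle. The sample values are made up.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(x: float) -> int:
    """Truncate x and store it in a 16-bit signed field with no range check.
    struct raises struct.error if the value does not fit, similar in spirit
    to the operand error described in the Ariane 5 report."""
    return struct.unpack("<h", struct.pack("<h", int(x)))[0]


def to_int16_protected(x: float) -> int:
    """Range-checked variant: clamp to the representable interval."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


print(to_int16_unprotected(1234.7))  # 1234: fits, as Ariane 4 values were assumed to
print(to_int16_protected(40000.0))   # 32767: clamped instead of failing
try:
    to_int16_unprotected(40000.0)    # out of range: the unprotected path fails
except struct.error as exc:
    print("conversion failed:", exc)
```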

Less funny, however, was a similar failure of the MIM-104 Patriot system in Dhahran on 25 February 1991, when an Iraqi Scud missile killed 28 soldiers. Even though the approaching Scud was detected, the Patriot failed to intercept it due to a software error. Put simply, the radar detected the missile, the software predicted where it would be next... and it was not there. Why? The prediction procedure was OK; however, it relied on an accurate calculation of time from the system's clock, and this part was buggy: an accumulating round-off error caused clock drift (i.e. the clock was not keeping correct time). As the MIM-104 station had been running for over 100 hours, the clock was off by about 1/3 of a second, enough to shift the predicted position of the Scud by over 600 meters. As the subsequent radar scans did not detect the incoming missile in the predicted area, the system assumed it was a false alarm... while the Scud went on to hit the barracks. Sadly, the problem had been identified a few days earlier, with a temporary workaround suggested (i.e. rebooting the station every few hours), and the patch was delivered by the manufacturer a few hours after the people died.
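The arithmetic behind the drift fits in a few lines. The sketch assumes, following the commonly cited analysis of the incident, that the clock counted tenths of a second and that 1/10, which has no finite binary representation, was chopped to 23 bits after the binary point; exact fractions are used so that Python's own floating point does not blur the result.

```python
from fractions import Fraction

TICK = Fraction(1, 10)  # the system clock counted time in tenths of a second
FRACTION_BITS = 23      # assumption: binary digits kept after the point


def chopped(value: Fraction, bits: int) -> Fraction:
    """Truncate (not round) a non-negative value to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(value.numerator * scale // value.denominator, scale)


def clock_drift(hours: int) -> Fraction:
    """Accumulated timing error, in seconds, after running for `hours`."""
    ticks = hours * 3600 * 10
    error_per_tick = TICK - chopped(TICK, FRACTION_BITS)
    return ticks * error_per_tick


print(float(TICK - chopped(TICK, FRACTION_BITS)))  # about 9.5e-08 s lost per tick
print(f"{float(clock_drift(100)):.4f} s")          # 0.3433 s after 100 hours
```

A tiny error per tick is harmless on its own; it is the uninterrupted 100-hour run, multiplying it by 3.6 million ticks, that turns it into a third of a second.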

Another example of failure (but with no explosion this time) is the FBI Virtual Case File. As the bureau was having trouble sharing data among its agents and divisions (and was strongly criticised for that), it decided to develop a system where all case files would be stored and all agents with a suitable access level would be able to find them quickly. The project started in 2001, just before 9/11, and was supposed to be finished in 2003. After several delays it was finally completed in 2004, but... it had only partial functionality, which did not match the requirements that emerged after the terrorist attacks. $170,000,000 was spent on a project that was eventually scrapped.

Next example: Sainsbury's supply chain management. Before 2000, Sainsbury's had centralised, mainframe-based warehouse management. The new CEO claimed that the server had low utilisation and ran too many applications, and that outsourcing the IT department to Accenture would lead to a new open, scalable, high-performance architecture with strong security at low cost. In 2004 the contract with Accenture was renewed until 2010 because "the relationship with Accenture worked so well"; however, 2 months later the CEO resigned because of poor financial performance. Soon afterwards it was discovered that the warehouse management system under development was not even able to track stock correctly. The project was cancelled and the in-house IT department restored. $1.8 billion wasted; Accenture blames Sainsbury's and vice versa.

I could probably give many more examples, but there are books full of them already (and a really good article by Robert Charette titled "Why Software Fails", published in IEEE Spectrum a few years ago). It is much more interesting to look at the reasons for these failures. Before we do that, let's evaluate them against the criteria we set at the beginning:

Project | Delivered on time? | Within budget? | Meets requirements?
Ariane 5 | Yes | Probably | No
MIM-104 Patriot | Probably | Probably | No
FBI Virtual Case File | No | No | No
Sainsbury's warehouse mgmt. | No | No | No

So, as we can see, the failures are not all the same: even if the final user gets the software on time and for the money he intended to pay, that does not mean the project is successful (although in general your PC will not blow up when programs fail, so a small number of bugs is usually acceptable, given the very high cost of finding them before release).

So why do these failures happen? In the case of Ariane 5, we might say the reason was insufficient testing. However, if we take a deeper look, we will see that it is not the only problem. We can point to an insufficient or erroneous specification of requirements: the software did something completely unnecessary (as it was developed for a different rocket with a completely different launch procedure); moreover, the requirements were contradictory or impossible to meet (low load vs. the need for error detection and correction). Secondly, the blind software reuse was most probably driven by an urge to meet deadlines: this software was proven and tested (but on a different rocket!), so it was easier to copy and paste it (regardless of its suitability) than to write it from scratch. Somebody, however, forgot to ask whether it could be reused directly without modification (or asked, but had no time to re-test it properly).

Similarly, we might say insufficient testing was the reason for the Patriot failure. But the bug was already known at the time, and the immediate problem was a lack of information on how to deal with it, effectively insufficient training of the soldiers operating the station, which was a chain-of-command and management issue.

In the case of the FBI Virtual Case File, the reasons are quite obvious:
  • poorly defined, changing requirements,
  • overcommitment to an ambitious project that was (in fact) meant to be a quick fix for a broader management/communication problem within the FBI itself,
  • poor quality of work done by contractors without sufficient expertise.
Moreover, there were some 'political' reasons contributing to the failure:
  • ignoring the existence of off-the-shelf solutions,
  • 14 (sic!) project managers over the 3 years of the project's lifetime (so a new manager roughly every 2.5 months),
  • hardware already set up and waiting for software (hence pressure for fast delivery on unrealistic deadlines).
Similar problems can be found in the case of Sainsbury's:
  • weak outsourcing strategy,
  • 'big-bang' approach,
  • politics, 
  • software meant to be a fix for poor business management
In addition, the loss of staff with knowledge of the existing solution contributed to some of the problems (Accenture claimed the software failed due to faulty Sainsbury's subsystems that had not been outsourced to them).

I will also quickly list the problems with the project I wrote about last time. As I said, the first problem was a lack of understanding of the customer's needs: a poor specification of requirements. Secondly, the project was approached in a 'big-bang' fashion and without any suitable management policy. In fact, there was no management at all. By the time I took over, the pressure to deliver fast was already growing, together with an urge for a scalable and efficient solution, resulting in insufficient testing and poor overall quality. Thankfully, we managed to sort it out, and the lesson was learnt.

In general, we can say that the failure or success of a project depends critically on the following factors:
  • organisational: the culture and workflow within the organisation the software is developed for. In general, software should not be a fix for problems of a completely different nature (such as a lack of suitable procedures, poor communication, or poor business management);
  • project management: overcommitment and pressure of any kind do not help efficient project management. Moreover, a lack of suitable management, of a broad overview, and of an understanding of the project and its scope also contributes to failure;
  • conduct of the project: errors in each phase of development contribute to the final result, including the initial stages (e.g. underestimating complexity, being lured by a particular solution without a sensible reason), specification and design (e.g. poor requirements engineering, poor design, poor consultation), later development (e.g. poor contractor quality, lack of suitable knowledge, poor communication), and implementation (e.g. insufficient testing and user training).
From now on I will try to write more 'practical' and less theoretical posts that address the various issues presented here. I will try not to stick to the 'natural' waterfall model (i.e. presenting each phase one by one), but rather write about things I encounter every day, either in my company or at the university (yes, we also have to manage our research projects there). Hence, if I encounter an interesting issue, it is likely to be presented in an upcoming post (with some additional examples).

Next time I will talk about requirements engineering, or "what the customer wants and why he does not really need it" :)



08 August 2011

What is programming really about?

In my opinion, the way we learn to program causes a small misunderstanding of what programming really is. All tutorials, most university courses (at least those called 'programming'), and the way we approach learning focus on writing code in a particular language. But creating a piece of (usable) software is much more than just writing the code.

Whatever approach, workflow, or strategy we employ, there are always some steps that cannot be omitted, such as design and testing. At the beginning of our careers as programmers we often skip explicit design and testing: the programs we write (e.g. to complete a programming course) are often small enough to design on the fly in our heads and refine along the way, while testing is done in a "let's check if it works" fashion. So these steps are not completely omitted; however, neglecting them this way makes our software only good enough to serve the particular purpose of the moment (i.e. passing the course). Extending, maintaining, or even properly debugging such code is a pain, even though it is usually less than a few thousand lines long.

But what about real software? Not toy programs of 1,000 lines, but a real program with, let's say, 100,000 lines of code that uses a number of external libraries (another 100,000 lines of code, not maintained by us directly) and a number of resources like images, data files, and devices (e.g. printers). Programs that are not single-threaded (often different parts do not even run on the same PC) and that compete for resources with other processes.

Extending and maintaining such an application was one of my first tasks when I started working as a professional programmer. Unfortunately, the code had been written without any design, with virtually no documentation and not even a trace of user requirements. To make things harder, the programmer who wrote it left the company shortly before I started... The effect? A complete mess... It took me half a year to really understand how it worked and fix it to the point where customers could use it. 5 years, 2 releases, major rewrites, and a lot of cleanup later, the code was still simply bad, but there was hope of making it right!

Why did this happen? There are several reasons; the most important are the lack of initial requirements, the lack of understanding of customer needs, and the lack of any design and testing routines. Add a few missed deadlines and the complicating factor of customers changing their minds every other day. The code I inherited from the previous developer was simply a stack of ad-hoc solutions to the unstructured set of ideas the customers had.

To be honest, I was part of the problem as well: being young and proud, I said "yes, I can make it work". I was right, I could, but today I know it could have been done at half the cost and much faster if I had started from scratch with the correct approach. Unfortunately, nobody prevented me from making this mistake back then, as the company I was working for was not really a software development business. As such, none of the project managers had any real experience in real-world software development.

But hey! By saying I was part of the problem, I didn't mean I had no knowledge of software engineering. I had! I had taken the courses, I had read the books, and hell, I had even written a few programs using the correct approach. The problem was that I still believed I could manage ad-hoc development on such a large program. Having experience only in writing programs a tenth of its size, and in modifying large programs that were nicely written, correctly designed, and well documented, I had no idea it might be that hard.

But was that really my fault? I had studied at 3 universities in 3 different countries, and in each case the programming courses I had to take taught me to write code. The engineering process was taught separately and required me, a student, to write a small program using a widely accepted approach (e.g. the Rational Unified Process). But the scale was not right: when writing 1,000-5,000 lines of code for a small database of customers and products, it makes no difference whether I make some serious mistakes while building the program, as I can recover quickly by deleting 100 lines of code and writing them again. Plus, nobody told me how to estimate the time and cost of development. How are you going to earn money as a company if you have no idea what the cost will be and how much the customer has to pay? Most of the graduates I interviewed over the last few years for various positions in the companies I worked for were in the same position as I had been back then.

Fortunately, there is more emphasis on design and testing in current teaching, but I would say even more is needed. In the next couple of articles I will try to develop a small programming 101 series. It will not be a tutorial, though; rather, I will try to focus on why we should not consider programming to be code writing (which is coding), how to avoid mistakes, and how to make life easier for us programmers and for our project managers. I will not limit myself to writing essays; instead, I will present various projects, techniques, and examples that I consider useful or that are simply worth looking at.

Before I finish, here is a link to a short article titled "Top 5 Project Failure Reasons, or Why My Project Fails", published on the My Management Guide site. Even though I do not think it is comprehensive, it is a good starting point for understanding why the project I described above was such a pain, and why it is actually a miracle it was completed to the customer's satisfaction.


07 August 2011

There's a new sheriff in town...

The blog will have a second life now that Alexander, the previous author, has agreed to transfer it to me. I will try to keep it alive, writing about various technical and non-technical issues related to programming and computer science as a field. The first batch of posts will be published soon; I have some larger projects going on in my company that we need to pay close attention to... software project management is a b**ch.
