Friday, November 27, 2009

Excel and grep are the log file analyzer's best friends

Lately I have been analyzing log files for invocation times of remote services. Luckily, the most problematic part of the system logs time consumption together with a remote-system identifier and some other details.

Some years ago I was working on SmartLearn, implementing analytics for learning accountables. Through that work I got to know Excel's pivoting capabilities, though back then I used Microsoft Analysis Services to create the pivot tables. I have also long been a user of the sharp command-line tools of Linux, Unix and Cygwin. I knew the strengths of the tools from each of these worlds separately, but I had not recognized how I could use them together.

One of my project peers showed me how Excel can extract data from flat files and present it as pivot tables in very few steps. The key premise is that the interesting dimensions of the pivot table are logged on the same lines as the interesting numbers/text. The technique described here lets you visualize counts of things, and even without any numbers the frequency of things can be very interesting. Most system log files contain a timestamp, and this can be combined with almost anything *. Using *nix command-line tools it is of course possible to extract whatever information you like from flat files.

The interesting lines in my sample log look like this:

17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401

To extract only these lines I use Cygwin's grep command like this (the pattern is quoted so the shell does not expand it):
grep "Task.*duration" <logfilename> > extractedlog.txt
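If Cygwin is not available, the same filtering can be done in a few lines of Java. This is just a sketch; the class name and the two file arguments are my own invention:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Pattern;

public class GrepTaskDurations {

    // Same expression as in the grep command above.
    private static final Pattern PATTERN = Pattern.compile("Task.*duration");

    static boolean matches(String line) {
        return PATTERN.matcher(line).find();
    }

    // Usage: java GrepTaskDurations <logfilename> extractedlog.txt
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        PrintWriter out = new PrintWriter(new FileWriter(args[1]));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (matches(line)) {
                    out.println(line); // keep only the "end call ... duration=..." lines
                }
            }
        } finally {
            in.close();
            out.close();
        }
    }
}
```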


From Excel, open the extracted log file. Excel will recognize it as a text file that you may want to split into columns. Choose between splitting at fixed positions or at delimiter characters; spaces can be a viable delimiter in some cases.

When you have imported the file it may still be necessary to split columns manually, using the Text to Columns tool on the Data tab.

Now insert a row at the top of the dataset and add header names for the interesting columns. Cut and paste the interesting columns so they are adjacent to each other.

Next, select the Insert tab in Excel and choose PivotTable (the leftmost button under Insert in my installation of Excel 2007).

Choose either Table or Chart, and select the interesting fact and dimension columns from the spreadsheet. When you click OK, you can start dragging and dropping columns into the Axes, Values and Legend containers on the right-hand side.

You can now slice and dice the log data as you like, and find relations that would take an enormous amount of time to find manually in the log file.
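As an illustration of what the pivot table computes, here is a small Java sketch that derives per-task invocation counts and average durations from the extracted lines. The class and method names are mine, not part of the original setup:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationSummary {

    // Matches e.g. "17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401"
    private static final Pattern LINE =
            Pattern.compile("end call, (Task\\d+) duration=(\\d+)");

    /** Returns task name -> {invocation count, summed duration in ms}. */
    static Map<String, long[]> summarize(Iterable<String> lines) {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                long[] s = stats.computeIfAbsent(m.group(1), k -> new long[2]);
                s[0]++;                              // invocation count
                s[1] += Long.parseLong(m.group(2));  // summed duration (ms)
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, long[]> stats = summarize(java.util.Arrays.asList(
                "17:47:56,876 DEBUG Task1:29 - end call, Task2 duration=65",
                "17:47:57,030 DEBUG Task1:29 - end call, Task2 duration=151",
                "17:47:57,174 DEBUG Task1:29 - end call, Task0 duration=369"));
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long count = e.getValue()[0], total = e.getValue()[1];
            System.out.println(e.getKey() + ": count=" + count + " avg=" + (total / count));
        }
        // Prints:
        // Task2: count=2 avg=108
        // Task0: count=1 avg=369
    }
}
```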


To create the dataset I used this Java code, which produces differentiated execution trends in a number of threads (it is not meant as an educational example of how to write multithreaded code):

package com.webstep.logfilegenerator;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LogGenerator {

    public static void main(String[] args) {
        // A pool of 3 threads, so only the first 3 submitted tasks ever run.
        ExecutorService exSvc = Executors.newFixedThreadPool(3);

        List<Future<Object>> futures = new ArrayList<Future<Object>>();
        final Random r = new Random();
        for (int i = 0; i < 10; i++) {
            // i + 1 * 500 evaluates to i + 500 (operator precedence), so each
            // task gets a random max timeout of up to roughly 500 ms.
            futures.add(exSvc.submit(new Task1("Task" + i, r.nextInt(i + 1 * 500))));
        }
        try {
            for (Future<Object> f : futures) {
                // The tasks never return, so this times out after 10 seconds;
                // the timeout is what ends the log generation.
                System.out.println(f.get(10000, TimeUnit.MILLISECONDS));
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } catch (TimeoutException e) {
            // Expected after ~10 seconds of generated log output.
        } finally {
            exSvc.shutdownNow();
        }
    }
}



package com.webstep.logfilegenerator;

import java.util.Random;
import java.util.concurrent.Callable;

import org.apache.log4j.Logger;

public class Task1 implements Callable<Object> {

    private static final Logger log = Logger.getLogger(Task1.class.getName());
    private static final Random r = new Random();
    private final int maxTimeout;
    private final String name;

    public Task1(final String name, final int maxTimeout) {
        this.maxTimeout = maxTimeout;
        this.name = name;
        log.debug(name + " maxtimeout=" + maxTimeout);
    }

    public Object call() throws Exception {
        // Loops until the executor is shut down and the sleep is interrupted.
        while (true) {
            log.debug("start call " + name);
            long start = System.currentTimeMillis();
            Thread.sleep(r.nextInt(maxTimeout));
            log.debug("end call, " + name + " duration=" + (System.currentTimeMillis() - start));
        }
    }
}
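For completeness: the timestamp/level/class:line format of the sample output suggests a log4j 1.x configuration along these lines. This is a guess on my part; the original appender setup was not included in the post:

```properties
log4j.rootLogger=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# %d{ABSOLUTE} -> 17:47:56,798   %c{1} -> Task1   %L -> source line number
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %p %c{1}:%L - %m%n
```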


Some sample output from this code:
17:47:56,798 DEBUG Task1:19 - Task0 maxtimeout=98
17:47:56,803 DEBUG Task1:19 - Task1 maxtimeout=12
17:47:56,803 DEBUG Task1:25 - start call Task0
17:47:56,804 DEBUG Task1:19 - Task2 maxtimeout=17
17:47:56,804 DEBUG Task1:25 - start call Task1
17:47:56,806 DEBUG Task1:19 - Task3 maxtimeout=51
17:47:56,807 DEBUG Task1:19 - Task4 maxtimeout=92
17:47:56,807 DEBUG Task1:19 - Task5 maxtimeout=47
17:47:56,808 DEBUG Task1:19 - Task6 maxtimeout=8
17:47:56,809 DEBUG Task1:19 - Task7 maxtimeout=87
17:47:56,809 DEBUG Task1:19 - Task8 maxtimeout=66
17:47:56,810 DEBUG Task1:19 - Task9 maxtimeout=39
17:47:56,808 DEBUG Task1:25 - start call Task2
17:47:56,876 DEBUG Task1:29 - end call, Task2 duration=65
17:47:56,878 DEBUG Task1:25 - start call Task2
17:47:57,030 DEBUG Task1:29 - end call, Task2 duration=151
17:47:57,031 DEBUG Task1:25 - start call Task2
17:47:57,174 DEBUG Task1:29 - end call, Task0 duration=369
17:47:57,175 DEBUG Task1:25 - start call Task0
17:47:57,296 DEBUG Task1:29 - end call, Task1 duration=490
17:47:57,297 DEBUG Task1:25 - start call Task1
17:47:57,323 DEBUG Task1:29 - end call, Task0 duration=146
17:47:57,324 DEBUG Task1:25 - start call Task0
17:47:57,390 DEBUG Task1:29 - end call, Task1 duration=91
17:47:57,391 DEBUG Task1:25 - start call Task1
17:47:57,448 DEBUG Task1:29 - end call, Task2 duration=415
17:47:57,449 DEBUG Task1:25 - start call Task2
17:47:57,549 DEBUG Task1:29 - end call, Task1 duration=157
17:47:57,550 DEBUG Task1:25 - start call Task1
17:47:57,729 DEBUG Task1:29 - end call, Task2 duration=278
17:47:57,730 DEBUG Task1:25 - start call Task2
17:47:57,802 DEBUG Task1:29 - end call, Task0 duration=477
17:47:57,803 DEBUG Task1:25 - start call Task0
17:47:57,886 DEBUG Task1:29 - end call, Task1 duration=334
17:47:57,887 DEBUG Task1:25 - start call Task1
17:47:58,048 DEBUG Task1:29 - end call, Task2 duration=316
17:47:58,048 DEBUG Task1:25 - start call Task2
17:47:58,068 DEBUG Task1:29 - end call, Task0 duration=264
17:47:58,069 DEBUG Task1:25 - start call Task0
17:47:58,087 DEBUG Task1:29 - end call, Task1 duration=199
17:47:58,092 DEBUG Task1:25 - start call Task1
17:47:58,098 DEBUG Task1:29 - end call, Task1 duration=5
17:47:58,098 DEBUG Task1:25 - start call Task1
17:47:58,099 DEBUG Task1:29 - end call, Task2 duration=48
17:47:58,099 DEBUG Task1:25 - start call Task2
17:47:58,115 DEBUG Task1:29 - end call, Task1 duration=17
17:47:58,115 DEBUG Task1:25 - start call Task1
17:47:58,245 DEBUG Task1:29 - end call, Task2 duration=146
17:47:58,245 DEBUG Task1:25 - start call Task2
17:47:58,339 DEBUG Task1:29 - end call, Task0 duration=267
17:47:58,340 DEBUG Task1:25 - start call Task0
17:47:58,398 DEBUG Task1:29 - end call, Task0 duration=57
17:47:58,399 DEBUG Task1:25 - start call Task0
17:47:58,412 DEBUG Task1:29 - end call, Task0 duration=12
17:47:58,412 DEBUG Task1:25 - start call Task0
17:47:58,542 DEBUG Task1:29 - end call, Task1 duration=427
17:47:58,543 DEBUG Task1:25 - start call Task1
17:47:58,573 DEBUG Task1:29 - end call, Task2 duration=326
17:47:58,574 DEBUG Task1:25 - start call Task2
17:47:58,741 DEBUG Task1:29 - end call, Task1 duration=197
17:47:58,742 DEBUG Task1:25 - start call Task1
17:47:58,783 DEBUG Task1:29 - end call, Task0 duration=369
17:47:58,784 DEBUG Task1:25 - start call Task0
17:47:58,835 DEBUG Task1:29 - end call, Task2 duration=260
17:47:58,835 DEBUG Task1:25 - start call Task2
17:47:58,873 DEBUG Task1:29 - end call, Task0 duration=89
17:47:58,873 DEBUG Task1:25 - start call Task0
17:47:58,923 DEBUG Task1:29 - end call, Task2 duration=87
17:47:58,923 DEBUG Task1:25 - start call Task2
17:47:58,942 DEBUG Task1:29 - end call, Task0 duration=67
17:47:58,942 DEBUG Task1:25 - start call Task0
17:47:59,084 DEBUG Task1:29 - end call, Task1 duration=342
17:47:59,084 DEBUG Task1:25 - start call Task1
17:47:59,091 DEBUG Task1:29 - end call, Task2 duration=166
17:47:59,091 DEBUG Task1:25 - start call Task2
17:47:59,115 DEBUG Task1:29 - end call, Task0 duration=172
17:47:59,115 DEBUG Task1:25 - start call Task0
17:47:59,348 DEBUG Task1:29 - end call, Task2 duration=257
17:47:59,348 DEBUG Task1:25 - start call Task2
17:47:59,473 DEBUG Task1:29 - end call, Task2 duration=124
17:47:59,474 DEBUG Task1:25 - start call Task2
17:47:59,505 DEBUG Task1:29 - end call, Task0 duration=389
17:47:59,505 DEBUG Task1:25 - start call Task0
17:47:59,506 DEBUG Task1:29 - end call, Task2 duration=31
17:47:59,507 DEBUG Task1:25 - start call Task2
17:47:59,534 DEBUG Task1:29 - end call, Task1 duration=450
17:47:59,535 DEBUG Task1:25 - start call Task1
17:47:59,579 DEBUG Task1:29 - end call, Task1 duration=44
17:47:59,580 DEBUG Task1:25 - start call Task1
17:47:59,629 DEBUG Task1:29 - end call, Task2 duration=121
17:47:59,630 DEBUG Task1:25 - start call Task2
17:47:59,667 DEBUG Task1:29 - end call, Task0 duration=160
17:47:59,668 DEBUG Task1:25 - start call Task0
17:47:59,747 DEBUG Task1:29 - end call, Task1 duration=167
17:47:59,748 DEBUG Task1:25 - start call Task1
17:47:59,852 DEBUG Task1:29 - end call, Task2 duration=221
17:47:59,852 DEBUG Task1:25 - start call Task2
17:48:00,153 DEBUG Task1:29 - end call, Task0 duration=485
17:48:00,154 DEBUG Task1:25 - start call Task0
17:48:00,156 DEBUG Task1:29 - end call, Task1 duration=408
17:48:00,156 DEBUG Task1:25 - start call Task1
17:48:00,185 DEBUG Task1:29 - end call, Task2 duration=332
17:48:00,185 DEBUG Task1:25 - start call Task2
17:48:00,316 DEBUG Task1:29 - end call, Task2 duration=130
17:48:00,317 DEBUG Task1:25 - start call Task2
17:48:00,435 DEBUG Task1:29 - end call, Task2 duration=117
17:48:00,436 DEBUG Task1:25 - start call Task2
17:48:00,542 DEBUG Task1:29 - end call, Task1 duration=385
17:48:00,543 DEBUG Task1:25 - start call Task1
17:48:00,629 DEBUG Task1:29 - end call, Task0 duration=474
17:48:00,631 DEBUG Task1:25 - start call Task0
17:48:00,720 DEBUG Task1:29 - end call, Task2 duration=283
17:48:00,721 DEBUG Task1:25 - start call Task2
17:48:00,809 DEBUG Task1:29 - end call, Task1 duration=266
17:48:00,810 DEBUG Task1:25 - start call Task1
17:48:00,886 DEBUG Task1:29 - end call, Task1 duration=74
17:48:00,887 DEBUG Task1:25 - start call Task1
17:48:00,908 DEBUG Task1:29 - end call, Task0 duration=276
17:48:00,909 DEBUG Task1:25 - start call Task0
17:48:01,023 DEBUG Task1:29 - end call, Task2 duration=302
17:48:01,023 DEBUG Task1:25 - start call Task2
17:48:01,140 DEBUG Task1:29 - end call, Task0 duration=230
17:48:01,140 DEBUG Task1:25 - start call Task0
17:48:01,349 DEBUG Task1:29 - end call, Task1 duration=462
17:48:01,350 DEBUG Task1:25 - start call Task1
17:48:01,378 DEBUG Task1:29 - end call, Task2 duration=355
17:48:01,378 DEBUG Task1:25 - start call Task2
17:48:01,518 DEBUG Task1:29 - end call, Task2 duration=139
17:48:01,518 DEBUG Task1:25 - start call Task2
17:48:01,599 DEBUG Task1:29 - end call, Task1 duration=249
17:48:01,599 DEBUG Task1:25 - start call Task1
17:48:01,603 DEBUG Task1:29 - end call, Task0 duration=461
17:48:01,603 DEBUG Task1:25 - start call Task0
17:48:01,939 DEBUG Task1:29 - end call, Task0 duration=335
17:48:01,940 DEBUG Task1:25 - start call Task0
17:48:01,997 DEBUG Task1:29 - end call, Task2 duration=476
17:48:01,998 DEBUG Task1:25 - start call Task2
17:48:02,014 DEBUG Task1:29 - end call, Task1 duration=414
17:48:02,015 DEBUG Task1:25 - start call Task1
17:48:02,146 DEBUG Task1:29 - end call, Task1 duration=131
17:48:02,147 DEBUG Task1:25 - start call Task1
17:48:02,261 DEBUG Task1:29 - end call, Task0 duration=320
17:48:02,262 DEBUG Task1:25 - start call Task0
17:48:02,343 DEBUG Task1:29 - end call, Task2 duration=342
17:48:02,343 DEBUG Task1:25 - start call Task2
17:48:02,378 DEBUG Task1:29 - end call, Task1 duration=230
17:48:02,378 DEBUG Task1:25 - start call Task1
17:48:02,520 DEBUG Task1:29 - end call, Task0 duration=257
17:48:02,520 DEBUG Task1:25 - start call Task0
17:48:02,597 DEBUG Task1:29 - end call, Task2 duration=253
17:48:02,598 DEBUG Task1:25 - start call Task2
17:48:02,734 DEBUG Task1:29 - end call, Task1 duration=355
17:48:02,735 DEBUG Task1:25 - start call Task1
17:48:02,786 DEBUG Task1:29 - end call, Task0 duration=265
17:48:02,787 DEBUG Task1:25 - start call Task0
17:48:02,959 DEBUG Task1:29 - end call, Task2 duration=361
17:48:02,959 DEBUG Task1:25 - start call Task2
17:48:03,017 DEBUG Task1:29 - end call, Task0 duration=228
17:48:03,018 DEBUG Task1:25 - start call Task0
17:48:03,027 DEBUG Task1:29 - end call, Task2 duration=66
17:48:03,030 DEBUG Task1:25 - start call Task2
17:48:03,214 DEBUG Task1:29 - end call, Task1 duration=478
17:48:03,215 DEBUG Task1:25 - start call Task1
17:48:03,299 DEBUG Task1:29 - end call, Task2 duration=268
17:48:03,300 DEBUG Task1:25 - start call Task2
17:48:03,305 DEBUG Task1:29 - end call, Task1 duration=90
17:48:03,306 DEBUG Task1:25 - start call Task1
17:48:03,324 DEBUG Task1:29 - end call, Task1 duration=17
17:48:03,325 DEBUG Task1:25 - start call Task1
17:48:03,345 DEBUG Task1:29 - end call, Task2 duration=44
17:48:03,345 DEBUG Task1:25 - start call Task2
17:48:03,456 DEBUG Task1:29 - end call, Task1 duration=130
17:48:03,456 DEBUG Task1:25 - start call Task1
17:48:03,484 DEBUG Task1:29 - end call, Task2 duration=138
17:48:03,484 DEBUG Task1:25 - start call Task2
17:48:03,491 DEBUG Task1:29 - end call, Task2 duration=6
17:48:03,492 DEBUG Task1:25 - start call Task2
17:48:03,496 DEBUG Task1:29 - end call, Task0 duration=477
17:48:03,496 DEBUG Task1:25 - start call Task0
17:48:03,696 DEBUG Task1:29 - end call, Task0 duration=199
17:48:03,696 DEBUG Task1:25 - start call Task0
17:48:03,796 DEBUG Task1:29 - end call, Task1 duration=340
17:48:03,809 DEBUG Task1:25 - start call Task1
17:48:03,841 DEBUG Task1:29 - end call, Task2 duration=348
17:48:03,841 DEBUG Task1:25 - start call Task2
17:48:03,845 DEBUG Task1:29 - end call, Task1 duration=35
17:48:03,847 DEBUG Task1:25 - start call Task1
17:48:03,876 DEBUG Task1:29 - end call, Task0 duration=179
17:48:03,876 DEBUG Task1:25 - start call Task0
17:48:03,952 DEBUG Task1:29 - end call, Task2 duration=111
17:48:03,952 DEBUG Task1:25 - start call Task2
17:48:03,999 DEBUG Task1:29 - end call, Task0 duration=123
17:48:04,000 DEBUG Task1:25 - start call Task0
17:48:04,220 DEBUG Task1:29 - end call, Task1 duration=372
17:48:04,221 DEBUG Task1:25 - start call Task1
17:48:04,348 DEBUG Task1:29 - end call, Task2 duration=395
17:48:04,349 DEBUG Task1:25 - start call Task2
17:48:04,385 DEBUG Task1:29 - end call, Task0 duration=384
17:48:04,392 DEBUG Task1:25 - start call Task0
17:48:04,633 DEBUG Task1:29 - end call, Task2 duration=284
17:48:04,634 DEBUG Task1:25 - start call Task2
17:48:04,678 DEBUG Task1:29 - end call, Task1 duration=457
17:48:04,679 DEBUG Task1:25 - start call Task1
17:48:04,704 DEBUG Task1:29 - end call, Task1 duration=25
17:48:04,704 DEBUG Task1:25 - start call Task1
17:48:04,766 DEBUG Task1:29 - end call, Task0 duration=373
17:48:04,766 DEBUG Task1:25 - start call Task0
17:48:04,912 DEBUG Task1:29 - end call, Task2 duration=277
17:48:04,912 DEBUG Task1:25 - start call Task2
17:48:04,928 DEBUG Task1:29 - end call, Task2 duration=15
17:48:04,928 DEBUG Task1:25 - start call Task2
17:48:04,943 DEBUG Task1:29 - end call, Task1 duration=238
17:48:04,944 DEBUG Task1:25 - start call Task1
17:48:05,072 DEBUG Task1:29 - end call, Task2 duration=143
17:48:05,072 DEBUG Task1:25 - start call Task2
17:48:05,132 DEBUG Task1:29 - end call, Task2 duration=59
17:48:05,132 DEBUG Task1:25 - start call Task2
17:48:05,135 DEBUG Task1:29 - end call, Task1 duration=191
17:48:05,135 DEBUG Task1:25 - start call Task1
17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401
17:48:05,168 DEBUG Task1:25 - start call Task0
17:48:05,239 DEBUG Task1:29 - end call, Task1 duration=103
17:48:05,240 DEBUG Task1:25 - start call Task1
17:48:05,357 DEBUG Task1:29 - end call, Task2 duration=224
17:48:05,357 DEBUG Task1:25 - start call Task2
17:48:05,526 DEBUG Task1:29 - end call, Task1 duration=286
17:48:05,526 DEBUG Task1:25 - start call Task1
17:48:05,555 DEBUG Task1:29 - end call, Task2 duration=197
17:48:05,555 DEBUG Task1:25 - start call Task2
17:48:05,600 DEBUG Task1:29 - end call, Task2 duration=44
17:48:05,600 DEBUG Task1:25 - start call Task2
17:48:05,616 DEBUG Task1:29 - end call, Task0 duration=447
17:48:05,616 DEBUG Task1:25 - start call Task0
17:48:05,650 DEBUG Task1:29 - end call, Task2 duration=49
17:48:05,652 DEBUG Task1:25 - start call Task2
17:48:05,772 DEBUG Task1:29 - end call, Task2 duration=120
17:48:05,773 DEBUG Task1:25 - start call Task2
17:48:05,775 DEBUG Task1:29 - end call, Task1 duration=248
17:48:05,775 DEBUG Task1:25 - start call Task1
17:48:05,782 DEBUG Task1:29 - end call, Task2 duration=9
17:48:05,782 DEBUG Task1:25 - start call Task2
17:48:05,865 DEBUG Task1:29 - end call, Task2 duration=83
17:48:05,866 DEBUG Task1:25 - start call Task2
17:48:05,870 DEBUG Task1:29 - end call, Task2 duration=4
17:48:05,870 DEBUG Task1:25 - start call Task2
17:48:05,878 DEBUG Task1:29 - end call, Task0 duration=261
17:48:05,879 DEBUG Task1:25 - start call Task0
17:48:06,105 DEBUG Task1:29 - end call, Task2 duration=234
17:48:06,105 DEBUG Task1:25 - start call Task2
17:48:06,239 DEBUG Task1:29 - end call, Task0 duration=360
17:48:06,239 DEBUG Task1:25 - start call Task0
17:48:06,258 DEBUG Task1:29 - end call, Task2 duration=152
17:48:06,258 DEBUG Task1:25 - start call Task2
17:48:06,260 DEBUG Task1:29 - end call, Task1 duration=485
17:48:06,260 DEBUG Task1:25 - start call Task1
17:48:06,452 DEBUG Task1:29 - end call, Task0 duration=213
17:48:06,452 DEBUG Task1:25 - start call Task0
17:48:06,586 DEBUG Task1:29 - end call, Task1 duration=326
17:48:06,586 DEBUG Task1:25 - start call Task1
17:48:06,619 DEBUG Task1:29 - end call, Task0 duration=166
17:48:06,620 DEBUG Task1:25 - start call Task0
17:48:06,672 DEBUG Task1:29 - end call, Task0 duration=52
17:48:06,672 DEBUG Task1:25 - start call Task0
17:48:06,747 DEBUG Task1:29 - end call, Task2 duration=489
17:48:06,747 DEBUG Task1:25 - start call Task2
17:48:06,792 DEBUG Task1:29 - end call, Task1 duration=205
17:48:06,792 DEBUG Task1:25 - start call Task1



* Updated 2009-11-29

Saturday, October 31, 2009

Open Source will never go out of stock

It would be like running out of light or wind: software can be copied as many times as needed. Why haven't Microsoft and Apple understood this yet?

Some random examples of Windows 7 being out of stock:
http://crave.cnet.co.uk/software/0,39029471,49303067,00.htm
http://www.digi.no/827047/windows-7-utsolgt-i-sverige

How can software be sold out? The only commodity involved is the software to be copied. To produce a physical copy, a USB drive or a recordable CD or DVD must be used. But fundamentally, software can be copied between computers over the network. The required production equipment is computers, electric power and a network, resources that are abundant in the enterprise and in the average European or American home.

It seems it is mostly pirates and the Open Source movement that have understood this distribution model. Commercial vendors seem to ignore or deny it. That is too bad, because they could profit from the extremely low distribution costs.
  • Pirates exploit the low distribution costs to distribute goods illegally.
  • Open Source has relied on this distribution model for a long time. This has resulted in very broad usage of open source software, at distribution costs of approximately zero.
While open source and pirated digital goods have very different usage profiles, they share the same extremely effective, almost cost-free distribution. Open source is mainly used by software developers, while consumers download software and digital media content for free.

In the last couple of years a new distribution pattern has gained a foothold beyond the simplest forms: SaaS. SaaS delivers software that can be used as-is, instantly. Since there is no significant download and installation, the cost and effort required from the end user is even lower than with open source. For example, the latest release of Ubuntu aims to provide an OS as a service by offering Amazon EC2 compatible images.

Software vendors, and other digital content providers, should free themselves from physical media. Soon physical media will only be of interest to enthusiasts; the rest of us will prefer instant gratification over the physical medium.

Last weekend I was in a record shop where, of course, they played music. I was very surprised to see that a PC running Spotify was used instead of a CD player. That says it all about the state of physical distribution of digital goods.

Wednesday, October 21, 2009

Monocultures are unhealthy - even in software

Assuming that every project should be based on the same software stack is just another variation of the Silver Bullet.

In the software world, the temptation to default to an architecture that has worked before is unhealthy. The result is almost always a constant struggle for the project to overcome limitations and find workarounds for architectural monstrosities.

Monocultures foster very little learning in the organisation and lead to forcing inappropriate solutions onto problems. According to theories from epidemiology, monocultures are very vulnerable and unable to evolve to tackle a changing environment. Thinking of epidemiology naturally leads to thinking about security as well: a monoculture can only resist specific types of threats, and given that the threats are certainly evolving at blazing speed, there is an obvious need for variation.

In the longer term a monoculture will hinder innovation, and this can be disastrous for an organization whose business model is, for example, developing and selling software.

Of course, not all variations will be good, and those that are not should be dismissed. When experimenting with a technology that is new to the organization or to a given project, try it in the small before going full scale. The opposite of genetic monoculture is diversity. Healthy software architecture in an organisation is probably best grown in an evolutionary way: allow variations and promote the things that work well (without restricting yourself to only those). More important: things that do not work well must die.

All systems should be architected with the "right set" of technologies for the problem they are supposed to solve. Start with what you know for sure, and make as few assumptions about the future as possible. The Cantara Software foundation has a wiki discussing these issues, among other things, and I will try to post more on this topic there.

Yesterday I came across a blog post from the Cutter Consortium about uncertainty from a leadership perspective. This applies to software architects too, and should be on the architect's mind when evolving the software architecture for the organization.

Wednesday, September 9, 2009

DDD and Ubiquitous language

A recurring subject in several sessions on the first day of JavaZone 09 was Domain-Driven Design and the ubiquitous language. I agree with those who believe it is important to establish a language that can be used by both domain experts and developers to ensure consistency.

Just before I had to leave, I attended the DDD panel discussion, where it was debated whether this language should be used in the code as well. That is completely natural for English-speaking developers. For developers in countries where English is not the native language, it raises some questions:
  • The code cannot be maintained by people not familiar with the language the code is written in. This is very relevant in outsourcing.
  • It will pose problems when companies from different countries merge and must update and integrate the code to reflect the new situation.
  • Code that will or can be exposed as open source, or in company partnerships, will be of less value if it is not written in English.
My personal opinion is that Norwegian (my native language) does not look good in code. It feels unnatural.

Is it responsible to write code in a language other than English? I think this should be discussed with the stakeholders. On the other hand, when there is a language impedance, how do you deal with it? Is it the developer's responsibility?

I wanted to ask the panel these questions, but had to leave to catch a bus.

Updated: Corrected Domain-Driven Development to Domain-Driven Design and added useful links.

Wednesday, August 26, 2009

Do you master social media?

This post is almost "off topic", and does not follow up on the series I announced in my previous post. I think I have forgotten what I was planning then, but I will surely get back to software and architecture later.

The background for this post is that I cannot help getting frustrated over the uninformed criticism of social media that a lot of old-hat media seems to love these days.

Social media is, by many, viewed as a tornado of useless, disconnected information. On the other hand, it is easy to find examples of constructive use of social media that provide extremely useful information and knowledge to their readers; for some examples, look at the bloggers I follow in the sidebar of my blog. Sometimes I get irritated when people hold strong negative opinions about social media (especially microblogging), stating that it makes us dumber and unable to concentrate anymore. It is up to the individual to decide what to make of it, and perhaps to pick up a book instead of, say, "hypertweeting".

I have stumbled upon extremely useful knowledge on Twitter, on blogs and from more professional web publishers, reading and discussing it through social media's unique features. In my opinion, social media is a far better way of staying informed and increasing your knowledge than watching TV. At least a lot of what is distributed via TV is not exactly what tickles your brain into thinking.

To be really interesting in the social media space you have to consume a considerable amount of information, and possess the capability to transform and amplify it for your readers. This requires deep concentration, exactly as it always has; just think of the ancient Greek philosophers. Some have a large information-processing capability and others less, but everyone can improve their skills. Social media provides the best training ground ever for improving your ability to express your knowledge. As Chris Anderson says in The Long Tail, the tools for expressing yourself in public writing have been democratized.

In my view, expressing and discussing your knowledge is probably the most valuable thing a person can do for oneself and for society in general. The ability to express your knowledge in writing is more important than ever. I strongly believe that knowledge is key to improving individuals and mankind. Everyone should acquire the knowledge they need to make informed decisions, for themselves or as decision makers for others, such as leaders and politicians. With social media it is even possible to discuss things in a completely democratic way before they are realized.

Saturday, June 13, 2009

Twitter - an untested spacecraft

Oh no, not another post analyzing the growth of Twitter, I hear you say. But here I will only use it to exemplify how software architects can and should prepare for the rapid growth of services that are, or will become, an integral part of the social web.

This post is the first in a series I am currently thinking of, on software architecture for the social web. Throughout the post I will use the service categorization defined at the Cantara Wiki.

It would be irresponsible to implement a software system with all the investments needed for future large-scale traffic made up front. It is equally irresponsible not to architect it so that it can grow to handle large loads.

A software system exposed to the social web can have millions of potential users. That applies to any application that anyone can subscribe to. With the possibilities of OpenID and Facebook identity provisioning, a subscription is not many clicks away. Who, when and how many will try the service depends on its attractiveness and word of mouth on the web.

Twitter is exceptional in several ways: it has had enormous growth, and a lot of growing pains. It is unusual for an H2A service that has been that unreliable to sustain such remarkable growth. Launching Twitter must have been like launching a spacecraft that had not been tested for the stress it would be exposed to. Astronauts and ground crew have only one chance of success: there will be no repairs or improvements after launch. Luckily, software can be repaired after launch, but without planning for it, you cannot assume repairs will be easy or have little impact on the users.

I don't know exactly how Twitter's architecture has evolved since its initial release, but seen from the outside there are at least three key takeaways for architects:
  • It has always been easy to integrate with Twitter through its API (A2A services), enabling value-added H2A services and clients.
  • It has improved to handle the current load without breaking the API and the clients.
  • Apart from one specific feature that was removed (causing an uproar), no functionality has been broken as far as I know.
These points are important to all software architects, but on the social web, failing to fulfill them will certainly be a disaster. I guess not all services will enjoy users as faithful as Twitter's, who have kept coming back despite instability at times. Twitter is probably highly valued among a lot of users, is free to use, and cannot be held economically or legally responsible for any loss of data or unavailability.

When architecting a service that has to fulfill an SLA, or is paid for, some level of stability will be assumed. Failing to deliver it may have economic, legal and market implications. This applies to all services, and especially when offering commercial services exposed to the social web, some key architectural aspects must be thoroughly planned. To name a few important ones:
  • All services (all the types described on the Cantara Wiki, but especially H2A and A2A) have to be orthogonal. This ensures the least painful replacement when they become bottlenecks or otherwise insufficient.
  • Clean interfaces and clearly defined responsibilities at all levels of the architecture.
  • Conscious use of tradeoffs to deliver early, while keeping the architecture agile enough to evolve.
  • Built-in measurement of performance and resource usage, to identify bottlenecks early.
These things will be essential for all projects aiming at the social web as their user base. Adoption rates are hard to predict, and it will seldom be economically feasible to make a system extremely scalable before launch. Most projects must release early to start the revenue stream and get feedback from users on what to implement next.
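Built-in measurement of the kind mentioned above does not have to be heavyweight. Here is a minimal sketch of a timing wrapper in Java; all names are hypothetical, and in a real system the println would be a logger or a metrics backend:

```java
import java.util.concurrent.Callable;

public class TimedCall {

    // Runs the call and prints its duration in milliseconds,
    // whether the call succeeds or throws.
    static <T> T timed(String name, Callable<T> call) throws Exception {
        long start = System.currentTimeMillis();
        try {
            return call.call();
        } finally {
            System.out.println(name + " duration=" + (System.currentTimeMillis() - start));
        }
    }

    public static void main(String[] args) throws Exception {
        int result = timed("demoService", new Callable<Integer>() {
            public Integer call() throws Exception {
                Thread.sleep(50); // stand-in for a remote call
                return 42;
            }
        });
        System.out.println("result=" + result);
    }
}
```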

Saturday, May 16, 2009

What can Open do for you?

Yesterday Totto pointed me to this Gartner blog post about how CIOs should answer when asked what IT contributes to the enterprise. Mark McDonald suggests that the answer should be how IT contributes to the core business model. This way, some light can be shed on how both the business model and IT can be improved to position the company better against the competition.

Another blog post that caught my interest yesterday was Matt Asay's
Cloud computing: A natural conclusion of open source? Here Asay writes about Tim O'Reilly's prediction that Cloud Computing will make Open Source licenses irrelevant. That is a bit far from what my post is about, but it stresses that new software will be both a consumer and a provider of services. For new software based on the SOA/Cloud Computing paradigms, open data formats and open standards will be quite important. Open Source is a natural way of implementing both data formats and standards, and hopefully this is going to happen. Even inside enterprises these thoughts should be considered, so it becomes possible to combine the results of different development efforts to create even more valuable software supporting the core business model.

Two days ago I fell in love with this blog post from Rickard Öberg about quickness. Although that post is about quickness at the individual developer and team level, it applies to organizations as well. To quickly reposition against the competition and market demand, sharp services exposing open data formats and being based on open standards will be crucial.

The CIO should understand what Open can do for the core business model, as this will be utterly important as more software becomes SOA based and runs in the cloud. The ability to move quickly has been important at all times, but in some ways the paradigm shifts are accelerating, and it will be more visible when you do not possess this ability.

Wednesday, March 25, 2009

Yesterday, Now and Tomorrow

Designing software is a continuous task, requiring enduring effort and attention.
I strongly believe that the code is the design. All efforts leading up to the delivered code are just preparation for creating the design that gets released.

During the lifetime of a software product, better and less good decisions are made along the way. Some decisions are made consciously and others more accidentally. After reading and discussing Technical Debt with my colleagues lately, I have come to think that other economic terms, like investing, also fit design decisions. An investment can pay off, break even, or result in a loss. What you do is place a bet. The greater the investment, the more certain you should be that it is the right thing to do.

Ward Cunningham coined the Technical Debt phrase. Martin Fowler and Steve McConnell have elaborated on it. This post tries to tie the investing term into software design. I really hope that by using these terms it is possible to create a software design meta language that is better understood by non-technical decision makers.

Yesterday's decisions
Bad design from the past creates technical debt. This debt should be paid down (not just paying the interest). Resolving it into better solutions has to be a prioritized objective in every project.

Now
Sometimes there is a need to do something to the design just to get the next release out the door. These decisions typically create debt. As a rule of thumb, make these investments as small as possible.
Update: Got a useful comment from @javatotto, saying that the effect of these kinds of design decisions should be isolated, so that as few dependencies as possible are created to them from elsewhere. Great comment that could be the topic for the next post ;-)

Tomorrow
Investing for tomorrow is speculating. This is up-front design, and often considered bad. This is especially true for projects that need to be able to change direction fast. It also requires an effort that does not help you get the next release out the door. You might even invest in something you will not need, which will represent a complete loss.

Small investments seldom cause big trouble and cannot represent big losses.

Big investments for tomorrow need to be watched closely, as they can potentially ruin the project. Maybe the worst thing about big investments is that they are extremely hard to dispose of when they prove to be a loss. There is a cognitive attachment (ownership) that hinders replacing them with better solutions. The processes around big investments should be open, so that all people having an interest in them can influence how they develop. As I am an eager proponent of Open Innovation, I think especially these kinds of investments can profit from being developed as open source, or at least as openly as possible.

Not paying down the existing debt will ultimately be the terminator for a project. Final. I have seen some correlation between big investments and not paying down the technical debt in most projects I have participated in.
Big investments for the future have the potential to steal energy and focus (the project's short-term capital) from things that should be fixed. Often the most talented developers are involved in both activities, and they seldom manage to do both simultaneously. The worst case is making the wrong investment, as then the debt grows dangerously fast.

Saturday, February 21, 2009

My Web 2.0 stack

I have earlier blogged about how Firefox is becoming my OS. Since then I've employed several Web 2.0 services, and Firefox is again a key tool for getting everything running smoothly. While some of the services will work in the exact same way with other browsers, it is the complete experience I am talking about here.

First there are plugins for sharing links. Since my last post on this subject I've added Feedly.com, Friendfeed.com, Flickr.com, Youtube.com, Digg.com and Facebook.com (did not use it for sharing that way before) to the list. To make sharing as easy as possible, the Shareoholic plugin comes in handy. This plugin can share links on almost every service that exists.

For now Twitterfox had to go, as I've started to use Tweetdeck, and if they run in parallel my Twitter API limit is reached pretty fast.

It has been a while since I started using Feedly, but it is only recently I have discovered what a great service it really is. Together with Google Reader, Feedburner and Friendfeed, it just makes everything very easy and smooth. Feedly instantly shows me how many have dugg a page and how many have shared it on Friendfeed. I can post directly to Twitter, and probably do a lot of other things I have not discovered yet. And did I mention it is a great and innovative news service? You really have some aha moments to discover if you haven't tried it yet, but be sure to check out Google Reader first, since Feedly is tightly integrated with that service.

Because everything works across all OSes, and it is really fast to set up on a new computer (I just bought a new home PC, so it's verified), I can move across machines and stay informed all the time. Adobe Air + Tweetdeck was the most time-consuming thing to install, because it is a fat client application. I find it pretty useful, so I bother installing it on multiple machines.

Wednesday, February 11, 2009

2 Icons, 2 movements

Last night a thought struck me: there is a striking similarity in how the open source and snowboarding movements have evolved. Both have been led by strong iconic persons, namely Terje Håkonsen and Richard Stallman. They are both controversial untouchables (as in the movie with Kevin Costner); they do and say exactly what they believe in. Their influence has been very important in establishing these movements that are not controlled by any commercial interests, although neither of them would have made it this far with commercial support.

Håkonsen is renowned for boycotting the IOC and FIS, and Olympic qualification for halfpipe when it was introduced at the Olympics. Stallman is so renowned for his controversy that a lot of people think he is just a childish troublemaker.

The reason these movements have become so powerful, even in a commercial sense, is that they consist of large crowds of people. They have, to varying degrees, developed ethics and morale, and those not conforming are effectively excluded or frozen out. The crowds are not driven by commercial interests, but it would be false to state that there are no commercial interests involved. Snowboarding is not for the economically faint-hearted, and software development is seldom gratis.

An example showing the power of the Open Source movement: try to imagine the WWW without the Apache HTTP Server. It is undeniably the very reason the acronym HTTP is known by probably half or more of the world's population. It has lost "market share", but it is the Apache HTTP Server that has made the widespread deployment of the WWW possible. It has always been on the frontiers of the WWW, and I guess it still is. In addition, most application servers have an Apache HTTP Server in front of them to handle caching, serving static content and many other tasks. I think the success of Linux in the server room is largely because of the Apache HTTP Server. The Apache Foundation has long been sponsored by IBM.

The Open Source movement has laid a foundation for business, information spreading and social interaction over the internet that would not have taken place, or would have looked completely different, without it. In fact, I think Open Source has been genuinely good for the internet and computing in general, lowering the bar for adoption. It gives everyone the possibility of freedom of speech if they want it.

Open Source is again showing its strength in times where the financial crisis otherwise might have strangled innovation. Instead, innovation on the internet is flourishing, largely powered by Open Source. I think the movement has now become strong enough to transform the software business. I do not dare predict how it will transform it, but I think it will be good.

During the US election campaign in 2008, I think I overheard a statement made by John McCain when he started to realize he was not winning the election: The man (Barack Obama) is a movement. You can't stop a movement.

BTW: This weekend I will practice some freedom of speech here: http://twitter.com/OfficalWRC

Sunday, February 8, 2009

Let Open Source set you free and be unique

A bit of a bold heading, you might say? Well, that may depend on your attitude toward, and knowledge about, Open Source. My opinion is that when you master a set of open source products, they will let you develop more freely. This is probably not that different from proprietary products, but with Open Source you can get deep knowledge very fast. Chances are that there are many developers "out there" having expert knowledge about most widely used Open Source products.

This post focuses on these advantages of Open Source:
  • Competence can be in-house and not just on a support line
  • It defines the de facto standards of how software is built today
  • No up-front costs or procurement process
Now I will try to explain what I mean by stating this.

Inhouse competence
By hiring or employing adequate Open Source competence, software companies can get a head start on a project. The good thing about this is that the competence is not on the other side of a phone line or just an email address. When challenges are met or there is a problem in production, the competence is right there with you.

Software standards
There is a lack of defined standards on how software should be designed. By design I mean what this article by Jack W. Reeves says. I do not doubt the value of visual software design, but it is just a vehicle for making the correct code design.
A lot of Open Source products have made such a strong foothold in software development that they more or less dominate some categories of applications, e.g. the Spring Framework. The Spring Framework is very flexible, but by following the recommended conventions you will probably run into less trouble (not that it imposes more trouble than other frameworks). That said, Spring is not very intrusive, so it should be pretty simple to adjust to conventions that are often also common sense.
When you adhere to the common sense conventions, it mostly makes the software produced easier to use with other components and frameworks. It is still necessary to have the competence when decisions are made along the way, just as you would with proprietary products. The difference, again, is that you do not have to get it from the vendor's support or experts.
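As a sketch of what such common sense conventions look like in practice, consider the non-intrusive style the Spring Framework encourages: plain classes that depend on interfaces and receive their collaborators through constructor injection. The names below are illustrative assumptions of mine, and no framework is needed to run it.

```java
// Plain-Java sketch of the convention: program to interfaces and inject
// dependencies through constructors. A container like Spring can wire
// such classes from configuration, but they carry no framework dependency.
interface GreetingService {
    String greet(String name);
}

class SimpleGreetingService implements GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Depends only on the interface; any container (or a plain 'new') can supply it.
class GreetingClient {
    private final GreetingService service;

    GreetingClient(GreetingService service) {
        this.service = service;
    }

    String welcome(String name) {
        return service.greet(name) + "!";
    }
}

public class PojoWiring {
    public static void main(String[] args) {
        GreetingClient client = new GreetingClient(new SimpleGreetingService());
        System.out.println(client.welcome("world")); // prints "Hello, world!"
    }
}
```

Because GreetingClient only knows the interface, swapping the implementation, in a test or in a new deployment, requires no change to the class itself.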

No procurement process
There are no costs, or at least they are very low, so getting permission to use Open Source is not an issue to discuss with the financial department. It is still necessary to evaluate different products for the project, including proprietary ones when they exist. Until now commercial products have often won these evaluations, but mainstream development is taking a new direction. I am not surprised that this is happening right now, as the financial crisis raises questions about costly software.

An important consequence pointed out in the Agile Executive post is that the software evolves before the eyes of the user, because with Open Source you can move as fast as the available competence is able to produce new features. This is very attractive, as the users keep coming back to explore new functionality.

Another important point here is that when new technology surfaces, it is easier to switch when there have not been investments in commercial products. Software with a good design is flexible and makes as few assumptions about its surroundings as possible.

Be free and unique
Now to the point of this post! Open Source lets you, with the right competence, focus on the things you are good at, and deliver faster than with proprietary products. The business value or end user experience is what counts. Users mostly do not care how software is designed, but they value good software.
When you can focus on this, chances are that the product will be better and the users happier. Unhappy users can express themselves freely and quickly on the internet, especially after the advent of Web 2.0. A product must often rely on word of mouth. With Twitter and the like being adopted at blazing speed, word of mouth spreads very fast.

Examples that this is true are flourishing.
BTW, this post is written in Firefox, utilizing the Delicious plugin all the time.

Tuesday, January 27, 2009

Web 2.0: Use, create and evaluate it

Yesterday Andrew McAfee wrote an excellent post on what characterizes Enterprise Web 2.0 software. Key points from the blog are that Enterprise Web 2.0 software must be:
  • Freeform
  • Frictionless
  • Emerging
Use it
All software team members should use Enterprise Web 2.0 tools to learn about the possibilities they open up and to improve the processes. Web 2.0 software is especially good at capturing the messy parts that legacy processes and other tools have failed to encompass. Most Agile projects have already adopted wikis to support the process and scale beyond the ideal number of project members. Today many projects are geographically distributed or members telecommute. Here the social media part of Web 2.0 will be valuable.

Create it
During the whole process of developing software the characteristics of Web 2.0 software described by McAfee must be consulted.
These abilities must be ubiquitous in the software design and architecture both when developing brand new Web 2.0 software but also when extending legacy software with modern features.

These characteristics will probably evolve, so stay tuned to the Web 2.0 sphere.
Most developers will not create all features of a software product in the projects they participate in. This will certainly be true for Web 2.0 software, since a lot of Web 2.0 services are already present on the internet. They will rather integrate with these services to add value, or vice versa. Software products may even thrive on existing social software with large user bases. By authenticating with e.g. OpenId or Facebook Connect you instantly get access to a large mass of users. Facebook users will probably soon be so spoiled that they may not even try new services that do not support Facebook Connect. It is also useful to look for open data formats, such as those described at Microformats.

Evaluate it

Developers and others must evaluate the existing Web 2.0 products and services they have to integrate with. Short time to release and economic constraints force reuse of existing software and services. Especially reusing social media with large user bases can be fruitful and a critical success factor.
The characteristics McAfee describes should be used when evaluating Web 2.0 software.

Thursday, January 15, 2009

The Economy and Innovation

Nowadays a lot of things seem to be happening that will influence enterprises and individuals. The economy takes unpredictable turns, and innovation may suffer from capital starvation because of more restrictive and careful financial investment policies.

Innovation inevitably comes with risk, and a lot of investors will be much more careful with respect to if and where they place their money. Careful investors may investigate potential investments much more thoroughly beforehand than they used to.

My guess is that projects that have a solid innovation model, based on Enterprise Web 2.0, will have an advantage. This innovation model should include collaboration from all stakeholders, including the users. What this means to each and every project will vary, and Enterprise Web 2.0 must be adapted to the target organization(s) and individuals, and vice versa.

Open Innovation can be seen as Enterprise Web 2.0 in an innovation context. Open Innovation recognizes that it is not affordable or rational for enterprises to invent products solely from their own research. Knowledge is distributed and must be gathered through collaboration.

I think Open Innovation can be taken one step further by fully utilizing Web 2.0 software, with Social Media and Wikinomics, to enable Mass Collaboration at a large scale. A popular term for this is Crowdsourcing.

Enterprise Web 2.0 connects knowledgeable people in new ways through social media, and collaboration through wikis is far more effective than email. A wiki is there for every interested person to read and contribute to, and is not limited to a mailing list. Further, everyone is informed at the same time, at least all those subscribing to feeds from the wiki.

The community that arises around an idea or project will be part of the backing capital, nurturing it with capital, energy and direction. When there are several strong (economic or knowledgeable) stakeholders involved, risk is spread.

So how can a software development enterprise start thriving on Open Innovation? I guess there is no easy answer to that. In general I think it is good to start small, opening up extension points that business partners may be interested in collaborating on to provide value-added services. Discussions and documentation must be located on a wiki, and people should be able to get acquainted with each other through some social media.

Opening up extension points in a software product clearly puts some requirements on its architecture. To begin with, explore microformats and authentication solutions like OpenId, then consider if other formats must be invented.

I have established a Google Site, a subsite of my employer Webstep's site, for this subject. Please read more at http://sites.google.com/a/webstep.no/openinnovation/