Friday, November 27, 2009

Excel and grep are the logfile analyzer's best friends

Lately I have been analyzing some logfiles for invocation times of remote services. Luckily the most problematic part of the system logs time consumption together with a remote system identifier and some other stuff.

Some years ago I was working on SmartLearn, implementing analytics for learning accountables. Through that work I got to know Excel's pivoting capabilities, but back then I used Microsoft Analysis Services for creating the Pivot tables. I have also been a user of the sharp command-line tools of Linux, Unix and Cygwin for a long time. Separately I knew the strengths of the tools from both of these worlds, but I did not recognize how I could use them together.

One of my project peers showed me how Excel could be used to extract data from flat files and present it as Pivot tables with very few steps. The key premise is that the interesting dimensions of the Pivot table are logged on the same lines as the interesting numbers/text. The technique described here will let you visualize counts of things. Even without any numbers the frequency of things can be very interesting. Most system log files contain a timestamp, and this can be combined with almost anything *. Using *nix command-line tools it is of course possible to extract whatever information you like from flat files.


In my example the interesting lines look like this:
17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401

To extract only these lines I use Cygwin's grep command like this:
grep "Task.*duration" <logfilename> > extractedlog.txt
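
If grep is not at hand, the same extraction can be approximated in Java. This is only a minimal sketch: the class name DurationLineFilter and the use of two command-line arguments for the input and output files are assumptions made for illustration, not something from the original setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper that mimics: grep "Task.*duration" <logfilename>
final class DurationLineFilter {
    // Same expression as in the grep call: "Task", anything, then "duration"
    private static final Pattern INTERESTING = Pattern.compile("Task.*duration");

    // Keeps every line in which the pattern occurs somewhere, like grep does
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(line -> INTERESTING.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0] = log file to read, args[1] = file to write (assumed usage)
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Files.write(out, filter(Files.readAllLines(in)));
    }
}
```

Only the "end call" lines contain both words, so the "start call" and "maxtimeout" lines are dropped, just as with the grep command.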


From Excel, open the extracted log file. Excel will recognize the file as a text file that you might want to split into columns. Choose between splitting at fixed positions or at delimiter characters. Spaces can be a viable delimiter in some cases.













When you have imported the file it may be necessary to split columns manually using the Text to Columns tool on the Data tab.

Now you must insert a row at the top of the dataset, and add header names of the interesting columns. Cut & paste the interesting columns so they are adjacent to each other.
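
The splitting and the header row can also be prepared before the file ever reaches Excel. The following is a rough sketch under some assumptions: the class name LogToColumns and the header names Time, Task and Duration are made up for illustration, and the regular expression only fits log lines shaped like the sample line above. It writes tab-separated rows, since the timestamp itself contains a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns "end call" log lines into tab-separated rows
final class LogToColumns {
    // Groups: 1 = timestamp, 2 = task name, 3 = duration in milliseconds
    private static final Pattern END_CALL = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2},\\d{3}) DEBUG \\S+ - end call, (\\S+) duration=(\\d+)");

    static List<String> toRows(List<String> lines) {
        List<String> rows = new ArrayList<>();
        // Header row, so Excel has names for the Pivot table fields
        rows.add("Time\tTask\tDuration");
        for (String line : lines) {
            Matcher m = END_CALL.matcher(line.trim());
            // Lines that do not have the expected shape are skipped
            if (m.matches()) {
                rows.add(m.group(1) + "\t" + m.group(2) + "\t" + m.group(3));
            }
        }
        return rows;
    }
}
```

With a file produced this way the interesting columns are already adjacent and named, so the manual cut and paste step should not be needed.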

Now select the Insert tab in Excel, and choose PivotTable (the leftmost button under Insert in my installation of Excel 2007).

Choose either Table or Chart, and select the interesting fact and dimension columns from the spreadsheet. When you click OK, you can start dragging and dropping columns into the Axes, Values and Legend containers on the right side.

You can now twist the log data as you want, and find relations in it that you would spend an enormous amount of time finding manually in the log file. Some simple examples:
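
For readers without Excel at hand, the kind of summary a Pivot table gives (count, average and maximum duration per task) can be approximated in code. This is only a sketch of the idea, not a replacement for the interactive pivoting; the class name DurationPivot is an assumption, and the sample lines in main are copied from the log output further down.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: task name is the dimension, duration is the fact
final class DurationPivot {
    // Groups: 1 = task name, 2 = duration in milliseconds
    private static final Pattern END_CALL =
            Pattern.compile("end call, (\\S+) duration=(\\d+)");

    static Map<String, LongSummaryStatistics> byTask(List<String> lines) {
        // TreeMap keeps the tasks sorted by name
        Map<String, LongSummaryStatistics> stats = new TreeMap<>();
        for (String line : lines) {
            Matcher m = END_CALL.matcher(line);
            if (m.find()) {
                stats.computeIfAbsent(m.group(1), k -> new LongSummaryStatistics())
                        .accept(Long.parseLong(m.group(2)));
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "17:47:56,876 DEBUG Task1:29 - end call, Task2 duration=65",
                "17:47:57,030 DEBUG Task1:29 - end call, Task2 duration=151",
                "17:47:57,174 DEBUG Task1:29 - end call, Task0 duration=369",
                "17:47:57,296 DEBUG Task1:29 - end call, Task1 duration=490");
        // One output row per task: count, average and maximum duration
        byTask(sample).forEach((task, s) -> System.out.printf(
                "%s count=%d avg=%.1f max=%d%n",
                task, s.getCount(), s.getAverage(), s.getMax()));
    }
}
```

Grouping on part of the timestamp instead of the task name would give the frequency over time, which is the other dimension mentioned earlier.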


To create the dataset I used this Java code, which produces differentiated execution trends in a number of threads (it is not meant as an educational example of how to write multithreaded code):

package com.webstep.logfilegenerator;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LogGenerator {

    /**
     * Submits ten endlessly looping tasks to a pool of three threads,
     * lets them log for up to ten seconds and then shuts the pool down.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        ExecutorService exSvc = Executors.newFixedThreadPool(3);

        List<Future<?>> futures = new ArrayList<Future<?>>();
        final Random r = new Random();
        for (int i = 0; i < 10; i++) {
            // Note: i + 1 * 500 evaluates as i + 500, since * binds tighter than +
            futures.add(exSvc.submit(new Task1("Task" + i, r.nextInt(i + 1 * 500))));
        }
        try {
            for (Future<?> f : futures) {
                // The tasks never return, so this wait ends in a TimeoutException
                System.out.println(f.get(10000, TimeUnit.MILLISECONDS));
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } catch (TimeoutException e) {
            e.printStackTrace();
        } finally {
            // Interrupts the running tasks and discards the queued ones
            exSvc.shutdownNow();
        }
    }
}



package com.webstep.logfilegenerator;

import java.util.Random;
import java.util.concurrent.Callable;

import org.apache.log4j.Logger;

public class Task1 implements Callable<Object> {
    private static final Logger log = Logger.getLogger(Task1.class.getName());
    private static final Random r = new Random();
    private final int maxTimeout;
    private final String name;

    public Task1(final String name, final int maxTimeout) {
        super();
        this.maxTimeout = maxTimeout;
        this.name = name;
        log.debug(name + " maxtimeout=" + maxTimeout);
    }

    // Loops until the thread is interrupted: sleeps for a random time
    // below maxTimeout and logs how long each round took.
    public Object call() throws Exception {
        while (true) {
            log.debug("start call " + name);

            long start = System.currentTimeMillis();
            Thread.sleep(r.nextInt(maxTimeout));
            log.debug("end call, " + name + " duration=" + (System.currentTimeMillis() - start));
        }
    }
}


Some sample output from this code:
17:47:56,798 DEBUG Task1:19 - Task0 maxtimeout=98
17:47:56,803 DEBUG Task1:19 - Task1 maxtimeout=12
17:47:56,803 DEBUG Task1:25 - start call Task0
17:47:56,804 DEBUG Task1:19 - Task2 maxtimeout=17
17:47:56,804 DEBUG Task1:25 - start call Task1
17:47:56,806 DEBUG Task1:19 - Task3 maxtimeout=51
17:47:56,807 DEBUG Task1:19 - Task4 maxtimeout=92
17:47:56,807 DEBUG Task1:19 - Task5 maxtimeout=47
17:47:56,808 DEBUG Task1:19 - Task6 maxtimeout=8
17:47:56,809 DEBUG Task1:19 - Task7 maxtimeout=87
17:47:56,809 DEBUG Task1:19 - Task8 maxtimeout=66
17:47:56,810 DEBUG Task1:19 - Task9 maxtimeout=39
17:47:56,808 DEBUG Task1:25 - start call Task2
17:47:56,876 DEBUG Task1:29 - end call, Task2 duration=65
17:47:56,878 DEBUG Task1:25 - start call Task2
17:47:57,030 DEBUG Task1:29 - end call, Task2 duration=151
17:47:57,031 DEBUG Task1:25 - start call Task2
17:47:57,174 DEBUG Task1:29 - end call, Task0 duration=369
17:47:57,175 DEBUG Task1:25 - start call Task0
17:47:57,296 DEBUG Task1:29 - end call, Task1 duration=490
17:47:57,297 DEBUG Task1:25 - start call Task1
17:47:57,323 DEBUG Task1:29 - end call, Task0 duration=146
17:47:57,324 DEBUG Task1:25 - start call Task0
17:47:57,390 DEBUG Task1:29 - end call, Task1 duration=91
17:47:57,391 DEBUG Task1:25 - start call Task1
17:47:57,448 DEBUG Task1:29 - end call, Task2 duration=415
17:47:57,449 DEBUG Task1:25 - start call Task2
17:47:57,549 DEBUG Task1:29 - end call, Task1 duration=157
17:47:57,550 DEBUG Task1:25 - start call Task1
17:47:57,729 DEBUG Task1:29 - end call, Task2 duration=278
17:47:57,730 DEBUG Task1:25 - start call Task2
17:47:57,802 DEBUG Task1:29 - end call, Task0 duration=477
17:47:57,803 DEBUG Task1:25 - start call Task0
17:47:57,886 DEBUG Task1:29 - end call, Task1 duration=334
17:47:57,887 DEBUG Task1:25 - start call Task1
17:47:58,048 DEBUG Task1:29 - end call, Task2 duration=316
17:47:58,048 DEBUG Task1:25 - start call Task2
17:47:58,068 DEBUG Task1:29 - end call, Task0 duration=264
17:47:58,069 DEBUG Task1:25 - start call Task0
17:47:58,087 DEBUG Task1:29 - end call, Task1 duration=199
17:47:58,092 DEBUG Task1:25 - start call Task1
17:47:58,098 DEBUG Task1:29 - end call, Task1 duration=5
17:47:58,098 DEBUG Task1:25 - start call Task1
17:47:58,099 DEBUG Task1:29 - end call, Task2 duration=48
17:47:58,099 DEBUG Task1:25 - start call Task2
17:47:58,115 DEBUG Task1:29 - end call, Task1 duration=17
17:47:58,115 DEBUG Task1:25 - start call Task1
17:47:58,245 DEBUG Task1:29 - end call, Task2 duration=146
17:47:58,245 DEBUG Task1:25 - start call Task2
17:47:58,339 DEBUG Task1:29 - end call, Task0 duration=267
17:47:58,340 DEBUG Task1:25 - start call Task0
17:47:58,398 DEBUG Task1:29 - end call, Task0 duration=57
17:47:58,399 DEBUG Task1:25 - start call Task0
17:47:58,412 DEBUG Task1:29 - end call, Task0 duration=12
17:47:58,412 DEBUG Task1:25 - start call Task0
17:47:58,542 DEBUG Task1:29 - end call, Task1 duration=427
17:47:58,543 DEBUG Task1:25 - start call Task1
17:47:58,573 DEBUG Task1:29 - end call, Task2 duration=326
17:47:58,574 DEBUG Task1:25 - start call Task2
17:47:58,741 DEBUG Task1:29 - end call, Task1 duration=197
17:47:58,742 DEBUG Task1:25 - start call Task1
17:47:58,783 DEBUG Task1:29 - end call, Task0 duration=369
17:47:58,784 DEBUG Task1:25 - start call Task0
17:47:58,835 DEBUG Task1:29 - end call, Task2 duration=260
17:47:58,835 DEBUG Task1:25 - start call Task2
17:47:58,873 DEBUG Task1:29 - end call, Task0 duration=89
17:47:58,873 DEBUG Task1:25 - start call Task0
17:47:58,923 DEBUG Task1:29 - end call, Task2 duration=87
17:47:58,923 DEBUG Task1:25 - start call Task2
17:47:58,942 DEBUG Task1:29 - end call, Task0 duration=67
17:47:58,942 DEBUG Task1:25 - start call Task0
17:47:59,084 DEBUG Task1:29 - end call, Task1 duration=342
17:47:59,084 DEBUG Task1:25 - start call Task1
17:47:59,091 DEBUG Task1:29 - end call, Task2 duration=166
17:47:59,091 DEBUG Task1:25 - start call Task2
17:47:59,115 DEBUG Task1:29 - end call, Task0 duration=172
17:47:59,115 DEBUG Task1:25 - start call Task0
17:47:59,348 DEBUG Task1:29 - end call, Task2 duration=257
17:47:59,348 DEBUG Task1:25 - start call Task2
17:47:59,473 DEBUG Task1:29 - end call, Task2 duration=124
17:47:59,474 DEBUG Task1:25 - start call Task2
17:47:59,505 DEBUG Task1:29 - end call, Task0 duration=389
17:47:59,505 DEBUG Task1:25 - start call Task0
17:47:59,506 DEBUG Task1:29 - end call, Task2 duration=31
17:47:59,507 DEBUG Task1:25 - start call Task2
17:47:59,534 DEBUG Task1:29 - end call, Task1 duration=450
17:47:59,535 DEBUG Task1:25 - start call Task1
17:47:59,579 DEBUG Task1:29 - end call, Task1 duration=44
17:47:59,580 DEBUG Task1:25 - start call Task1
17:47:59,629 DEBUG Task1:29 - end call, Task2 duration=121
17:47:59,630 DEBUG Task1:25 - start call Task2
17:47:59,667 DEBUG Task1:29 - end call, Task0 duration=160
17:47:59,668 DEBUG Task1:25 - start call Task0
17:47:59,747 DEBUG Task1:29 - end call, Task1 duration=167
17:47:59,748 DEBUG Task1:25 - start call Task1
17:47:59,852 DEBUG Task1:29 - end call, Task2 duration=221
17:47:59,852 DEBUG Task1:25 - start call Task2
17:48:00,153 DEBUG Task1:29 - end call, Task0 duration=485
17:48:00,154 DEBUG Task1:25 - start call Task0
17:48:00,156 DEBUG Task1:29 - end call, Task1 duration=408
17:48:00,156 DEBUG Task1:25 - start call Task1
17:48:00,185 DEBUG Task1:29 - end call, Task2 duration=332
17:48:00,185 DEBUG Task1:25 - start call Task2
17:48:00,316 DEBUG Task1:29 - end call, Task2 duration=130
17:48:00,317 DEBUG Task1:25 - start call Task2
17:48:00,435 DEBUG Task1:29 - end call, Task2 duration=117
17:48:00,436 DEBUG Task1:25 - start call Task2
17:48:00,542 DEBUG Task1:29 - end call, Task1 duration=385
17:48:00,543 DEBUG Task1:25 - start call Task1
17:48:00,629 DEBUG Task1:29 - end call, Task0 duration=474
17:48:00,631 DEBUG Task1:25 - start call Task0
17:48:00,720 DEBUG Task1:29 - end call, Task2 duration=283
17:48:00,721 DEBUG Task1:25 - start call Task2
17:48:00,809 DEBUG Task1:29 - end call, Task1 duration=266
17:48:00,810 DEBUG Task1:25 - start call Task1
17:48:00,886 DEBUG Task1:29 - end call, Task1 duration=74
17:48:00,887 DEBUG Task1:25 - start call Task1
17:48:00,908 DEBUG Task1:29 - end call, Task0 duration=276
17:48:00,909 DEBUG Task1:25 - start call Task0
17:48:01,023 DEBUG Task1:29 - end call, Task2 duration=302
17:48:01,023 DEBUG Task1:25 - start call Task2
17:48:01,140 DEBUG Task1:29 - end call, Task0 duration=230
17:48:01,140 DEBUG Task1:25 - start call Task0
17:48:01,349 DEBUG Task1:29 - end call, Task1 duration=462
17:48:01,350 DEBUG Task1:25 - start call Task1
17:48:01,378 DEBUG Task1:29 - end call, Task2 duration=355
17:48:01,378 DEBUG Task1:25 - start call Task2
17:48:01,518 DEBUG Task1:29 - end call, Task2 duration=139
17:48:01,518 DEBUG Task1:25 - start call Task2
17:48:01,599 DEBUG Task1:29 - end call, Task1 duration=249
17:48:01,599 DEBUG Task1:25 - start call Task1
17:48:01,603 DEBUG Task1:29 - end call, Task0 duration=461
17:48:01,603 DEBUG Task1:25 - start call Task0
17:48:01,939 DEBUG Task1:29 - end call, Task0 duration=335
17:48:01,940 DEBUG Task1:25 - start call Task0
17:48:01,997 DEBUG Task1:29 - end call, Task2 duration=476
17:48:01,998 DEBUG Task1:25 - start call Task2
17:48:02,014 DEBUG Task1:29 - end call, Task1 duration=414
17:48:02,015 DEBUG Task1:25 - start call Task1
17:48:02,146 DEBUG Task1:29 - end call, Task1 duration=131
17:48:02,147 DEBUG Task1:25 - start call Task1
17:48:02,261 DEBUG Task1:29 - end call, Task0 duration=320
17:48:02,262 DEBUG Task1:25 - start call Task0
17:48:02,343 DEBUG Task1:29 - end call, Task2 duration=342
17:48:02,343 DEBUG Task1:25 - start call Task2
17:48:02,378 DEBUG Task1:29 - end call, Task1 duration=230
17:48:02,378 DEBUG Task1:25 - start call Task1
17:48:02,520 DEBUG Task1:29 - end call, Task0 duration=257
17:48:02,520 DEBUG Task1:25 - start call Task0
17:48:02,597 DEBUG Task1:29 - end call, Task2 duration=253
17:48:02,598 DEBUG Task1:25 - start call Task2
17:48:02,734 DEBUG Task1:29 - end call, Task1 duration=355
17:48:02,735 DEBUG Task1:25 - start call Task1
17:48:02,786 DEBUG Task1:29 - end call, Task0 duration=265
17:48:02,787 DEBUG Task1:25 - start call Task0
17:48:02,959 DEBUG Task1:29 - end call, Task2 duration=361
17:48:02,959 DEBUG Task1:25 - start call Task2
17:48:03,017 DEBUG Task1:29 - end call, Task0 duration=228
17:48:03,018 DEBUG Task1:25 - start call Task0
17:48:03,027 DEBUG Task1:29 - end call, Task2 duration=66
17:48:03,030 DEBUG Task1:25 - start call Task2
17:48:03,214 DEBUG Task1:29 - end call, Task1 duration=478
17:48:03,215 DEBUG Task1:25 - start call Task1
17:48:03,299 DEBUG Task1:29 - end call, Task2 duration=268
17:48:03,300 DEBUG Task1:25 - start call Task2
17:48:03,305 DEBUG Task1:29 - end call, Task1 duration=90
17:48:03,306 DEBUG Task1:25 - start call Task1
17:48:03,324 DEBUG Task1:29 - end call, Task1 duration=17
17:48:03,325 DEBUG Task1:25 - start call Task1
17:48:03,345 DEBUG Task1:29 - end call, Task2 duration=44
17:48:03,345 DEBUG Task1:25 - start call Task2
17:48:03,456 DEBUG Task1:29 - end call, Task1 duration=130
17:48:03,456 DEBUG Task1:25 - start call Task1
17:48:03,484 DEBUG Task1:29 - end call, Task2 duration=138
17:48:03,484 DEBUG Task1:25 - start call Task2
17:48:03,491 DEBUG Task1:29 - end call, Task2 duration=6
17:48:03,492 DEBUG Task1:25 - start call Task2
17:48:03,496 DEBUG Task1:29 - end call, Task0 duration=477
17:48:03,496 DEBUG Task1:25 - start call Task0
17:48:03,696 DEBUG Task1:29 - end call, Task0 duration=199
17:48:03,696 DEBUG Task1:25 - start call Task0
17:48:03,796 DEBUG Task1:29 - end call, Task1 duration=340
17:48:03,809 DEBUG Task1:25 - start call Task1
17:48:03,841 DEBUG Task1:29 - end call, Task2 duration=348
17:48:03,841 DEBUG Task1:25 - start call Task2
17:48:03,845 DEBUG Task1:29 - end call, Task1 duration=35
17:48:03,847 DEBUG Task1:25 - start call Task1
17:48:03,876 DEBUG Task1:29 - end call, Task0 duration=179
17:48:03,876 DEBUG Task1:25 - start call Task0
17:48:03,952 DEBUG Task1:29 - end call, Task2 duration=111
17:48:03,952 DEBUG Task1:25 - start call Task2
17:48:03,999 DEBUG Task1:29 - end call, Task0 duration=123
17:48:04,000 DEBUG Task1:25 - start call Task0
17:48:04,220 DEBUG Task1:29 - end call, Task1 duration=372
17:48:04,221 DEBUG Task1:25 - start call Task1
17:48:04,348 DEBUG Task1:29 - end call, Task2 duration=395
17:48:04,349 DEBUG Task1:25 - start call Task2
17:48:04,385 DEBUG Task1:29 - end call, Task0 duration=384
17:48:04,392 DEBUG Task1:25 - start call Task0
17:48:04,633 DEBUG Task1:29 - end call, Task2 duration=284
17:48:04,634 DEBUG Task1:25 - start call Task2
17:48:04,678 DEBUG Task1:29 - end call, Task1 duration=457
17:48:04,679 DEBUG Task1:25 - start call Task1
17:48:04,704 DEBUG Task1:29 - end call, Task1 duration=25
17:48:04,704 DEBUG Task1:25 - start call Task1
17:48:04,766 DEBUG Task1:29 - end call, Task0 duration=373
17:48:04,766 DEBUG Task1:25 - start call Task0
17:48:04,912 DEBUG Task1:29 - end call, Task2 duration=277
17:48:04,912 DEBUG Task1:25 - start call Task2
17:48:04,928 DEBUG Task1:29 - end call, Task2 duration=15
17:48:04,928 DEBUG Task1:25 - start call Task2
17:48:04,943 DEBUG Task1:29 - end call, Task1 duration=238
17:48:04,944 DEBUG Task1:25 - start call Task1
17:48:05,072 DEBUG Task1:29 - end call, Task2 duration=143
17:48:05,072 DEBUG Task1:25 - start call Task2
17:48:05,132 DEBUG Task1:29 - end call, Task2 duration=59
17:48:05,132 DEBUG Task1:25 - start call Task2
17:48:05,135 DEBUG Task1:29 - end call, Task1 duration=191
17:48:05,135 DEBUG Task1:25 - start call Task1
17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401
17:48:05,168 DEBUG Task1:25 - start call Task0
17:48:05,239 DEBUG Task1:29 - end call, Task1 duration=103
17:48:05,240 DEBUG Task1:25 - start call Task1
17:48:05,357 DEBUG Task1:29 - end call, Task2 duration=224
17:48:05,357 DEBUG Task1:25 - start call Task2
17:48:05,526 DEBUG Task1:29 - end call, Task1 duration=286
17:48:05,526 DEBUG Task1:25 - start call Task1
17:48:05,555 DEBUG Task1:29 - end call, Task2 duration=197
17:48:05,555 DEBUG Task1:25 - start call Task2
17:48:05,600 DEBUG Task1:29 - end call, Task2 duration=44
17:48:05,600 DEBUG Task1:25 - start call Task2
17:48:05,616 DEBUG Task1:29 - end call, Task0 duration=447
17:48:05,616 DEBUG Task1:25 - start call Task0
17:48:05,650 DEBUG Task1:29 - end call, Task2 duration=49
17:48:05,652 DEBUG Task1:25 - start call Task2
17:48:05,772 DEBUG Task1:29 - end call, Task2 duration=120
17:48:05,773 DEBUG Task1:25 - start call Task2
17:48:05,775 DEBUG Task1:29 - end call, Task1 duration=248
17:48:05,775 DEBUG Task1:25 - start call Task1
17:48:05,782 DEBUG Task1:29 - end call, Task2 duration=9
17:48:05,782 DEBUG Task1:25 - start call Task2
17:48:05,865 DEBUG Task1:29 - end call, Task2 duration=83
17:48:05,866 DEBUG Task1:25 - start call Task2
17:48:05,870 DEBUG Task1:29 - end call, Task2 duration=4
17:48:05,870 DEBUG Task1:25 - start call Task2
17:48:05,878 DEBUG Task1:29 - end call, Task0 duration=261
17:48:05,879 DEBUG Task1:25 - start call Task0
17:48:06,105 DEBUG Task1:29 - end call, Task2 duration=234
17:48:06,105 DEBUG Task1:25 - start call Task2
17:48:06,239 DEBUG Task1:29 - end call, Task0 duration=360
17:48:06,239 DEBUG Task1:25 - start call Task0
17:48:06,258 DEBUG Task1:29 - end call, Task2 duration=152
17:48:06,258 DEBUG Task1:25 - start call Task2
17:48:06,260 DEBUG Task1:29 - end call, Task1 duration=485
17:48:06,260 DEBUG Task1:25 - start call Task1
17:48:06,452 DEBUG Task1:29 - end call, Task0 duration=213
17:48:06,452 DEBUG Task1:25 - start call Task0
17:48:06,586 DEBUG Task1:29 - end call, Task1 duration=326
17:48:06,586 DEBUG Task1:25 - start call Task1
17:48:06,619 DEBUG Task1:29 - end call, Task0 duration=166
17:48:06,620 DEBUG Task1:25 - start call Task0
17:48:06,672 DEBUG Task1:29 - end call, Task0 duration=52
17:48:06,672 DEBUG Task1:25 - start call Task0
17:48:06,747 DEBUG Task1:29 - end call, Task2 duration=489
17:48:06,747 DEBUG Task1:25 - start call Task2
17:48:06,792 DEBUG Task1:29 - end call, Task1 duration=205
17:48:06,792 DEBUG Task1:25 - start call Task1



* Updated 2009-11-29

Saturday, October 31, 2009

Open Source will never get out of stock

It is like running out of light or wind. Software can be copied as many times as needed. Why haven't Microsoft and Apple understood this yet?

Some random examples of Windows 7 being out of stock:
http://crave.cnet.co.uk/software/0,39029471,49303067,00.htm
http://www.digi.no/827047/windows-7-utsolgt-i-sverige

How can software be sold out? The only commodity involved is the software to be copied. To produce a physical copy, a USB drive or a recordable CD or DVD must be used in the process. But basically software can be copied between computers over the network. The required production equipment is computers, electric power and a network. These resources are abundant in the enterprise and in the average European/American home.

It seems like it is mostly pirates and the Open Source movement that have understood this distribution model. Commercial vendors seem to try to ignore or deny it. That is too bad, because they could profit from the extremely low distribution costs.
  • Pirates exploit the low distribution costs to distribute goods illegally.
  • Open Source has relied on this distribution model for a long time. This has resulted in very broad usage of open source software, and distribution costs are approximately zero.
While Open Source and pirated digital goods have very different usage profiles, they share the common properties of extremely effective distribution and almost non-existent distribution costs. Open Source is mainly used by software developers, while consumers download software and digital media content for free.

In the last couple of years a new distribution pattern has gained a foothold, beyond the most simple forms: SaaS. SaaS leverages software that can be used as-is instantly. Since there is no significant download and installation, the cost and effort required from the end user are even lower than with Open Source. For example, the latest release of Ubuntu aims to provide an OS as a service by leveraging Amazon EC2 compatible images.

Software vendors, and other digital content providers, should free themselves from physical media. Soon physical media will only be of interest to the enthusiast. The rest of us will prefer instant gratification over the physical media.

Last weekend I was in a record shop where they of course played some music. I was very surprised when I saw that a PC with Spotify was used instead of a CD player. That says it all about the state of physical distribution of digital goods.

Wednesday, September 9, 2009

DDD and Ubiquitous language

A recurring subject in several sessions on the first day of Javazone 09 was Domain Driven Design and Ubiquitous Language. I do agree with those who believe it is important to establish a language that can be used by domain experts and developers to ensure consistency.

Just before I had to leave I attended the DDD panel discussion, where it was discussed whether this language should be used in the code as well. Well, that is completely natural for English speaking developers. For developers in countries where English is not the native language, this raises some questions:
  • The code cannot be maintained by people not familiar with the language the code is written in. This is very relevant in outsourcing.
  • It will pose problems when companies from different countries merge and must update and integrate the code to reflect the new situation.
  • Code that will or can be exposed as open source or in company partnerships will be of less value if it is written in a non-English language.
My personal opinion is that Norwegian (my native language) does not look good in code. It feels unnatural.

Is it responsible to write code in a non-English language? I think this should be discussed with the stakeholders. On the other hand, when there is a language impedance, how do you deal with it? Is it the developer's responsibility?

I wanted to ask the panel these questions, but had to leave to catch a bus.

Updated: Corrected Domain Driven Development to Domain Driven Design and added useful links.

Wednesday, August 26, 2009

Do you master social media?

This post is almost "off topic", and does not follow up on the series I announced in my previous post. I think I've forgotten what I was thinking of then, but I'll surely get back to software and architecture later.

The background for this post is that I cannot help getting frustrated over the uninformed criticism of social media that a lot of old hat media seem to love these days.

Social media is, by many, viewed as a tornado of (useless/disconnected) information. On the other hand it is easy to find examples of constructive use of social media that provide extremely useful information and knowledge to its readers. For some examples, look at my blogger following on the side of my blog. Sometimes I get irritated when people have strong negative opinions against social media (especially microblogging), stating it makes us dumber and unable to concentrate anymore. Well, it is up to the individual to decide what to make of it, and to decide to pick up a book instead of e.g. "hypertweeting".

I have stumbled upon extremely useful knowledge on Twitter, blogs and more professional web publishers, reading and discussing it through social media's unique features. My opinion is that social media is a much better way of staying informed and increasing your knowledge than watching TV. Well, at least a lot of what is distributed via TV is not exactly what tickles your brain to think.

To be really interesting in the social media space you will have to consume a considerable amount of information, and possess the capability to transform and magnify this information for your readers. This requires deep concentration and is exactly how it always has been. Just think of the ancient Greek philosophers. Some have a large information processing capability and others do not. Even though some have better capabilities than others, everyone can improve their skills. Social media provides the best training ground ever for improving the skill of expressing your knowledge. As Chris Anderson says in The Long Tail, the tools for expressing yourself in public writing have been democratized.

In my view expressing and discussing your knowledge is probably the most valuable thing a person can do for oneself and society in general. The ability to express your knowledge through writing is more important than ever. I am a strong believer that knowledge is key to improving individuals and mankind. Each and everyone should acquire the knowledge they need to make informed decisions for themselves, or for others when in a position as a decision maker, such as leaders and politicians. With social media it is even possible to discuss things in a completely democratic way before they are realized.

Saturday, June 13, 2009

Twitter - an untested spacecraft

Oh no, not another post analyzing the growth of Twitter, I hear you say. But here I will only use it to exemplify how software architects can and should prepare for rapid growth of services that are or will become an integral part of the social Web.

This post is the first in a series of posts I am currently thinking of, on Software Architecture for the Social Web. Throughout the post I will use the Service Categorization defined at Cantara Wiki.

It would be irresponsible to implement a software system with all the investments necessary for future scaling to large amounts of traffic made up front. It is equally irresponsible not to architect it so that it can grow to handle large loads.

A software system "exposed" to the Social Web can have millions of potential users. That will apply to all applications that everyone can subscribe to. With the possibilities of OpenId and Facebook identity provisioning a subscription is not many clicks away. Who, when and how many will try the service depends on its attractiveness and word-of-mouth on the web.

Twitter is exceptional in several ways: It has had an enormous growth, and has had a lot of growing pains. It is unusual for an H2A service that has been that unreliable to sustain such a remarkable growth. Launching Twitter must have been like launching a spacecraft that had not been tested for the stress it has been exposed to. Astronauts and ground crew have only one chance of success: there will be no repair or improvements after launch. Luckily software can be repaired after launch, but without planning for it, it cannot be assumed that this will be easy and have little impact on its users.

I don't know exactly how Twitter's architecture has evolved from its initial release, but there must be at least 3 key takeaways for architects (seen from the outside):
  • It has always been easy to integrate with Twitter through its API (A2A services), leveraging value-added H2A services and clients.
  • It has improved to handle the current load without breaking the API and the clients.
  • Except for a specific feature that was removed (causing an uproar), no functionality has been broken as far as I know.
These points are important to all software architects, but on the social web failing to fulfill them will certainly be a disaster. I guess not all services will experience such faithful users as Twitter's, who have kept coming back despite instability at times. Twitter is probably highly valued among a lot of users, is free to use and cannot be held economically or legally responsible for any loss of data or unavailability.

When architecting a service that has to fulfill an SLA, or is paid for, some level of stability will be assumed. Failing to deliver that might have economic, legal and market implications. This applies to all services, and especially when offering commercial services exposed to the social web, some key architecture aspects must be thoroughly planned. To name a few important ones:
  • All services (that is all types described on Cantara Wiki, but especially H2A and A2A) have to be orthogonal. This ensures the least painful replacements when they become bottlenecks or otherwise insufficient.
  • Clean interfaces and clearly defined responsibilities at all levels of the architecture.
  • Conscious use of tradeoffs to deliver early while still keeping the architecture agile enough to let it evolve.
  • Measurement of performance and resource usage must be built in to identify bottlenecks early.
These things will be essential for all projects aiming at the social web as their user base. It will be hard to predict adoption rates, and it will seldom be economically feasible to make the system extremely scalable before launch. Most projects must release early to start the revenue stream and get feedback from users on what to implement next.
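
To make the point about built-in measurement a little more concrete, here is a minimal sketch of what it could look like in Java. Everything in it is an assumption for illustration: the class name Timed, the sink callback that receives the measurement, and the choice of milliseconds. A real service would probably hand the numbers to its logging or monitoring system.

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical helper: runs a piece of work and reports how long it took
final class Timed {
    static <T> T call(String name, Supplier<T> work, BiConsumer<String, Long> sink) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Reported even when the work throws, so failures are measured too
            sink.accept(name, (System.nanoTime() - start) / 1_000_000);
        }
    }

    public static void main(String[] args) {
        String result = Timed.call("lookup", () -> "some result",
                (name, millis) -> System.out.println(name + " duration=" + millis));
        System.out.println(result);
    }
}
```

Because the measurement wraps the call rather than living inside it, it can be added to service boundaries early and kept when the implementation behind them is replaced.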

Saturday, May 16, 2009

What can Open do for you?

Yesterday Totto notified me of this Gartner blog about how CIOs should answer when asked about what IT contributes to the enterprise. Mark McDonald suggests that the answer should be how IT contributes to the core business model. This way some light can be shed on how business models and IT can both be improved to be better positioned against the competition.

Another blog post that caught my interest yesterday was Matt Asay's Cloud computing: A natural conclusion of open source? Here Asay is writing about Tim O'Reilly's prediction that Cloud Computing will make Open Source licenses irrelevant. That is a bit far from what my post is about, but it stresses that new software will be a consumer and/or provider of services. For new software based on the SOA/Cloud Computing paradigms, open data formats and open standards will be quite important. Open Source is a natural way of implementing both data formats and standards, and hopefully this is going to happen. Even inside enterprises these thoughts should be considered, so it will be possible to combine the results of different development efforts to create even more valuable software to support the core business model.

Two days ago, I fell in love with this blog post from Rickard Öberg about quickness. Although this post is about quickness at the individual developer and team level, it applies to organizations as well. To quickly reposition against the competition and market demand, sharp services exposing open data formats and being based on open standards will be crucial.

The CIO should understand what Open can do for the core business model, as this will be utterly important as more software is SOA based and running in the cloud. The ability to move quickly has been important at all times, but in some ways the changes in paradigms are accelerating, and it will be more visible when you do not possess this ability.

Wednesday, March 25, 2009

Yesterday, Now and Tomorrow

Designing software is a continuous task, requiring enduring effort and attention.
I strongly believe that the Code is the design. All efforts leading to the delivered code are just preparations for creating the design that gets released.

During the lifetime of a software product, better and less good decisions are made along the way. Some decisions are made consciously and others more accidentally. After reading about and discussing Technical Debt with my colleagues lately, I have come to think that there are other economic terms that also fit design decisions, like investing. Investments can pay off, break even (neither paying off nor losing), or lose. What you do is make a bet. The greater the investment, the more certain you should be that it is the right thing to do.

Ward Cunningham coined the Technical Debt phrase. Martin Fowler and Steve McConnell have elaborated on it. This post tries to tie the investing term into software design. I really hope that by using these terms it is possible to create a software design meta language that is better understood by non-technical decision makers.

Yesterday's decisions
Bad design from the past creates technical debt. This debt should be paid down (not just the interest). It has to be a prioritized objective in every project to resolve this into better solutions.

Now
Sometimes there is a need to do something to the design just to get the next release out the door. As a rule of thumb, these decisions create debt. Make these investments as small as possible.
Update: Got a useful comment from @javatotto, saying that the effect of this kind of design decision should be isolated, so that as few dependencies on it as possible are created from elsewhere. Great comment that could be a topic for the next post ;-)

Tomorrow
Investing for tomorrow is speculating. This is up front design, and often considered bad. This is especially true for projects that need to be able to change direction fast. It also requires an effort that does not help you get the next release out of the door. You might even invest in something you will not need, which will represent a complete loss.

Small investments seldom make big trouble and can not represent big losses.

Big investments for tomorrow need to be watched closely as they can potentially ruin the project. Maybe the worst thing about big investments is that they are extremely hard to dispose of when they prove to be a loss. There is a cognitive attachment (ownership) that hinders replacing them with better solutions. The processes around big investments should be open, so that all people having an interest in them can have influence on how they develop. As I am an eager proponent of Open Innovation, I think especially this kind of investment can profit from being developed as open source, or at least as open as possible.

Not paying down the existing debt will ultimately be the terminator for a project. Final. I have seen some correlation between big investments and not paying down the technical debt in most projects I have participated in.
Big investments for the future have the potential to steal energy and focus (the project's short-term capital) from things that should be fixed. Often it is the most talented developers who are involved in both activities, and they seldom manage to do both simultaneously. The worst case is when the investment turns out to be wrong. That makes this even worse, as the debt grows dangerously fast.
 
A Software Developers Perspective by Dag Blakstad is licensed under a Creative Commons Attribution 3.0 Norway License.