Friday, November 27, 2009

Excel and grep is the logfile analyzers best friend

Lately I have been analyzing some logfiles for invocation times of remote services. Luckily the most problematic part of the system logs time consumption together with a remote system identifier and some other stuff.

Some years ago I was working on SmartLearn, implementing analytics for learning accountables. Through that work I got to know Excels Pivoting capabilities, but back then I used Microsoft Analysis Services for creating the Pivot tables. I have also been a user of Linux-, Unix- and Cygwin's sharp commandline tools for a long time. Seperately I know strength of the tools from both of these worlds, but I did not recognize how I could use them together.

One of my project peers showed me how Excel could be used to extract data from flat files and present it as Pivot tables in Excel with very few steps. The key premise is that the interesting dimensions of the Pivot-table is logged on the same lines as the interesting numbers/text. The technnique described here will let you visualize counting of things. Even without any numbers the frequence of things can be very interesting. Most system log files contains a timestamp, and this can be combined with almost anything *. Using *nix-commandline tools it is of course possible to extract whatever information you like from flat files.


17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401

To extract only these lines I use Cygwin's grep command like this:
grep Task.*duration <logfilename> > extractedlog.txt


From Excel open the extracted log file. Excel will recognize the file as a textfile that you might want to split in columns. Choose between splitting at fixed positions and characters. Spaces can be a viable option in some cases.













When you have imported the file it may be necessary to split columns manually using the Text To Columns tool in Data toolbar.

Now you must insert a row at the top of the dataset, and add header names of the interesting columns. Cut & paste the interesting columns so they are adjacent to each other.

Now select the Insert tool in Excel, and Pivot Table (leftmost button in my installtion of Excel 2007 under Insert)

Choose either Table or Chart, and select the interesting fact and dimension columns from the spreadsheat. When you click OK, you can start dragging & dropping columns into Axes, Values and Legend containers on the right side.

You can now twist the logdata as you want, and find relations between them you would spend enormous amount of time finding manually in the log file. Some simple examples:


To create the dataset I used this Java code that produces differentiated exection trends in a number of threads (which is not meant as a educational example of how to write multithreaded code):

package com.webstep.logfilegenerator;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LogGenerator {

/**
* @param args
*/
public static void main(String[] args) {
ExecutorService exSvc = Executors.newFixedThreadPool(3);

List<Future> futures = new ArrayList<Future>();
final Random r = new Random();
for (int i = 0; i < 10; i++) {
futures.add(exSvc.submit(new Task1("Task"+i, r.nextInt(i+1*500))));
}
try {
for (Future f : futures) {
System.out.println(f.get(10000, TimeUnit.MILLISECONDS));
}

} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ExecutionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TimeoutException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
exSvc.shutdownNow();
}
}
}



import java.util.Random;
import java.util.concurrent.Callable;

import org.apache.log4j.Logger;


public class Task1 implements Callable {
private static Logger log = Logger.getLogger(Task1.class.getName());
private final static Random r = new Random();
private final int maxTimeout;
private final String name;

public Task1(final String name, final int maxTimeout) {
super();
this.maxTimeout = maxTimeout;
this.name = name;
log.debug(name + " maxtimeout=" + maxTimeout);
}


public Object call() throws Exception {
while (true) {
log.debug("start call " + name);

long start = System.currentTimeMillis();
Thread.sleep(r.nextInt(maxTimeout));
log.debug("end call, " + name + " duration=" + (System.currentTimeMillis() - start));
}

}
}


Some sample output from this code:
17:47:56,798 DEBUG Task1:19 - Task0 maxtimeout=98
17:47:56,803 DEBUG Task1:19 - Task1 maxtimeout=12
17:47:56,803 DEBUG Task1:25 - start call Task0
17:47:56,804 DEBUG Task1:19 - Task2 maxtimeout=17
17:47:56,804 DEBUG Task1:25 - start call Task1
17:47:56,806 DEBUG Task1:19 - Task3 maxtimeout=51
17:47:56,807 DEBUG Task1:19 - Task4 maxtimeout=92
17:47:56,807 DEBUG Task1:19 - Task5 maxtimeout=47
17:47:56,808 DEBUG Task1:19 - Task6 maxtimeout=8
17:47:56,809 DEBUG Task1:19 - Task7 maxtimeout=87
17:47:56,809 DEBUG Task1:19 - Task8 maxtimeout=66
17:47:56,810 DEBUG Task1:19 - Task9 maxtimeout=39
17:47:56,808 DEBUG Task1:25 - start call Task2
17:47:56,876 DEBUG Task1:29 - end call, Task2 duration=65
17:47:56,878 DEBUG Task1:25 - start call Task2
17:47:57,030 DEBUG Task1:29 - end call, Task2 duration=151
17:47:57,031 DEBUG Task1:25 - start call Task2
17:47:57,174 DEBUG Task1:29 - end call, Task0 duration=369
17:47:57,175 DEBUG Task1:25 - start call Task0
17:47:57,296 DEBUG Task1:29 - end call, Task1 duration=490
17:47:57,297 DEBUG Task1:25 - start call Task1
17:47:57,323 DEBUG Task1:29 - end call, Task0 duration=146
17:47:57,324 DEBUG Task1:25 - start call Task0
17:47:57,390 DEBUG Task1:29 - end call, Task1 duration=91
17:47:57,391 DEBUG Task1:25 - start call Task1
17:47:57,448 DEBUG Task1:29 - end call, Task2 duration=415
17:47:57,449 DEBUG Task1:25 - start call Task2
17:47:57,549 DEBUG Task1:29 - end call, Task1 duration=157
17:47:57,550 DEBUG Task1:25 - start call Task1
17:47:57,729 DEBUG Task1:29 - end call, Task2 duration=278
17:47:57,730 DEBUG Task1:25 - start call Task2
17:47:57,802 DEBUG Task1:29 - end call, Task0 duration=477
17:47:57,803 DEBUG Task1:25 - start call Task0
17:47:57,886 DEBUG Task1:29 - end call, Task1 duration=334
17:47:57,887 DEBUG Task1:25 - start call Task1
17:47:58,048 DEBUG Task1:29 - end call, Task2 duration=316
17:47:58,048 DEBUG Task1:25 - start call Task2
17:47:58,068 DEBUG Task1:29 - end call, Task0 duration=264
17:47:58,069 DEBUG Task1:25 - start call Task0
17:47:58,087 DEBUG Task1:29 - end call, Task1 duration=199
17:47:58,092 DEBUG Task1:25 - start call Task1
17:47:58,098 DEBUG Task1:29 - end call, Task1 duration=5
17:47:58,098 DEBUG Task1:25 - start call Task1
17:47:58,099 DEBUG Task1:29 - end call, Task2 duration=48
17:47:58,099 DEBUG Task1:25 - start call Task2
17:47:58,115 DEBUG Task1:29 - end call, Task1 duration=17
17:47:58,115 DEBUG Task1:25 - start call Task1
17:47:58,245 DEBUG Task1:29 - end call, Task2 duration=146
17:47:58,245 DEBUG Task1:25 - start call Task2
17:47:58,339 DEBUG Task1:29 - end call, Task0 duration=267
17:47:58,340 DEBUG Task1:25 - start call Task0
17:47:58,398 DEBUG Task1:29 - end call, Task0 duration=57
17:47:58,399 DEBUG Task1:25 - start call Task0
17:47:58,412 DEBUG Task1:29 - end call, Task0 duration=12
17:47:58,412 DEBUG Task1:25 - start call Task0
17:47:58,542 DEBUG Task1:29 - end call, Task1 duration=427
17:47:58,543 DEBUG Task1:25 - start call Task1
17:47:58,573 DEBUG Task1:29 - end call, Task2 duration=326
17:47:58,574 DEBUG Task1:25 - start call Task2
17:47:58,741 DEBUG Task1:29 - end call, Task1 duration=197
17:47:58,742 DEBUG Task1:25 - start call Task1
17:47:58,783 DEBUG Task1:29 - end call, Task0 duration=369
17:47:58,784 DEBUG Task1:25 - start call Task0
17:47:58,835 DEBUG Task1:29 - end call, Task2 duration=260
17:47:58,835 DEBUG Task1:25 - start call Task2
17:47:58,873 DEBUG Task1:29 - end call, Task0 duration=89
17:47:58,873 DEBUG Task1:25 - start call Task0
17:47:58,923 DEBUG Task1:29 - end call, Task2 duration=87
17:47:58,923 DEBUG Task1:25 - start call Task2
17:47:58,942 DEBUG Task1:29 - end call, Task0 duration=67
17:47:58,942 DEBUG Task1:25 - start call Task0
17:47:59,084 DEBUG Task1:29 - end call, Task1 duration=342
17:47:59,084 DEBUG Task1:25 - start call Task1
17:47:59,091 DEBUG Task1:29 - end call, Task2 duration=166
17:47:59,091 DEBUG Task1:25 - start call Task2
17:47:59,115 DEBUG Task1:29 - end call, Task0 duration=172
17:47:59,115 DEBUG Task1:25 - start call Task0
17:47:59,348 DEBUG Task1:29 - end call, Task2 duration=257
17:47:59,348 DEBUG Task1:25 - start call Task2
17:47:59,473 DEBUG Task1:29 - end call, Task2 duration=124
17:47:59,474 DEBUG Task1:25 - start call Task2
17:47:59,505 DEBUG Task1:29 - end call, Task0 duration=389
17:47:59,505 DEBUG Task1:25 - start call Task0
17:47:59,506 DEBUG Task1:29 - end call, Task2 duration=31
17:47:59,507 DEBUG Task1:25 - start call Task2
17:47:59,534 DEBUG Task1:29 - end call, Task1 duration=450
17:47:59,535 DEBUG Task1:25 - start call Task1
17:47:59,579 DEBUG Task1:29 - end call, Task1 duration=44
17:47:59,580 DEBUG Task1:25 - start call Task1
17:47:59,629 DEBUG Task1:29 - end call, Task2 duration=121
17:47:59,630 DEBUG Task1:25 - start call Task2
17:47:59,667 DEBUG Task1:29 - end call, Task0 duration=160
17:47:59,668 DEBUG Task1:25 - start call Task0
17:47:59,747 DEBUG Task1:29 - end call, Task1 duration=167
17:47:59,748 DEBUG Task1:25 - start call Task1
17:47:59,852 DEBUG Task1:29 - end call, Task2 duration=221
17:47:59,852 DEBUG Task1:25 - start call Task2
17:48:00,153 DEBUG Task1:29 - end call, Task0 duration=485
17:48:00,154 DEBUG Task1:25 - start call Task0
17:48:00,156 DEBUG Task1:29 - end call, Task1 duration=408
17:48:00,156 DEBUG Task1:25 - start call Task1
17:48:00,185 DEBUG Task1:29 - end call, Task2 duration=332
17:48:00,185 DEBUG Task1:25 - start call Task2
17:48:00,316 DEBUG Task1:29 - end call, Task2 duration=130
17:48:00,317 DEBUG Task1:25 - start call Task2
17:48:00,435 DEBUG Task1:29 - end call, Task2 duration=117
17:48:00,436 DEBUG Task1:25 - start call Task2
17:48:00,542 DEBUG Task1:29 - end call, Task1 duration=385
17:48:00,543 DEBUG Task1:25 - start call Task1
17:48:00,629 DEBUG Task1:29 - end call, Task0 duration=474
17:48:00,631 DEBUG Task1:25 - start call Task0
17:48:00,720 DEBUG Task1:29 - end call, Task2 duration=283
17:48:00,721 DEBUG Task1:25 - start call Task2
17:48:00,809 DEBUG Task1:29 - end call, Task1 duration=266
17:48:00,810 DEBUG Task1:25 - start call Task1
17:48:00,886 DEBUG Task1:29 - end call, Task1 duration=74
17:48:00,887 DEBUG Task1:25 - start call Task1
17:48:00,908 DEBUG Task1:29 - end call, Task0 duration=276
17:48:00,909 DEBUG Task1:25 - start call Task0
17:48:01,023 DEBUG Task1:29 - end call, Task2 duration=302
17:48:01,023 DEBUG Task1:25 - start call Task2
17:48:01,140 DEBUG Task1:29 - end call, Task0 duration=230
17:48:01,140 DEBUG Task1:25 - start call Task0
17:48:01,349 DEBUG Task1:29 - end call, Task1 duration=462
17:48:01,350 DEBUG Task1:25 - start call Task1
17:48:01,378 DEBUG Task1:29 - end call, Task2 duration=355
17:48:01,378 DEBUG Task1:25 - start call Task2
17:48:01,518 DEBUG Task1:29 - end call, Task2 duration=139
17:48:01,518 DEBUG Task1:25 - start call Task2
17:48:01,599 DEBUG Task1:29 - end call, Task1 duration=249
17:48:01,599 DEBUG Task1:25 - start call Task1
17:48:01,603 DEBUG Task1:29 - end call, Task0 duration=461
17:48:01,603 DEBUG Task1:25 - start call Task0
17:48:01,939 DEBUG Task1:29 - end call, Task0 duration=335
17:48:01,940 DEBUG Task1:25 - start call Task0
17:48:01,997 DEBUG Task1:29 - end call, Task2 duration=476
17:48:01,998 DEBUG Task1:25 - start call Task2
17:48:02,014 DEBUG Task1:29 - end call, Task1 duration=414
17:48:02,015 DEBUG Task1:25 - start call Task1
17:48:02,146 DEBUG Task1:29 - end call, Task1 duration=131
17:48:02,147 DEBUG Task1:25 - start call Task1
17:48:02,261 DEBUG Task1:29 - end call, Task0 duration=320
17:48:02,262 DEBUG Task1:25 - start call Task0
17:48:02,343 DEBUG Task1:29 - end call, Task2 duration=342
17:48:02,343 DEBUG Task1:25 - start call Task2
17:48:02,378 DEBUG Task1:29 - end call, Task1 duration=230
17:48:02,378 DEBUG Task1:25 - start call Task1
17:48:02,520 DEBUG Task1:29 - end call, Task0 duration=257
17:48:02,520 DEBUG Task1:25 - start call Task0
17:48:02,597 DEBUG Task1:29 - end call, Task2 duration=253
17:48:02,598 DEBUG Task1:25 - start call Task2
17:48:02,734 DEBUG Task1:29 - end call, Task1 duration=355
17:48:02,735 DEBUG Task1:25 - start call Task1
17:48:02,786 DEBUG Task1:29 - end call, Task0 duration=265
17:48:02,787 DEBUG Task1:25 - start call Task0
17:48:02,959 DEBUG Task1:29 - end call, Task2 duration=361
17:48:02,959 DEBUG Task1:25 - start call Task2
17:48:03,017 DEBUG Task1:29 - end call, Task0 duration=228
17:48:03,018 DEBUG Task1:25 - start call Task0
17:48:03,027 DEBUG Task1:29 - end call, Task2 duration=66
17:48:03,030 DEBUG Task1:25 - start call Task2
17:48:03,214 DEBUG Task1:29 - end call, Task1 duration=478
17:48:03,215 DEBUG Task1:25 - start call Task1
17:48:03,299 DEBUG Task1:29 - end call, Task2 duration=268
17:48:03,300 DEBUG Task1:25 - start call Task2
17:48:03,305 DEBUG Task1:29 - end call, Task1 duration=90
17:48:03,306 DEBUG Task1:25 - start call Task1
17:48:03,324 DEBUG Task1:29 - end call, Task1 duration=17
17:48:03,325 DEBUG Task1:25 - start call Task1
17:48:03,345 DEBUG Task1:29 - end call, Task2 duration=44
17:48:03,345 DEBUG Task1:25 - start call Task2
17:48:03,456 DEBUG Task1:29 - end call, Task1 duration=130
17:48:03,456 DEBUG Task1:25 - start call Task1
17:48:03,484 DEBUG Task1:29 - end call, Task2 duration=138
17:48:03,484 DEBUG Task1:25 - start call Task2
17:48:03,491 DEBUG Task1:29 - end call, Task2 duration=6
17:48:03,492 DEBUG Task1:25 - start call Task2
17:48:03,496 DEBUG Task1:29 - end call, Task0 duration=477
17:48:03,496 DEBUG Task1:25 - start call Task0
17:48:03,696 DEBUG Task1:29 - end call, Task0 duration=199
17:48:03,696 DEBUG Task1:25 - start call Task0
17:48:03,796 DEBUG Task1:29 - end call, Task1 duration=340
17:48:03,809 DEBUG Task1:25 - start call Task1
17:48:03,841 DEBUG Task1:29 - end call, Task2 duration=348
17:48:03,841 DEBUG Task1:25 - start call Task2
17:48:03,845 DEBUG Task1:29 - end call, Task1 duration=35
17:48:03,847 DEBUG Task1:25 - start call Task1
17:48:03,876 DEBUG Task1:29 - end call, Task0 duration=179
17:48:03,876 DEBUG Task1:25 - start call Task0
17:48:03,952 DEBUG Task1:29 - end call, Task2 duration=111
17:48:03,952 DEBUG Task1:25 - start call Task2
17:48:03,999 DEBUG Task1:29 - end call, Task0 duration=123
17:48:04,000 DEBUG Task1:25 - start call Task0
17:48:04,220 DEBUG Task1:29 - end call, Task1 duration=372
17:48:04,221 DEBUG Task1:25 - start call Task1
17:48:04,348 DEBUG Task1:29 - end call, Task2 duration=395
17:48:04,349 DEBUG Task1:25 - start call Task2
17:48:04,385 DEBUG Task1:29 - end call, Task0 duration=384
17:48:04,392 DEBUG Task1:25 - start call Task0
17:48:04,633 DEBUG Task1:29 - end call, Task2 duration=284
17:48:04,634 DEBUG Task1:25 - start call Task2
17:48:04,678 DEBUG Task1:29 - end call, Task1 duration=457
17:48:04,679 DEBUG Task1:25 - start call Task1
17:48:04,704 DEBUG Task1:29 - end call, Task1 duration=25
17:48:04,704 DEBUG Task1:25 - start call Task1
17:48:04,766 DEBUG Task1:29 - end call, Task0 duration=373
17:48:04,766 DEBUG Task1:25 - start call Task0
17:48:04,912 DEBUG Task1:29 - end call, Task2 duration=277
17:48:04,912 DEBUG Task1:25 - start call Task2
17:48:04,928 DEBUG Task1:29 - end call, Task2 duration=15
17:48:04,928 DEBUG Task1:25 - start call Task2
17:48:04,943 DEBUG Task1:29 - end call, Task1 duration=238
17:48:04,944 DEBUG Task1:25 - start call Task1
17:48:05,072 DEBUG Task1:29 - end call, Task2 duration=143
17:48:05,072 DEBUG Task1:25 - start call Task2
17:48:05,132 DEBUG Task1:29 - end call, Task2 duration=59
17:48:05,132 DEBUG Task1:25 - start call Task2
17:48:05,135 DEBUG Task1:29 - end call, Task1 duration=191
17:48:05,135 DEBUG Task1:25 - start call Task1
17:48:05,168 DEBUG Task1:29 - end call, Task0 duration=401
17:48:05,168 DEBUG Task1:25 - start call Task0
17:48:05,239 DEBUG Task1:29 - end call, Task1 duration=103
17:48:05,240 DEBUG Task1:25 - start call Task1
17:48:05,357 DEBUG Task1:29 - end call, Task2 duration=224
17:48:05,357 DEBUG Task1:25 - start call Task2
17:48:05,526 DEBUG Task1:29 - end call, Task1 duration=286
17:48:05,526 DEBUG Task1:25 - start call Task1
17:48:05,555 DEBUG Task1:29 - end call, Task2 duration=197
17:48:05,555 DEBUG Task1:25 - start call Task2
17:48:05,600 DEBUG Task1:29 - end call, Task2 duration=44
17:48:05,600 DEBUG Task1:25 - start call Task2
17:48:05,616 DEBUG Task1:29 - end call, Task0 duration=447
17:48:05,616 DEBUG Task1:25 - start call Task0
17:48:05,650 DEBUG Task1:29 - end call, Task2 duration=49
17:48:05,652 DEBUG Task1:25 - start call Task2
17:48:05,772 DEBUG Task1:29 - end call, Task2 duration=120
17:48:05,773 DEBUG Task1:25 - start call Task2
17:48:05,775 DEBUG Task1:29 - end call, Task1 duration=248
17:48:05,775 DEBUG Task1:25 - start call Task1
17:48:05,782 DEBUG Task1:29 - end call, Task2 duration=9
17:48:05,782 DEBUG Task1:25 - start call Task2
17:48:05,865 DEBUG Task1:29 - end call, Task2 duration=83
17:48:05,866 DEBUG Task1:25 - start call Task2
17:48:05,870 DEBUG Task1:29 - end call, Task2 duration=4
17:48:05,870 DEBUG Task1:25 - start call Task2
17:48:05,878 DEBUG Task1:29 - end call, Task0 duration=261
17:48:05,879 DEBUG Task1:25 - start call Task0
17:48:06,105 DEBUG Task1:29 - end call, Task2 duration=234
17:48:06,105 DEBUG Task1:25 - start call Task2
17:48:06,239 DEBUG Task1:29 - end call, Task0 duration=360
17:48:06,239 DEBUG Task1:25 - start call Task0
17:48:06,258 DEBUG Task1:29 - end call, Task2 duration=152
17:48:06,258 DEBUG Task1:25 - start call Task2
17:48:06,260 DEBUG Task1:29 - end call, Task1 duration=485
17:48:06,260 DEBUG Task1:25 - start call Task1
17:48:06,452 DEBUG Task1:29 - end call, Task0 duration=213
17:48:06,452 DEBUG Task1:25 - start call Task0
17:48:06,586 DEBUG Task1:29 - end call, Task1 duration=326
17:48:06,586 DEBUG Task1:25 - start call Task1
17:48:06,619 DEBUG Task1:29 - end call, Task0 duration=166
17:48:06,620 DEBUG Task1:25 - start call Task0
17:48:06,672 DEBUG Task1:29 - end call, Task0 duration=52
17:48:06,672 DEBUG Task1:25 - start call Task0
17:48:06,747 DEBUG Task1:29 - end call, Task2 duration=489
17:48:06,747 DEBUG Task1:25 - start call Task2
17:48:06,792 DEBUG Task1:29 - end call, Task1 duration=205
17:48:06,792 DEBUG Task1:25 - start call Task1



* Updated 2009-11-29