Blog

Posted 2006/05/12

A swan dive back into coding

Picking up the Tycho swarm file distribution code after a run of hacking at other people's things because I decided it was time to do my own work.

I have spent the last few days picking up from where I left off to before the R-GMA and MDS benchmarking. Although the code was tested while being developed, now everything seems to be working we can properly investigate where performance bottlenecks are (optimize later rather than early). I use the classic printf() method to examine when ’slow’ things are happening during the flow of the program. I have pasted a section of debug output below as an example, the four fields show:

  1. When the function was called,
  2. The time in seconds since the last debug statement,
  3. Whether it was the client or the mediator which called the function (for some functions which are used by both the mediator and client is difficult to distinguish this in the debug code so ‘???’ is used and the caller can be derived by the programmer from the context of the line by looking at adjacent debug statements,
  4. The name of the called function.


03:52:27.877 0.0 MED HttpSend.send()
03:52:28.156 0.2 MED HttpThread.handle()
03:52:28.227 0.0 MED BufferConsumer.consume()
03:52:28.419 0.1 ??? SocketSend.send()
03:52:28.421 0.0 CLI SocketThread.readSocket()
03:52:28.481 0.0 MED HttpThread.handle()

The technique is to look at this output and then try to burrow into sections which seem to be slowest. I work through the code in order of slowest section until I think the effort I am expending is greater than the gain in performance. There were several small clean ups, but I focused on finding sections which were taking more than 0.5 seconds to run (biggest performance impact). I can really see where Rich’s Slogger program would help ? I have up to 5 of these debug outputs to fit in my brain per test run.

The function dealing with incoming chunks was sped up by 0.88 seconds by changing the way the message was read so the payload was not copied twice.

I found a performance bug in the Tycho Message class which effects the functions changeFrom and changeTo, they were very inefficient and took up to 0.5 seconds to run with large messages. After rewriting them I reduced their cost by around 0.4 seconds.

Altering these three functions increased the speed of transferring the test 3.5MB file by slightly more than 3 full seconds down to 3.96 seconds.

I am now happy with how the system appears to perform and am ready to try some proper benchmarks. After this piece of work is complete I think I am ready to be full time on writing up.