Thursday, January 26, 2006

Learning Better Programming from diffpatch114?

I wrote diffpatch114 about two years ago, when I was a complete novice in REALbasic. Although the program in one sense seems to work flawlessly, in another sense it is a good example of bad programming, an illustration of what to avoid (and hence, I hope, a step toward better programming).

diffpatch has two parts: "The diff [part] compares two versions of a document, generating a set of differences that reflect the changes that need to be applied to the old document to make it identical to the new document.... By running the patch [part], the document contents can be updated to the new version...."

The major problem with diffpatch114 is two-fold: (1) it takes a l-o-n-g time for the program to process long files, and (2) the user is given no visual clue that the program is still at work. At this point the user (at least in Windows) may "break" the program by clicking and reclicking the File menu to try to determine whether the program is "alive." Rather than answering that question, the clicking itself may cause the program to "lock up" and "fail to respond" (again, at least in Windows).

Let's talk about the second problem first. My CodeHelper solves such a problem by using three techniques to let the user know that the program is "working" on a task: a progress bar is ... well, making visible progress, the mouse cursor is changed to indicate that the computer is busy, and the word "working" is displayed in colorful big bold print. Of these, the progress bar is the most important. So a progress bar should be added to diffpatch114.
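To make the progress-bar idea concrete, here is a minimal sketch in Python (not REALbasic, and not the actual diffpatch or CodeHelper code). The names process_with_progress, work, report, and every are all hypothetical; the assumption is simply that a long loop calls back every so often so the interface can show how far along it is.

```python
def process_with_progress(lines, work, report, every=1000):
    """Apply work() to each line, calling report(done, total) periodically.

    All names here are hypothetical; report() stands in for whatever
    updates the progress bar in the real program.
    """
    total = len(lines)
    results = []
    for i, line in enumerate(lines, start=1):
        results.append(work(line))
        # Report every `every` lines, and once more when the last line is done.
        if i % every == 0 or i == total:
            report(i, total)
    return results


if __name__ == "__main__":
    def show(done, total):
        print(f"working... {done} of {total} lines")

    process_with_progress(["one", "two", "three"], str.upper, show, every=2)
```

Reporting only every so many lines, rather than on every line, keeps the progress updates themselves from slowing the job down.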

The first problem may involve some exploration. In order to speed up the process, I need to find out what part of the process is taking up the most time or an unnecessarily long time. One way I can do that is to insert MsgBoxes at appropriate points in the process (MsgBox "1", MsgBox "2", MsgBox "3", etc.) to see which steps seem to take unduly long. For example, if there's a long time between MsgBox "4" and MsgBox "5", I then know what portion of my code needs to be examined (viz., the code between MsgBox "4" and MsgBox "5").
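The same checkpoint idea can be done with a clock instead of a stopwatch and a series of dialog boxes. The following is a rough sketch in Python (not REALbasic); the Checkpoints class and its mark and slowest methods are hypothetical names invented for this illustration. Each mark records how long it has been since the previous one, so the slow stretch shows up as a number.

```python
import time


class Checkpoints:
    """Record the time elapsed between labeled checkpoints (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.timings = []  # (label, seconds since the previous checkpoint)

    def mark(self, label):
        now = self._clock()
        self.timings.append((label, now - self._last))
        self._last = now

    def slowest(self):
        # Label of the checkpoint that took longest to reach.
        return max(self.timings, key=lambda item: item[1])[0]


if __name__ == "__main__":
    cp = Checkpoints()
    words = "some text to split".split(" ")
    cp.mark("1")
    joined = " ".join(words)
    cp.mark("2")
    for label, seconds in cp.timings:
        print(label, f"{seconds:.6f} s")
```

Unlike a MsgBox, a mark does not stop and wait for a click, so the recorded times are not inflated by how quickly I dismiss the dialog.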

When placing the MsgBoxes, of course, it is ordinarily best not to place them within any loops (Do/Loop, For/Next, etc.). MsgBoxes are how I learned with RB 5.5.x that, although Split worked seemingly instantaneously, Join could take a long time for large arrays (i.e., arrays with tens of thousands of elements in them). (I haven't tested RB 2005 or RB 2006 to see whether Join has been improved, but a while back on one of the RB mail lists Walter Purvis gave me an excellent replacement for "Join" which is called "MyJoin" and works the same way, only faster.)

I just used diffpatch with two file versions, each of which was approximately 20,000 lines long, so I know that the diff part works and I know that the patch part works, because I was able to create a differences file that I was able to apply to the original file to (re-)create the revised version of the file. (I used the "Compare Files" feature of Edit Pad Pro to compare the two files, and Edit Pad Pro told me that they were "identical.")
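That round-trip test (diff the two versions, patch the original, compare the result with the revision) is easy to automate. Here is a small sketch in Python using the standard difflib module. It is only an illustration of the check: make_diff and apply_patch are hypothetical names, and the list-of-edits format is an assumption of mine, not the differences-file format that diffpatch114 actually writes.

```python
from difflib import SequenceMatcher


def make_diff(old, new):
    """Return edits as (start, end, replacement_lines) against the old lines."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # only the changes are kept
    ]


def apply_patch(old, edits):
    """Apply edits from make_diff to the old lines and return the new lines."""
    result = list(old)
    # Work from the end so earlier line positions are not shifted.
    for start, end, replacement in reversed(edits):
        result[start:end] = replacement
    return result


if __name__ == "__main__":
    original = ["line 1", "line 2", "line 3"]
    revised = ["line 1", "line two", "line 3", "line 4"]
    differences = make_diff(original, revised)
    rebuilt = apply_patch(original, differences)
    print("identical" if rebuilt == revised else "different")
```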

Anyway, I hope to make an improved version of diffpatch available in the near future. In the meantime, use the bad programming in diffpatch114 as an example of what to avoid and as a step toward better programming. Keep your user informed as to what is taking place (using a progress bar, etc., where appropriate), and don't be content to just get your code working dependably: optimize it!

Barry Traver
