Wednesday, January 25, 2006

diffpatch: An Alternative to the Unix Programs

First, some background:

"The standard Unix tools diff and patch are used to find the differences between text files and to apply the differences. These tools operate on a line by line basis using well-studied methods for computing the longest common subsequence (LCS)."

My own diffpatch program in REALbasic (written about two years ago) does not use "well-studied methods for computing the longest common subsequence (LCS)," but it can be "used to find the differences between text files and to apply the differences" and it does "operate on a line by line basis."

Here's a description of "How diff and patch Work":

"The diff program compares two versions of a document, generating a set of differences that reflect the changes that need to be applied to the old document to make it identical to the new document.... The set of differences can be transported to someone who has the original copy of the document. By running the patch program, the document contents can be updated to the new version...."

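To make the quoted description concrete, here is a minimal sketch in Python (not REALbasic, and not my diffpatch program) that generates a set of differences in the unified format using the standard library's difflib module. The sample lines and the file names old.txt and new.txt are assumptions made up for illustration.

```python
import difflib

# Hypothetical sample documents, one list entry per line.
old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]


def make_unified_diff(old, new):
    """Return the unified-format differences that turn `old` into `new`."""
    return list(
        difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt")
    )


diff_lines = make_unified_diff(old_lines, new_lines)

if __name__ == "__main__":
    # Lines starting with "-" come from the old document, "+" from the new one.
    print("".join(diff_lines), end="")
```

Note that the output carries lines of the old document along with it (the "-" lines and the unchanged context lines), which matters for the comparison with my own format later in this post.
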
The output of a diff program can take a variety of formats, but the three most common are normal, context, and unified. I wrote a REALbasic program that handles not the creation but the application of at least two of those three Unix-style formats. In other words, it is a patch program in RB: it relies on a Unix-style diff program to create the file of differences, but once that file exists, my program can apply it to the original document to bring that document's contents up to date.
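As an illustration of what applying such a file involves, here is a minimal sketch in Python of a patch routine for the normal format. It is not my REALbasic program; the names COMMAND and apply_normal_diff are hypothetical, and the sketch assumes a well-formed diff whose commands are in ascending order. Unlike the real patch program, it does not check that the "< " lines actually match the original document.

```python
import re

# A normal-format command looks like "2c2", "3a4", or "5,6d4":
# old line range, an operation (a = add, c = change, d = delete), new line range.
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old_lines, diff_text):
    """Apply a normal-format diff to old_lines (strings without newlines)."""
    result = []
    pos = 0  # index of the next old line that has not been copied yet
    lines = diff_text.splitlines()
    i = 0
    while i < len(lines):
        match = COMMAND.match(lines[i])
        if match is None:
            raise ValueError(f"unexpected line in diff: {lines[i]!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        i += 1

        # Collect the "> " lines (new text); skip "< " lines and the "---" divider.
        added = []
        while i < len(lines) and not COMMAND.match(lines[i]):
            if lines[i].startswith("> "):
                added.append(lines[i][2:])
            i += 1

        if op == "a":
            # Add after old line `start`: copy everything up to and including it.
            result.extend(old_lines[pos:start])
            pos = start
        else:
            # Change or delete: copy up to the range, then skip old lines start..end.
            result.extend(old_lines[pos:start - 1])
            pos = end
        result.extend(added)

    result.extend(old_lines[pos:])  # copy whatever follows the last command
    return result


if __name__ == "__main__":
    diff = "2c2\n< beta\n---\n> BETA\n3a4\n> delta\n"
    print(apply_normal_diff(["alpha", "beta", "gamma"], diff))
```

The routine only needs the line numbers and the "> " lines to do its work, which is why applying differences is so much simpler than computing them.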

I wrote these two programs (diffpatch and the patch program) primarily because, as an RB novice, I understood someone (rightly or wrongly) to be telling me that I couldn't do it. He was correct that I knew nothing about the "well-studied methods for computing the longest common subsequence (LCS)." Even so, although I didn't know enough to write a really efficient diff program, it was simple to write a patch program that updates documents from those Unix-style difference files, and it was fun to write my own diffpatch program with my own format.

How does my format (used in diffpatch) differ from the Unix formats? The significant difference (if there is one) is that my format does not include any lines from the original document. This means that if the original document is copyrighted by someone else, I can publicly post my own updates and modifications without getting into trouble for violation of copyright. (Text files that can be updated this way include HTML files, such as those found in the Language Reference, and XML files, such as the format in which RB projects can be saved.)
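The idea of a difference file that contains no lines from the original can be sketched as follows. This is a hypothetical Python illustration, not the actual diffpatch format (which is not described in this post): each record holds only a range of old line numbers and the new text that replaces that range. The function names make_delta and apply_delta are made up for the example, and the matching is done by Python's difflib.SequenceMatcher, not by my own method.

```python
import difflib


def make_delta(old, new):
    """Build records (start, end, replacement) with no text from `old`."""
    delta = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Old lines i1..i2 (0-based, end exclusive) become new[j1:j2].
            # For a pure deletion the replacement list is empty.
            delta.append((i1, i2, new[j1:j2]))
    return delta


def apply_delta(old, delta):
    """Rebuild the new document from the old one plus the delta records."""
    result = []
    pos = 0  # index of the next old line that has not been copied yet
    for start, end, replacement in delta:
        result.extend(old[pos:start])  # unchanged lines come from the original
        result.extend(replacement)     # new text comes from the delta
        pos = end                      # skip the old lines that were replaced
    result.extend(old[pos:])
    return result


if __name__ == "__main__":
    old_doc = ["alpha", "beta", "gamma"]
    new_doc = ["alpha", "BETA", "gamma", "delta"]
    delta = make_delta(old_doc, new_doc)
    print(delta)
    print(apply_delta(old_doc, delta) == new_doc)
```

Anyone holding the original can rebuild the new version, but the delta by itself reveals only line numbers and the new material. The trade-off is that, with no old lines to check against, such a file cannot detect that it is being applied to the wrong document.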

I haven't tracked down my patch program yet (the one that handles at least two of the UNIX formats), but I have made available my diffpatch program (look for it in the usual place). It is rather limited in its practical usefulness, but it was an interesting program to write, and you may find it to be of interest also, even if only as a "curiosity piece."

Barry Traver

