Using PHP to help parse any file (Page 3)

The internal structure of a data file can be important for various reasons. The primary one is perhaps to build extensions based upon, or with a deep understanding of, that structure. If you do not know the structure, you cannot extract the data in an open manner, you cannot modify the data except through approved channels (i.e. the vendor's applications), and you cannot create data-centered extensions.


<-- Back to Using PHP to help parse any file (Page 2)


In the first two lessons of this project, we read a block of data from our data file and processed one character out of the block.  Now we need to increment our character position pointer, and we do that with this command

++$chr_pos;

This means "Add 1 to $chr_pos".  Weird, isn't it?  A relic from C.  Someone should shoot that guy.  Anyway.
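If the operator still looks cryptic, here is a small sketch comparing the ways PHP lets you add 1 to a variable.  Only $chr_pos comes from our script; the other variable names are made up for illustration.

```php
<?php
// Three equivalent ways to add 1 to the character position pointer.
$chr_pos = 0;
++$chr_pos;      // pre-increment: $chr_pos is now 1
$chr_pos++;      // post-increment: $chr_pos is now 2
$chr_pos += 1;   // compound assignment: $chr_pos is now 3

// The two ++ forms differ only when the value of the expression is used.
$a = 5;
$pre = ++$a;     // $a becomes 6 first, so $pre is 6
$b = 5;
$post = $b++;    // $post gets the old value 5, then $b becomes 6
```

Since our loop uses the increment as a statement on its own, either form would do the same job there.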

The end of the loop looks like this

} while (! $done);

But I've deliberately skipped over what we do when we've collected a "line" full of output, and also what we do if we reach the end of our current block of data (4K characters) and yet we're not at the end of the file.  So right now we've just created an infinite loop which will process the 4K characters but never output anything and never stop.

So let's handle that.

Since we're outputting a field three characters wide for each character processed, let's break our output line every 32 characters.  That should give us an output line 96 characters wide.  I'm also going to display, for edification, the starting and ending byte positions.  But how do we say "every 32"?  PHP has a simple way to do this, the modulus operator %.  Modulus returns the remainder after division.  This is the remainder, not the quotient!  For a non-negative number like our position pointer it is always a non-negative integer, ranging from zero to one less than the modulus number.  In this case our modulus is going to be 32, so this number will range from 0 to 31.  Using an old programmer's trick, we can test whether the remainder returned is zero in this way.

if (! ($chr_pos % 32)) {

This says "if the remainder, after dividing $chr_pos by 32, is zero then..."
The reason we can use the "Not" test here is that zero is treated as "false".
You could also have said

if (($chr_pos % 32) == 0) {

but hey, programmers like to be obscure at times, I suppose.  For job security.
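To see the modulus test at work, here is a minimal sketch that walks a position counter from 1 to 100 and records where a 32-character line would break.  The $breaks, $neg and $pos variables are mine, just for illustration.

```php
<?php
// Collect the positions at which a 32-character output line is complete.
$breaks = [];
for ($chr_pos = 1; $chr_pos <= 100; ++$chr_pos) {
    if (! ($chr_pos % 32)) {   // same test as ($chr_pos % 32) == 0
        $breaks[] = $chr_pos;
    }
}
// $breaks now holds 32, 64 and 96.

// In PHP the remainder takes the sign of the left-hand operand.
$neg = -7 % 32;   // -7
$pos = 39 % 32;   // 7
```

Our position pointer never goes negative, so in this script the remainder always stays between 0 and 31.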

But now what do we do if we have collected 32 characters in our output line?

First let's figure out what bytes these are

      $byte_end = $loop*4096 + $chr_pos; $byte_start = $byte_end - 31;

$loop is how many full blocks we've already worked through.  So far this is zero, but we'll see how that changes next.  When we get 32 characters we're at the end of our line, so we need to subtract 31 to get back to where the line started.  Now we output that

      echo "bytes ".$byte_start."-".$byte_end."<br>";
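Here is the same arithmetic as a worked example.  The byte_range() helper is hypothetical (it is not part of the script), and I'm assuming PHP 7.1 or later for the type hints and the [ ] unpacking.  Note that the positions come out 1-based, because $chr_pos has already been incremented by the time we test it.

```php
<?php
// Hypothetical helper: 1-based byte range of the output line that just filled up.
function byte_range(int $loop, int $chr_pos, int $width = 32, int $block = 4096): array
{
    $byte_end = $loop * $block + $chr_pos;     // last byte on this line
    $byte_start = $byte_end - ($width - 1);    // 31 bytes earlier when $width is 32
    return [$byte_start, $byte_end];
}

[$s1, $e1] = byte_range(0, 32);   // first line of the first block
[$s2, $e2] = byte_range(1, 64);   // second line of the second block
$label = "bytes " . $s2 . "-" . $e2 . "<br>";
```

The first call gives 1 and 32; the second gives 4129 and 4160, since one full block of 4096 bytes came before it.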

And then we output our built up lines

      echo $hexline."<br>"; $hexline = '';
      echo $textline."<br><br>"; $textline = '';
}

Notice also that after we print them, we blank out our line-output buffers.

Now the only way that we're done with the whole file is if our character position pointer has fallen off the end of the file.  We test that with this command, which says "Is our current character position greater than the length of what we read (our contents)?"

if ($chr_pos > $len_contents)
    $done = true;

Otherwise, if we just happen to be at the end of our block of data (remember we read 4K characters at a time) then we need to read the next block, get its length, reset our character position pointer, and increment our loop counter!  Whew!  Okay, we do this with this set of commands

    if ($chr_pos == 4096) {
      $contents = fread($handle, 4096);
      $len_contents = strlen($contents);
      $chr_pos = 0;
      ++$loop;
    }

And that's it!  You now have a fully functioning data file parser.  When you run it, it outputs to your web browser page, but you can save its result set into a text file and play with it as much as you like until you've figured out the data file structure!
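If you'd like to see the pieces in one place, here is a rough sketch of the idea, not the script from these lessons.  It makes several assumptions: the function names dump_file() and emit_line() are made up, the way $hexline and $textline get built is my guess at what the earlier pages do, and it swaps the do...while loop for a while/for pair with a running $offset instead of $loop*4096 + $chr_pos.  It also prints any partial last line, which the steps above do not cover.

```php
<?php
// Hypothetical helper: format one output line and blank out the line buffers.
function emit_line(int $start, int $end, string &$hexline, string &$textline): string
{
    $html = "bytes " . $start . "-" . $end . "<br>"
          . $hexline . "<br>"
          . $textline . "<br><br>";
    $hexline = '';
    $textline = '';
    return $html;
}

// Hypothetical sketch: return the whole dump as one HTML string.
function dump_file(string $path, int $block = 4096, int $width = 32): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return '';
    }
    $out = '';
    $hexline = '';
    $textline = '';
    $offset = 0;   // bytes processed so far across the whole file

    // Read one block at a time until fread() gives us nothing more.
    while (($contents = fread($handle, $block)) !== false && $contents !== '') {
        $len_contents = strlen($contents);
        for ($chr_pos = 0; $chr_pos < $len_contents; ++$chr_pos) {
            $code = ord($contents[$chr_pos]);
            // Assumed format: each byte takes a field three characters wide.
            $hexline .= sprintf('%02X ', $code);
            $shown = ($code >= 32 && $code < 127) ? htmlspecialchars($contents[$chr_pos]) : '.';
            $textline .= $shown . '  ';
            ++$offset;
            if (! ($offset % $width)) {
                $out .= emit_line($offset - $width + 1, $offset, $hexline, $textline);
            }
        }
    }
    fclose($handle);

    // Flush a partial last line when the file length is not a multiple of $width.
    if ($hexline !== '') {
        $out .= emit_line($offset - ($offset % $width) + 1, $offset, $hexline, $textline);
    }
    return $out;
}
```

You would call it with something like echo dump_file('mydata.dat'); where the file name is of course a placeholder for your own data file.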


If you can't put the pieces together properly, I will sell you the PHP script intact in one file, for two bucks.  Just email me at wjhonson@aol.com