Using PHP to help parse any file (Page 2)

The internal structure of a data file can be important for various reasons. Perhaps the primary one is to build extensions that are based on, or depend on a deep understanding of, that structure. If you do not know the structure, you cannot extract the data in an open manner, you cannot modify the data except through approved channels (i.e. the vendor's applications), and you cannot create data-centered extensions.


<-- Back to Using PHP to help parse any file (Page 1)
--> Forward to Using PHP to help parse any file (Page 3)

In our first lesson, we read a block of data from our target data file into a variable, $contents. Now we'll process that block one character at a time. But how do we get just one character from the block?

In PHP, the bytes in a string can be addressed directly, the same way you would address the elements of an array. Using our previously initialized variable $chr_pos as the offset, it looks like this:

$this_chr = $contents[$chr_pos];
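
To make the indexing concrete, here is a minimal sketch. The four sample bytes are made up for illustration and simply stand in for the block that Page 1 reads from the data file.

```php
<?php
// Made-up sample data standing in for the block read on Page 1.
$contents = "PK\x03\x04";
$chr_pos  = 0;

// Square brackets return the single byte at that offset (offsets start at 0).
$this_chr = $contents[$chr_pos];   // "P"

// strlen() counts bytes, so valid offsets run from 0 to strlen() - 1.
$block_len = strlen($contents);    // 4
```

Asking for an offset at or beyond strlen($contents) is an error, so a loop over the block should stop before it gets there.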

We're also going to want the decimal and the hex value of the character we're looking at.  We can determine those with this set of commands:

$this_dec = ord($this_chr);
$this_hex = dechex($this_dec);
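
As a quick illustration of what those two calls return, here is a small sketch using two characters I picked as examples (a line feed and a capital A).

```php
<?php
// A line feed, chosen only as an example of a low value.
$this_chr = "\n";
$this_dec = ord($this_chr);      // 10
$this_hex = dechex($this_dec);   // "a" (one digit, lowercase)

// A capital A, as an example of a printable character.
$other_dec = ord("A");           // 65
$other_hex = dechex($other_dec); // "41"
```

Notice that dechex() gives back only as many digits as it needs, so small values come out one character wide.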

Now we want to append the hex value to the end of our current $hexline, with one small trick.  We're going to be outputting this information on a fixed-width line, so we want to make certain that each character is output in a field of two. That way all of our columns line up neatly on top of each other.  That means we need to pad any hex value which is only a single digit (i.e. '0' through 'f').  We can append the padded output with this command:

$hexline .= ".".str_pad($this_hex,2,"0",STR_PAD_LEFT);

I'm using the initial "." merely as a placeholder so the results don't run together needlessly.  It's cleaner to my eye to see them separated a bit.  You can use a space instead of a period if you want.  STR_PAD_LEFT is a setting that says pad on the left. You could also pad on the right (STR_PAD_RIGHT) I suppose, but it's a rare request.  The "0" is the character to pad with; in some situations you might want a space.  The 2 is how big the field is in total. In this case we want our result to be exactly two characters wide.
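
Here is a short sketch of the padding at work. The three decimal values are samples I chose, not bytes from a real file.

```php
<?php
$hexline = "";

// Sample values: one that needs padding and two that do not.
foreach ([10, 65, 255] as $this_dec) {
    $this_hex = dechex($this_dec);
    // Pad on the left with "0" until the field is two characters wide.
    $hexline .= "." . str_pad($this_hex, 2, "0", STR_PAD_LEFT);
}

// $hexline is now ".0a.41.ff"
```

If you prefer, sprintf("%02x", $this_dec) produces the same two-digit result in one step.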

Now we want to process the detail that will go on the $textline.  This will be a little more complicated, because not only does HTML treat "spaces" as noise, but I also want to somehow distinguish low-level characters (decimal values under 32) from high-level ones (decimal values over 126).  I don't know why I want to do it, I'm very quirky.  It probably stems from my days when the low-level values were only used for system-command type operations.  So I'm going to use a PHP switch statement, which, if you've done programming before, you may recognize as a type of CASE.

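// Switching on true lets each case be a comparison; the first one that is true wins.
// (Switching on $this_dec itself would mishandle a zero byte.)
switch(true) {
      case ($this_dec < 32):
        $textline .= "&nbsp;x."; break;
      case ($this_dec == 32):
        $textline .= "&nbsp;&nbsp;&nbsp;"; break;
      case ($this_dec > 126):
        $textline .= "&nbsp;.y"; break;
      default:
        $textline .= "&nbsp;&nbsp;".$this_chr; break;
}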

What the heck?!  Okay calm down.

    If the decimal value is low (under 32), we can't actually display it; it would just look like gobbledygook.  So I'm going to show a space and then "x."
    If the value is exactly 32, this is the space character itself. In that case we're going to give HTML three non-breaking spaces, so it will actually pad it out properly in the display.
    If the value is over 126, I'm going to show a space and then ".y"
    If it's anything else (i.e. a "normal" printable character), this will fall through to the "default" case, and in that case I will show two spaces and then just the character itself.

    Each case is terminated with a "break" command, which seems pretty redundant, doesn't it?  Without it, though, PHP would carry straight on and run the code for the next case as well.  But hey, I didn't write PHP.
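
Putting the four rules together, here is a minimal sketch of the text column logic. The function name text_cell and the sample block are my own inventions for illustration, and I'm assuming the switch is written against true so that each case can be a comparison.

```php
<?php
// Hypothetical helper: returns the three-character-wide cell for one byte value.
function text_cell($this_dec)
{
    $cell = "";
    switch (true) {
        case ($this_dec < 32):       // low values: not displayable
            $cell = "&nbsp;x.";
            break;
        case ($this_dec == 32):      // the space character itself
            $cell = "&nbsp;&nbsp;&nbsp;";
            break;
        case ($this_dec > 126):      // high values
            $cell = "&nbsp;.y";
            break;
        default:                     // a "normal" printable character
            $cell = "&nbsp;&nbsp;" . chr($this_dec);
            break;
    }
    return $cell;
}

// Made-up sample block: a letter, a space, a zero byte and a high byte.
$contents = "A \x00\xff";
$textline = "";
for ($chr_pos = 0; $chr_pos < strlen($contents); $chr_pos++) {
    $textline .= text_cell(ord($contents[$chr_pos]));
}

// $textline is now "&nbsp;&nbsp;A&nbsp;&nbsp;&nbsp;&nbsp;x.&nbsp;.y"
```

If you delete one of the break lines, the case above it will also run the code of the case below it, which is an easy way to see why the breaks are there.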
