or subscribe to my Knol changes with my Knol Public activity feed
| Written by Will Johnson, email me at email@example.com |
or post your comments for public view far below.
Creative Commons Attribution 3.0 License
| ||The Knol Self-Tweeter on the left, is designed to work only for an article's author.|
Click this link, to buy your own Knol Self-Tweeter for two dollars through Paypal. I will email it to you, when I receive payment. What the Knol Self-Tweeter does exactly is explained at this link.
Please rate this article using the stars at the right ---------------------------->>>>>>
| First Normal Data files || Multivalued Data files |
|Multivalued databases are those databases where any particular field in a record, may itself be a multiple part entity. That is, a list or an array, of possibly indeterminate length. Multivalued databases are thus not First Normal Form databases, but are rather called post-normal form databases. They are designed to handle situations where creating sub-tables is inefficient for data access purposes. They are examples of intrinsic JOINs, that is, without the requirement for an SQL JOIN, as the data is already present in the record being read.||First Normal data files showing a married couple Bob and Mary Jones, with an associated list linking them and including the names of their children Sam and Brenda. |
|Multivalued datafile showing the same information. |
4 Mary, Sam, Brenda
Note how field 4 can take on multiple values, from zero to any length. The values are separated from each other in this case by a comma, but the separator value may be any valid character not used in the data.
Note from the example above how the multivalued table, only requires a single record read, while the First Normal Table requires reading two records if the names of the family members are queried. No SQL JOIN is required in the multivalued table.
Multivalued databases require that we do not set a record length, and yet we need to provide for random-access to the records in each table (i.e. data file). The solution to these conditions pulling in opposite directions, is to use hashing on the record keys of the table. The hash value tells us the offset that we need to add to the pointer, which points to the start of our table, in order to jump in our random-access table to the proper sub-section where that record should be, if it exists. In simpler random-access files, the offset points directly to the primary key of that record, however this requires either that each record is a fixed-length, or that a jump table is maintained, listing the offset to every key.
In our situation, we cannot have a jump table listing every key, since the records in a multi-user system are constantly in flux, in such a way that other user processes could never know a change has occurred until their own state is out-of-sync. Since the records are each of changeable length, the jump points would also be changing rapidly and we could never propagate the changes through the system in a timely manner. So each table (i.e. data file) must be broken down into segments that can act independently of any other segment. We will call each of these segments a group. It is theoretically possible to have one group for every record in the table, however this causes an inefficient use of storage with many groups being sparsely populated or even not populated at all.
So instead of that, we will allow our groups to have multiple records, each trailed by a special reserved record separator code, so we know when we've hit the end of that record. In read mode, once we have jumped to the proper group where the desired record should be found, we will have to walk the group until we find the correct primary key, or report that it is not present. In write mode however, we could blindly jump to the end of the group and simply write our new record as trailing content.
If however, the record had already existed, and we are merely updating it not adding it, then we also have to either remove the old record from its place, or actually write the changes in place and then expand or shrink the rest of the group to take up the new slack. In multi-user systems, typically many users may simultaneously have permission to update records. In order to prevent update collisions, we have to maintain a lock table, that informs processes that a particular desired record is currently locked by another process. How will we know when we've hit the end of the group? We will need to define a special reserved end of group code as well to tell us that.
--> Forward to PHP and MMD (Page 2)