HD, so like… a high-definition floppy?
It’s the day after Independence Day in the US, and much of our staff is just returning to their preferred work machines. If this was 1997 instead of 2018, that would mean booting up BeOS for some. The future-of-operating-systems-that-never-was arrived just over 20 years ago, so in light of the holiday, we’re resurfacing this geek’s guide. The piece originally ran on June 2, 2010; it appears unchanged below.
The Be operating system file system, known simply as BFS, is the file system for the Haiku, BeOS, and SkyOS operating systems. When it was created in the late ’90s as part of the ill-fated BeOS project, BFS’s ahead-of-its-time feature set immediately struck the fancy of OS geeks. That feature set includes:
- A 64-bit address space
- Use of journaling
- Highly multithreaded reading
- Support of database-like extended file attributes
- Optimization for streaming file access
A dozen years later, the legendary BFS still merits exploration—so we’re diving in today, starting with some filesystem basics and moving on to a discussion of the above features. We also chatted with two people intimately familiar with the OS: the person who developed BFS for Be and the developer behind the open-source version of BFS.
A little history
BFS was created in 1997 by Dominic Giampaolo and Cyril Meurillon, both of whom worked at Be. It was designed to be multi-threaded and lightweight, and to support high-volume, streaming multimedia. It was also designed to support the database features of the previous Be file system. Even though it was written at a time when systems typically had only 8MB of RAM and a mere 9GB of disk storage, many of the forward-thinking design decisions made then are still valid today.
BFS didn’t quite end when Be shut its doors after failing to get bought by Apple. In 2002, Axel Dörfler re-implemented BFS for Haiku as an open-source project. The last part of this article features an interview with Axel.
Before we can talk about what made BFS so special, we first have to cover some file system basics.
File system basics
At the basic level, a file system exists to manage the data on permanent storage devices. Functions common to most file systems include:
- Creating files and directories
- Opening, reading, writing, deleting, and renaming files
- Reading, writing, and updating file metadata or attributes
Additional features include symbolic links, access control lists, and memory mapping.
For a high-level overview of numerous file systems, refer to the article From BFS to ZFS: Past, Present and Future File Systems. For a more in-depth look at HPFS, NTFS, EXT2, and XFS circa 2000, refer to chapter 3 of Practical File System Design.
The following terms are common in file system discussions, so you should glance through the list and familiarize yourself with any you don’t already know:
Disk: A permanent storage medium of a certain size. A disk also has a sector or block size, which is the minimum unit that the disk can read or write. The traditional block size for disks has been 512 bytes, but it now approaches 4096 bytes for newer disks.
Block: The smallest unit writable by a disk or file system. Everything a file system does is composed of operations done on blocks. A file system block is always the same size as or larger (in integer multiples) than the disk block size.
Bitmap: Not a picture, but a data structure that records which blocks on a disk are free and which are used.
Partition: A subset of all the blocks on a disk. A disk can have several partitions.
Volume: The name given to a collection of blocks on some storage medium (i.e., a disk). That is, a volume may be all of the blocks on a single disk, some portion of the total number of blocks on a disk, or it may even span multiple disks and be all the blocks on several disks. The term “volume” is used to refer to a disk or partition that has been initialized with a file system.
Superblock: The area of a volume where a file system stores its critical, volume-wide information. A superblock usually contains information such as how large a volume is, the name of a volume, and so on.
Metadata: A general term referring to information that is about something but not directly part of it. For example, the size of a file is very important information, but it is not part of the data that’s stored in the file itself. So file metadata is data about a file, and not in a file.
Journaling: A method of ensuring the correctness of file system metadata even in the presence of power failures or unexpected reboots.
I-node: The place where a file system stores all the necessary metadata about a file. The i-node also provides the connection to the contents of the file and any other data associated with the file. The term “i-node” (which we will use in this article) is historical and originated in Unix. An i-node is also known as a file control block (FCB) or file record.
Attribute: A name (as a text string) and value associated with the name. The value may have a defined type (string, integer, etc.), or it may just be arbitrary data. (1)
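To make the block and bitmap definitions concrete, here is a minimal sketch of a free-block bitmap in C. It is not BFS's actual allocator: the 64-block volume size and every name in it (block_bitmap, bitmap_alloc, and so on) are hypothetical, chosen only to show one bit tracking one block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 64u  /* hypothetical tiny volume, for illustration only */

typedef struct {
    uint8_t bits[NUM_BLOCKS / 8];  /* one bit per block: 1 = used, 0 = free */
} block_bitmap;

/* Mark every block as free. */
void bitmap_init(block_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Return 1 if the block is in use, 0 if it is free. */
int bitmap_is_used(const block_bitmap *bm, uint32_t block)
{
    assert(block < NUM_BLOCKS);
    return (bm->bits[block / 8] >> (block % 8)) & 1;
}

/* Set or clear the bit for one block. */
void bitmap_set(block_bitmap *bm, uint32_t block, int used)
{
    assert(block < NUM_BLOCKS);
    if (used)
        bm->bits[block / 8] |= (uint8_t)(1u << (block % 8));
    else
        bm->bits[block / 8] &= (uint8_t)~(1u << (block % 8));
}

/* Find the first free block, mark it used, and return its number.
   Returns -1 when the volume is full. */
int64_t bitmap_alloc(block_bitmap *bm)
{
    for (uint32_t b = 0; b < NUM_BLOCKS; b++) {
        if (!bitmap_is_used(bm, b)) {
            bitmap_set(bm, b, 1);
            return (int64_t)b;
        }
    }
    return -1;
}
```

A first-fit scan like this is the simplest possible policy. Real file systems typically try to place related blocks near each other, which this sketch does not attempt.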
BFS features
Now that we’ve got our file system basics down, let’s look at some of the features that make BFS unique.
First off, BFS’s 64-bit addressing means that no matter how large disks get in the future, you will be able to format the entire disk with BFS. You can create partitions in excess of 8 exabytes and, depending on the block size used, you can create files that are greater than 30 GB in size.
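The arithmetic behind that limit is easy to check. The sketch below assumes byte counts are held in a signed 64-bit integer, which caps them at 2^63 - 1 bytes (about 9.2 × 10^18). The function name volume_size_bytes is hypothetical and is not part of any BFS API.

```c
#include <assert.h>
#include <stdint.h>

/* Return the size in bytes of a volume with num_blocks blocks of
   block_size bytes each, or -1 if the result would not fit in a
   signed 64-bit byte count (an assumption made for this sketch). */
int64_t volume_size_bytes(int64_t num_blocks, uint32_t block_size)
{
    assert(num_blocks >= 0 && block_size > 0);
    if (num_blocks > INT64_MAX / (int64_t)block_size)
        return -1;  /* the multiplication would overflow */
    return num_blocks * (int64_t)block_size;
}
```

With 1,024-byte blocks, for example, the largest block count this function accepts is 2^53 - 1.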
One of BFS’s most important and widely touted features is its support for extended attributes. MP3 files illustrate why attributes matter. The information fields important to an MP3 file include song title, band, album, release date, encoding rate, length, and number of times played. If you want to associate this information with each MP3 file using a conventional file system, you might have to create your own database to support searching, creating, updating, or deleting these attributes as your music collection grows and changes. With BFS, in contrast, these attributes, or any other attributes, can be added to the file system itself. This means that a program for editing or playing MP3s does not need to create or maintain a database, because the file system will handle these functions for you. BFS supports associating attributes with a file, either under program control or from the command line. The file system itself can search and sort attributes, so any application gets that capability without implementing it. How this is done will be discussed in detail later.
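As a rough mental model of the MP3 example, the sketch below keeps name/value attributes alongside each file and answers a simple query over them. It is an in-memory illustration only, not the BFS on-disk format or the BeOS/Haiku attribute API. The types, the functions, and the attribute name "Audio:Artist" used in the usage note are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8  /* arbitrary limit for this sketch */

typedef struct {
    const char *name;   /* attribute name, a text string */
    const char *value;  /* string values only, to keep the sketch short */
} attribute;

typedef struct {
    const char *path;
    attribute attrs[MAX_ATTRS];
    size_t num_attrs;
} file_entry;

/* Return the value of the named attribute, or NULL if the file lacks it. */
const char *get_attr(const file_entry *f, const char *name)
{
    for (size_t i = 0; i < f->num_attrs; i++)
        if (strcmp(f->attrs[i].name, name) == 0)
            return f->attrs[i].value;
    return NULL;
}

/* Add the attribute, or update it if it already exists.
   Returns 0 on success, -1 if the file has no room for another one. */
int set_attr(file_entry *f, const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    for (size_t i = 0; i < f->num_attrs; i++) {
        if (strcmp(f->attrs[i].name, name) == 0) {
            f->attrs[i].value = value;
            return 0;
        }
    }
    if (f->num_attrs == MAX_ATTRS)
        return -1;
    f->attrs[f->num_attrs].name = name;
    f->attrs[f->num_attrs].value = value;
    f->num_attrs++;
    return 0;
}

/* The simplest possible query: count files whose attribute equals value. */
size_t count_matches(const file_entry *files, size_t n,
                     const char *name, const char *value)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        const char *v = get_attr(&files[i], name);
        if (v != NULL && strcmp(v, value) == 0)
            hits++;
    }
    return hits;
}
```

Tagging two songs with set_attr(&song, "Audio:Artist", "Band A") and then calling count_matches on the array stands in for what a music player would otherwise need its own database to do. The linear scan here is the naive approach; the point of file-system-level attributes is that the file system can do this work on the application's behalf.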
BFS supports the ability to create a persistent or ‘live’ query that watches for file changes. This is a query that hooks into the file system, checking for files that match search criteria. Under Haiku, these queries are easy to create and surprisingly light on system resources.
BFS is journaled, which means that it logs changes to file system metadata as they happen, so it does not need to run consistency-checking tools like fsck or chkdsk after a crash. Journaling also contributes to faster system booting after an unexpected shutdown.
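The general idea behind journaling can be sketched in a few lines of C. This is a generic write-ahead log over an in-memory "disk", not the BFS journal format. The sizes, the struct layout, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16  /* tiny sizes, chosen only for the sketch */
#define NUM_BLOCKS 8
#define LOG_SLOTS  4

typedef struct {
    uint32_t target;            /* block this entry will overwrite */
    uint8_t  data[BLOCK_SIZE];  /* new contents for that block */
} log_entry;

typedef struct {
    uint8_t   blocks[NUM_BLOCKS][BLOCK_SIZE];  /* metadata blocks */
    log_entry log[LOG_SLOTS];                  /* journal area */
    uint32_t  log_count;                       /* entries in the journal */
    int       committed;                       /* 1 once the transaction is complete */
} disk;

/* Step 1: record the intended change in the journal, not in place. */
int journal_add(disk *d, uint32_t target, const uint8_t *data)
{
    assert(target < NUM_BLOCKS);
    if (d->committed || d->log_count == LOG_SLOTS)
        return -1;
    d->log[d->log_count].target = target;
    memcpy(d->log[d->log_count].data, data, BLOCK_SIZE);
    d->log_count++;
    return 0;
}

/* Step 2: mark the transaction complete. */
void journal_commit(disk *d)
{
    d->committed = 1;
}

/* Step 3, also run at mount time after a crash: apply a committed
   transaction to its final location, or discard an incomplete one.
   Either way the metadata blocks end up in a consistent state. */
void journal_replay(disk *d)
{
    if (d->committed) {
        for (uint32_t i = 0; i < d->log_count; i++)
            memcpy(d->blocks[d->log[i].target], d->log[i].data, BLOCK_SIZE);
    }
    d->log_count = 0;
    d->committed = 0;
}
```

Recovery only has to look at the small journal area rather than walk the whole disk, which is why a journaled file system can come back quickly after an unexpected shutdown.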
Internally, BFS uses UTF-8 characters for directory and file names. This means you can use virtually any language, natively, in Haiku. With no extra effort you can localize file names to Chinese, German characters with umlauts, or cursive Arabic.
BFS gives special performance consideration to large file access. Creating and reading large video, audio, or image files are optimized operations under BFS.
BFS architecture: the Superblock
The superblock is typically the highest-level data structure for a file system. The BFS superblock describes the physical disk, the journal area, and indexing. For obvious performance reasons, the superblock is kept in RAM after the system boots.
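typedef struct disk_super_block {
    char        name[B_OS_NAME_LENGTH]; /* file system name */
    int32       magic1;                 /* first magic number */
    int32       fs_byte_order;          /* byte ordering of the volume */
    uint32      block_size;             /* block size in bytes */
    uint32      block_shift;            /* block size as an exponent of 2 */
    off_t       num_blocks;             /* blocks available to the file system */
    off_t       used_blocks;            /* blocks currently in use */
    int32       inode_size;
    int32       magic2;                 /* second magic number */
    int32       blocks_per_ag;          /* "ag" = allocation group */
    int32       ag_shift;
    int32       num_ags;
    int32       flags;                  /* clean or dirty state */
    block_run   log_blocks;             /* the journal area */
    off_t       log_start;
    off_t       log_end;
    int32       magic3;                 /* third magic number */
    inode_addr  root_dir;               /* root of all files and directories */
    inode_addr  indices;                /* start of the attribute index section */
    int32       pad[8];
} disk_super_block;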
The name field holds the file system name. Three magic numbers are used for consistency checking as well as version numbering. The fs_byte_order field holds the byte ordering, and block_size holds the block size in bytes; block_shift holds the same size as an exponent of 2, so block_size should equal 2 raised to the power of block_shift. This redundancy is deliberate and is used for consistency checking of a file system. The num_blocks field holds the number of blocks available to the file system, and used_blocks holds the number currently in use. The flags field records whether the superblock is clean or dirty. The root_dir field points to the root of all files and directories, and indices points to the beginning of the index section of extended attributes. The bfsinfo utility can be used to dump a volume’s superblock.
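The redundancy between block_size and block_shift lends itself to a quick sanity check. The sketch below is one plausible form of such a check and is not taken from the BeOS or Haiku sources. It uses a cut-down, hypothetical struct (super_block_summary) that holds only the four fields involved, with standard C integer types in place of the Be typedefs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields described above. */
typedef struct {
    uint32_t block_size;   /* block size in bytes */
    uint32_t block_shift;  /* block size as an exponent of 2 */
    int64_t  num_blocks;   /* blocks available to the file system */
    int64_t  used_blocks;  /* blocks currently in use */
} super_block_summary;

/* Return 1 if the fields agree with each other, 0 otherwise. */
int super_block_is_consistent(const super_block_summary *sb)
{
    assert(sb != NULL);
    if (sb->block_shift >= 32)
        return 0;  /* shift too large for a 32-bit block size */
    if (sb->block_size != (UINT32_C(1) << sb->block_shift))
        return 0;  /* the two encodings of the block size disagree */
    if (sb->num_blocks < 0 || sb->used_blocks < 0)
        return 0;
    if (sb->used_blocks > sb->num_blocks)
        return 0;  /* cannot use more blocks than exist */
    return 1;
}
```

A superblock with block_size 1024 and block_shift 10 passes, while one with block_size 1024 and block_shift 11 is rejected. A mismatch like that suggests corruption or a volume that is not BFS at all.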