The Media Kit

The Media Kit provides powerful support for all forms of media (including, but not limited to, audio and video), including both playback to and recording from a wide variety of media and devices.

There are two levels of Media Kit programming: the high level, where an application accesses the Media Kit to play and record sound and video, and the low level, which involves actually creating the nodes that manipulate media data.


Architecture of the Media Kit

This section is a general overview of the architecture of the Media Kit.

The first thing that you need to understand as a media programmer working with the BeOS is the concept of a node. Generically speaking, a node is a specialized object in the media system that performs some media-related task. All nodes are indirectly derived from the BMediaNode class (but never directly from BMediaNode).

Nodes can be loaded from add-on modules, or they can be created within an application itself. See the BMediaAddOn class for details.

The BMediaRoster class provides the application interface to all available nodes (whether they're created by the application or by an add-on). An application instantiates the nodes it needs, then establishes the connections between them that will accomplish the desired task.

For example, if an application wants to play a sound file through a graphic equalizer, it might first instantiate a node that reads sound data from a disk file and outputs audio buffers, then instantiate a node that performs filtering on audio buffers, and finally, a node that plays sound buffers to the speakers.

Once these three nodes are instantiated, the application creates the links between them. The output of the audio file reading node is connected to the input of the equalizer node, then the equalizer node's output is connected to the sound player node's input.

Once these connections are established, the application can then begin playing the sound file, by telling the first node what sound file to play, and then starting all of the nodes running. A more detailed example of how to work with nodes to play back media data is given in the BMediaRoster class.
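The chain described above can be sketched in code. This is a hedged sketch, not a complete program: the node variables (fileNode, eqNode, speakerNode) are assumed to have been instantiated already, and format negotiation and error checking are omitted; see the BMediaRoster class for a full example.

```cpp
/* Sketch: wire a file reader -> equalizer -> speaker chain.
 * fileNode, eqNode, and speakerNode are assumed media_nodes. */
BMediaRoster *roster = BMediaRoster::Roster();

media_output fileOut, eqOut;     /* free outputs on the producers */
media_input  eqIn, speakerIn;    /* free inputs on the consumers */
int32 count;

/* Find a free output on the file reader and a free input on the
 * equalizer, then connect them. */
roster->GetFreeOutputsFor(fileNode, &fileOut, 1, &count);
roster->GetFreeInputsFor(eqNode, &eqIn, 1, &count);

media_format format = fileOut.format;
roster->Connect(fileOut.source, eqIn.destination, &format,
                &fileOut, &eqIn);

/* Likewise connect the equalizer's output to the speaker node. */
roster->GetFreeOutputsFor(eqNode, &eqOut, 1, &count);
roster->GetFreeInputsFor(speakerNode, &speakerIn, 1, &count);
roster->Connect(eqOut.source, speakerIn.destination, &format,
                &eqOut, &speakerIn);
```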

No portion of the media node protocol is optional; if you don't follow all the rules precisely, you risk hurting media performance, and since BeOS is the Media OS, that would be a bad thing.

For more detailed information about the architecture of the Media Kit (in particular, how nodes relate to one another), please see the BMediaNode class.


Types of Nodes

There are several basic kinds of nodes. Each of them is ultimately derived from the BMediaNode class. Any nodes that you might implement will be derived, in turn, from one or more of these node types.

Producers

A producer (a node derived from the BBufferProducer class) outputs media buffers, which are then received by consumers. A producer might generate buffers on its own (for example, a tone generator might use a mathematical formula to generate a sound, or an audio file player might load data from disk and send out buffers containing its audio data). Other producers might be responsible for acquiring data from media hardware (such as a video camera) and passing the media buffers to consumers down the line.

Consumers

Consumers (nodes derived from BBufferConsumer) receive buffers from a producer and process them in some manner. For example, a sound card's software would provide a consumer node that receives audio buffers and plays them through the card's hardware.

Consumer/Producers (Filters)

A consumer/producer (a node that derives from both BBufferConsumer and BBufferProducer) is also called a filter. A filter accepts buffers (like a consumer), processes them in some manner, then sends them back out again (like a producer). This can be used to alter sound or video data.

For example, an audio filter might add a reverb effect to sound buffers, or a video filter might add captioning to a video stream.

Controllable Nodes

If a node wishes to provide the user with options for configuring its functionality, the node can derive from BControllable. This provides features for creating a network of controllable parameters, and for publishing this information to Media Kit-savvy applications (including the Media preference applications).

Time Sources

A time source node broadcasts timing information that can be used by other nodes. All nodes are slaved to a time source, which provides synchronization among all nodes slaved to that time source. Typically, applications won't need to worry about this, because any node created through the BMediaRoster class is automatically slaved to the system (default) time source.

A node can be derived from any of these types (BBufferProducer, BBufferConsumer, BTimeSource, and BControllable), in any combination, as appropriate.

For example, if you're creating a sound card node that plays audio through stereo speakers, you would need to derive from BBufferConsumer (in order to receive audio buffers). You could also derive from BTimeSource if your sound card has the ability to provide timing information to others. And if you want the user to be able to control the volume, balance, and so forth, you would also derive from BControllable.

If your sound card also provides a digitizer input, you would actually create a second node to support that feature. It would inherit from BBufferProducer (so it can generate audio buffers for other nodes to use). It might also derive from BTimeSource and BControllable.

But not all nodes necessarily represent a physical hardware device. If you want to create a filter--for example, a noise-reduction filter--you can create a node to do this too. Simply derive from both BBufferConsumer (so you can receive buffers) and BBufferProducer (so you can send back out the altered buffers).


Source & Destination vs. Output & Input

Beginning Media Kit programmers may have trouble understanding the difference between a media_source and a media_output, and a media_destination and a media_input.

The media_source and media_destination structures describe a "socket" of sorts (much like in networking). These are the ends of the connection, much like the jacks you might plug cables into to connect various components of a stereo system. They're relatively small, lightweight descriptions containing only the information needed during real-time manipulation of nodes.

The media_output and media_input structures describe an actual connection between a media_source and a media_destination, including the source and destination, the connection's name, and the format of the data the connection is intended to handle. These are larger, and contain additional information needed when presenting a user interface describing the connections between nodes.

Although media_output and media_input contain all the information of the media_source and media_destination structures, the latter structures exist because when you're doing real-time manipulation of media data, you don't want to be tossing large blocks of data around unless you have to. And you don't have to.


Using the Media Kit

If you're writing an application that wants to record or play back some form of media data (such as a sound or a video file), all your media needs are served by the BMediaRoster class. This class provides access to the various nodes, and lets you establish the relationships among them that are necessary to perform the tasks you'd like to accomplish.

BMediaNode is an abstract class; you don't call its functions directly. Instead, you use BMediaRoster calls to issue requests to the various nodes available on the BeOS system on which your application is running. In addition, you can't derive a new class directly from BMediaNode; instead, derive from one of the system-defined subclasses (BBufferConsumer, BBufferProducer, BControllable, and so forth).

Media Kit error code constants can be found in MediaDefs.h.


The Audio Mixer

The audio mixer accepts as input audio data which it then mixes and outputs to the audio output device or devices the user has selected in the Audio preference application. Your application can get a media_node referencing the audio mixer using the BMediaRoster::GetAudioMixer() function. You can't intercept audio being output by the audio mixer; its buffers go directly to the output device.

Buffers containing any standard raw audio format can be sent to the audio mixer; the mixer will convert the data into the appropriate format for playback.

The audio mixer is always running, and is slaved to the most appropriate time source. You should never change its time source or start or stop the audio mixer (in other words, don't call the BMediaRoster calls SetTimeSourceFor(), Start(), or Stop() on the audio mixer).


The Audio Input

The audio input creates audio buffers from external sources, such as microphones or line-in ports. The physical hardware device from which the sound is input is configured by the user using the Audio preference application.

In the current implementation of the Media Kit, the audio input doesn't let you change the sampling rate. This may change in the future. To ensure that your application will continue to work in the future, don't assume that the current sampling rate will remain in effect; instead, look at the media_format structure in the media_output you're using for your connection to the audio input.
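A sketch of that check might look like the following (the roster and audioInputNode variables are assumed; this isn't the full connection code, and error checking is omitted):

```cpp
/* Sketch: discover the sampling rate the audio input is actually
 * using, rather than assuming one. audioInputNode is assumed to have
 * been obtained via BMediaRoster::GetAudioInput(). */
media_output inputOut;
int32 count;
roster->GetFreeOutputsFor(audioInputNode, &inputOut, 1, &count,
                          B_MEDIA_RAW_AUDIO);

/* The media_format in the media_output describes the data you'll
 * actually receive over the connection. */
float sampleRate = inputOut.format.u.raw_audio.frame_rate;
```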

The audio input is exclusive: only one connection to it is allowed at a time. If you need two consumers to receive buffers from the input, you'll need to create a special node that receives the audio buffers, then sends copies of them to all the consumers attached to it.


Audio Playback Made Easy

If all you want to do is play back raw audio (such as AIFF or WAVE files), the Media Kit provides the BSound and BSoundPlayer classes to simplify this process. BSound represents audio in memory or on disk, and BSoundPlayer hides the inner workings of the Media Kit from you to make your life simple. See these two classes for more information; an example on how to play audio files is given in the BSoundPlayer class overview.
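A minimal sketch of such playback follows. The file path is hypothetical, error checking is omitted, and the details are simplified; see the BSoundPlayer class overview for the complete treatment.

```cpp
/* Sketch: play a raw audio file with BSound and BSoundPlayer. */
entry_ref ref;
get_ref_for_path("/boot/home/sample.wav", &ref);   /* hypothetical path */

BSound *sound = new BSound(&ref);
BSoundPlayer player("sample player");

player.Start();
player.SetHasData(true);
BSoundPlayer::play_id id = player.StartPlaying(sound);
player.WaitForSound(id);    /* block until playback completes */
sound->ReleaseRef();
```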


Media Data Formats

This chapter doesn't pretend to be a tutorial on the intricacies of audio and video data formats; there are plenty of good reference books on these subjects.


Creating New Node Classes

You can create your own nodes to perform different types of media processing. Nodes can be provided in add-ons, from which the Media Kit loads them as dormant nodes, or they can be implemented within the application itself.

<<<discussion of the actual process of implementing a node needs to be added, including an example>>>


Creating a Media Add-on

This is discussed in detail in the BMediaAddOn class overview.


Application-based Nodes

You can create your own node subclasses in an application if your application has special needs; just derive from the appropriate base class (such as BBufferConsumer) as normal. Note, however, that your application should never directly call any of your subclass's functions; instead, register the node with the media roster, and control it via BMediaRoster calls, just like any other node, using the media_node that describes your node.
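This pattern can be sketched as follows. MyConsumer is a hypothetical BBufferConsumer subclass; the point is that once the node is registered, everything goes through the roster.

```cpp
/* Sketch: an application-based node. MyConsumer is a hypothetical
 * BBufferConsumer subclass. Register it, then drive it through the
 * roster only -- never by calling its member functions directly. */
MyConsumer *consumer = new MyConsumer("my consumer");
BMediaRoster *roster = BMediaRoster::Roster();
roster->RegisterNode(consumer);

/* From here on, refer to the node by its media_node description,
 * not by the C++ object itself. */
media_node node = consumer->Node();
roster->StartNode(node, 0);
```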


Timing Issues

When dealing with a number of nodes cooperating in processing data, there are always important timing concerns. This section covers how various types of nodes need to behave in order to maintain proper timing.

Calculating Buffer Start Times

To calculate the presentation time at which a buffer should be performed, keep track of how many frames have been played, then multiply that value by 1000000LL/sample_rate (if your calculation is done using floating-point math, floor() the result). You can then apply whatever offset was specified via Seek().

   buf->Header()->size_used = your_buf_frames * your_frame_size;
   buf->Header()->start_time = your_total_frames*1000000LL/your_format.frame_rate;
   your_total_frames += your_buf_frames;

You shouldn't compute the start time by adding the previous buffer's duration to its start time; the accumulation of rounding errors over time will cause dropped samples about three times per second if you do.

Producers

Producers that produce buffers intended for output need to stamp each buffer they create with a startTime, which indicates the performance time at which the buffer should be played. If the producer is playing media from a file, or synchronizing sound, this is the time at which the media should become analog.

In order to compute this startTime properly, the producer must prepare the buffers in advance, by the amount of time reported by BBufferProducer::FindLatencyFor(). The producer also needs to respond to the BBufferProducer::LateNoticeReceived() hook function, at minimum by updating the time stamps it puts on the buffers it sends out. Downstream nodes check those time stamps in order to play buffers at the correct time (and may drop buffers that are late); if the time stamps aren't updated, playback will tend to get further and further behind.

In general, it's best to try to produce buffers as late as possible without actually causing them to arrive at their destination late (i.e., they should be sent at or before the time presentationTime - downstreamLatency). This will ensure the best overall performance by reducing the number of buffers that are pending (especially if the user manipulates the timeline such that your node gets seeked or stopped). Also, if you're producing buffers that have a real-world connection, such as to a video display, producing them too early might cause them to be displayed early.

A producer whose buffers are generated by a physical input (such as a microphone jack) handles things somewhat differently. It stamps each generated buffer with the performance time at which it was captured (this should be the time at which the first sample in the buffer was taken). This means that when these buffers are transmitted downstream, they'll always be "late" in the eyes of any node they arrive at.

This also means you can't easily hook a physical input to a physical output, because buffers will always arrive at the output later than the timestamped value. You need to insert another node between the two to adjust the time stamps appropriately so they won't be "late" anymore.

Additionally, nodes that record data (such as file-writing nodes), in the B_RECORDING run mode, shouldn't care about buffers that arrive late; this lets data be recorded without concern for this issue.

Consumers

If the consumer is the device that actually presents the media to the user (i.e., it plays the audio or video contained in the buffers it receives), it needs to report back to the producer the correct latency: the amount of time it takes buffers to reach the analog world. Buffers that are received shouldn't be played until the startTime stamped on them arrives. If a buffer arrives late, the consumer should send a late notice to the producer, so it can make the necessary adjustments, and shouldn't pass the buffer along at all; be sure to Recycle() late buffers so they can be reused.
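That policy might be sketched inside a consumer's BufferReceived() hook like this. This is a hedged sketch: MyConsumer, its fInput member, and the QueueBuffer() helper are all hypothetical.

```cpp
/* Sketch of the late-buffer policy in a hypothetical consumer.
 * fInput is assumed to describe the connection; TimeSource()
 * supplies the performance clock. */
void MyConsumer::BufferReceived(BBuffer *buffer) {
    bigtime_t when = buffer->Header()->start_time;
    bigtime_t now = TimeSource()->Now();

    if (when < now) {
        /* Late: tell the producer how far behind we are, and don't
         * play the buffer -- but always recycle it. */
        NotifyLateProducer(fInput.source, now - when, now);
        buffer->Recycle();
        return;
    }
    /* Otherwise hold the buffer until its start_time arrives. */
    QueueBuffer(buffer);    /* hypothetical helper */
}
```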

Consumer/Producers (Filters)

A consumer/producer (filter) must report the correct latency: the time a buffer takes to pass through the filter from the time it's received to the time it's retransmitted, plus the downstream latency. It shouldn't change the time stamp, unless this is explicitly part of the filter's purpose. The filter should also handle late packets as described under Producers and Consumers above.

Media Applications

The application that starts the nodes and the time source to which they're slaved needs to provide them with the correct starting times. For example, if several nodes have been connected, they've all been slaved to an appropriate time source, and you want to start them all up, you need to take the following steps:

      bigtime_t latency;
      Roster->GetLatencyFor(node1, &latency);

      Roster->PrerollNode(node1);
      Roster->PrerollNode(node2);
      Roster->PrerollNode(node3);
      Roster->StartNode(node1, 0);
      Roster->StartNode(node2, 0);
      Roster->StartNode(node3, 0);

      /* Seek the time source backward by the latency, then start it
         slightly in the future so the nodes don't begin behind. */
      bigtime_t now = system_time();
      Roster->SeekNode(timesourceNode, -latency, now + 10000);
      Roster->StartNode(timesourceNode, now + 10000);

The extra 10,000 microseconds is added in case the code gets preempted while preparing to start the timesourceNode; this gives us a little fudge factor so we don't start out behind.


Installing Media Nodes and Drivers

Media node add-ons should be installed in the /boot/home/config/add-ons/media directory.

Media drivers should be installed in /boot/home/config/add-ons/kernel/drivers/bin. Then create a symlink to the driver in /boot/home/config/add-ons/kernel/drivers/dev/(type), where (type) is the type of driver you're installing (audio, video, etc).


About enum Members of Classes

The Media Kit has several classes (most notably, BMediaNode) that contain, as members, enums. For instance, in BMediaNode, you'll find the following:

   class BMediaNode {
      ...
      enum run_mode {
         B_OFFLINE = 1,
         B_DECREASE_PRECISION,
         B_INCREASE_LATENCY,
         B_DROP_DATA,
         B_RECORDING
      };
      ...
   };

In this case, you can freely use B_OFFLINE and so forth from within objects derived from BMediaNode, but if you want to use these values from other classes (or outside any class), you need to use the notation BMediaNode::B_OFFLINE. This is true of any enum defined within a class; it will be called out specifically in the descriptions of any constants in this chapter.


About Multiple Virtual Inheritance

Virtual inheritance is slightly different from regular inheritance in C++. The constructor for the virtual base class has to be explicitly (or implicitly) called from the most-derived class being instantiated, rather than being called from the direct descendant class actually defining the virtual inheritance.

In simple terms, this means that whenever you derive a new class from a class that uses virtual inheritance, your derived class's constructor should explicitly call the virtual base class's constructor.

This call-it-yourself paradigm is also followed by certain other functions (this will be called out specifically in the documentation for those functions, as appropriate), such as the HandleMessage() function in all node classes; when you implement a node, you need to be sure to explicitly call each inherited version of HandleMessage(), as shown below:

   class MyBufferConsumerProducer :
            public BBufferConsumer,
            public BBufferProducer {
   ...
   };

   status_t MyBufferConsumerProducer::HandleMessage(int32 message,
            const void *data, size_t size) {
      if (message == SOME_THING_I_DO) {
         DoWhatever();
         return B_OK;
      }
      /* Give each inherited implementation a chance to handle the
         message; if none of them recognizes it, report a bad message. */
      if (BBufferConsumer::HandleMessage(message, data, size) &&
          BBufferProducer::HandleMessage(message, data, size) &&
          BMediaNode::HandleMessage(message, data, size)) {
         BMediaNode::HandleBadMessage(message, data, size);
         return B_ERROR;
      }
      return B_OK;
   }

For a BMediaNode to be properly created, you have to explicitly call the BMediaNode constructor with the name for your derived class:

   MyBufferConsumerProducer::MyBufferConsumerProducer(const char *name) :
      BMediaNode(name),
      BBufferConsumer(),
      BBufferProducer() {
      /* constructor stuff goes here */
   }

This ensures that the BMediaNode constructor is called exactly once, and with the correct arguments. That's especially important since only your most-derived class (in this case, MyBufferConsumerProducer) has all the information needed to properly call BMediaNode's constructor (in this example, the node's name).




The Be Book, in lovely HTML, for BeOS Release 4.

Copyright © 1998 Be, Inc. All rights reserved.

Last modified December 11, 1998.