GMFBridge, InfTee and Copies

The GMFBridge tool shows how to divide DirectShow tasks into a number of separate graphs. This allows you to make changes while the graph is running, such as switching between different source files or input devices, and changing output files without stopping the graph. GMFBridge shows that you can use multiple graphs without imposing much overhead, since it does not introduce any thread switches or copies of the data.

This efficiency comes at the price of added complexity. Avoiding deadlocks can be quite complicated when you have a single thread calling across graphs that are in different states. In this article, I want to look at one aspect of that complexity: the difficulties of copying video frames.

InfTee and Smart Tee

One of the most common uses for GMFBridge is during video capture, to allow control over the multiplexor separately from the capture filter. This allows you to start and stop capture and change capture files without interrupting the preview display. To do this, you need to send the frames both to the preview renderer and to the multiplexor. This is most commonly done with either the Infinite Pin Tee or the Smart Tee filter.

The inftee filter sends the same buffer, wrapped in the same IMediaSample object, to all of its output pins. The buffer is returned to the allocator’s pool only when the downstream filters on all of the output pins have released it. Because every output pin receives the same IMediaSample object, all the pins see the same timestamps, and any change one downstream filter makes to the sample’s metadata is seen by all the others.
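To make the sharing concrete, here is a toy C++ model of an allocator pool feeding a tee. The `Sample`, `Pool` and `teeDeliver` names are hypothetical, and this is not the DirectShow `IMediaSample` or `IMemAllocator` interface; it only mimics the reference counting described above, so that a buffer becomes free again only after every output pin has released it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a media sample: a buffer, a timestamp, and a count of
// outstanding references. This is NOT the DirectShow IMediaSample interface.
struct Sample {
    std::vector<std::uint8_t> data;
    std::int64_t start = 0;  // timestamp in 100 ns units
    int refs = 0;            // references currently held by filters
};

// Fixed-size allocator pool: a sample is free only when its refcount is zero.
class Pool {
public:
    Pool(std::size_t count, std::size_t bytes) : samples_(count) {
        for (auto& s : samples_) s.data.resize(bytes);
    }
    // Returns a free sample holding one reference, or nullptr if the pool is
    // exhausted (a real allocator's GetBuffer would block instead).
    Sample* acquire() {
        for (auto& s : samples_)
            if (s.refs == 0) { s.refs = 1; return &s; }
        return nullptr;
    }
    static void addRef(Sample* s) { ++s->refs; }
    static void release(Sample* s) { assert(s->refs > 0); --s->refs; }
    std::size_t freeCount() const {
        std::size_t n = 0;
        for (const auto& s : samples_) if (s.refs == 0) ++n;
        return n;
    }
private:
    std::vector<Sample> samples_;  // never resized, so pointers stay valid
};

// Inftee-style delivery: every output pin receives the SAME Sample object.
// Each downstream filter keeps its own reference; the upstream reference is
// dropped once delivery returns.
std::vector<Sample*> teeDeliver(Sample* s, int pins) {
    std::vector<Sample*> delivered;
    for (int i = 0; i < pins; ++i) {
        Pool::addRef(s);
        delivered.push_back(s);
    }
    Pool::release(s);
    return delivered;
}
```

With two output pins, the sample stays out of the pool until both pins have called `release`; releasing only one of them leaves the buffer in use.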

The smart tee filter is designed specifically for video capture. In a video capture graph, frames are stamped with the time at which they were captured; if these frames are delivered to a renderer, they will be late, since that time has already passed. The smart tee solves this by simply stripping off the timestamps, so the preview renderer draws each frame immediately. It does this by wrapping a second IMediaSample object around the same buffer. The original sample, with its timestamps intact, is sent via the capture pin to the multiplexor. The second IMediaSample object has no timestamps but refers to the same physical buffer, and it holds a reference count on the original. The buffer is therefore not returned to the allocator’s pool until both downstream filters have finished with it.
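The smart tee’s arrangement can be sketched with `std::shared_ptr` standing in for COM reference counting. `Sample` and `PreviewSample` are hypothetical types for illustration, not DirectShow classes: the preview object exposes the same buffer memory, reports no timestamp, and keeps the original alive until it is destroyed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical captured sample (NOT the DirectShow IMediaSample interface).
struct Sample {
    std::vector<std::uint8_t> buffer;
    bool hasTime = false;
    std::int64_t start = 0;  // capture time in 100 ns units
};

// Smart-tee-style preview sample: a second object around the same buffer
// memory. It reports no timestamp, so a renderer would draw it immediately,
// and it holds a reference on the original so the buffer stays allocated.
class PreviewSample {
public:
    explicit PreviewSample(std::shared_ptr<Sample> original)
        : original_(std::move(original)) {}
    const std::uint8_t* data() const { return original_->buffer.data(); }
    bool hasTime() const { return false; }  // timestamps stripped
private:
    std::shared_ptr<Sample> original_;  // the reference on the original
};
```

The original sample keeps its capture timestamp for the multiplexor, and the underlying buffer is released only after both the capture branch and the preview object have let go of it.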

The smart tee has another, less popular feature: the preview pin drops frames if it decides that the recording is falling behind. The mechanism it uses to decide this is fairly primitive and is based on the behaviour of capture graphs on typical systems of the mid-90s. For this reason, inftee is often used instead.

Timestamp Correction

GMFBridge connects two separate filter graphs. These graphs have separate time bases, possibly using separate clocks and almost certainly using different stream time offsets. So when samples are transferred between graphs, their timestamps need to be adjusted to fit the new graph’s time base. (As an aside, there are several different ways to adjust the timestamps; they are the subject of a separate note to be published shortly.)
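As one simple illustration, assuming both graphs are driven by the same reference clock, a timestamp can be carried across by converting it to clock time and back. In DirectShow, stream time is the reference clock time minus the start time passed to `IMediaFilter::Run`; the `toDestStreamTime` function and the `RefTime` alias below are hypothetical. This is only one possible approach, not necessarily the one GMFBridge uses.

```cpp
#include <cassert>
#include <cstdint>

// DirectShow's REFERENCE_TIME is a 64-bit count of 100 ns units; this alias
// stands in for it so the sketch builds without the Windows SDK.
using RefTime = std::int64_t;

// Stream time = reference clock time - the start time given to Run.
// ASSUMPTION: both graphs use the same reference clock, so a sample's
// stream time can be moved to the other graph via absolute clock time.
RefTime toDestStreamTime(RefTime srcStreamTime,
                         RefTime srcGraphStart,
                         RefTime destGraphStart) {
    RefTime clockTime = srcStreamTime + srcGraphStart;  // absolute clock time
    return clockTime - destGraphStart;                  // downstream stream time
}
```

For example, a frame at stream time 6 s in a source graph started at clock time 1 s lands at stream time 2 s in a destination graph started at clock time 5 s. A negative result means the frame predates the start of the destination graph.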

The bridge adjusts the timestamps simply by setting the new timestamp on the sample it receives. This means that any other filter using the same IMediaSample object (on another inftee output pin) will also see the corrected timestamp. In the original graph, where that corrected timestamp belongs to the wrong time base, this can cause significant lip-sync problems or playback delays.
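The side effect can be shown with a toy model in which two output pins hold pointers to the same sample object. `Sample` and `bridgeAdjustInPlace` are hypothetical stand-ins, not DirectShow or GMFBridge code; the point is only that an in-place correction made for the downstream graph is exactly what the preview branch then sees.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

using RefTime = std::int64_t;  // 100 ns units, like REFERENCE_TIME

// Hypothetical stand-in for a sample's timestamp fields.
struct Sample {
    RefTime start = 0;
    RefTime stop = 0;
    void setTime(RefTime s, RefTime e) { start = s; stop = e; }
};

// Hypothetical bridge-sink step: the correction is written directly into the
// sample object it was given, not into a copy.
void bridgeAdjustInPlace(Sample& s, RefTime offset) {
    s.setTime(s.start + offset, s.stop + offset);
}
```

If the preview pin and the bridge pin both point at one sample stamped 1000 to 1400, shifting it by -600 for the bridge leaves the preview renderer holding a sample stamped 400 to 800, a time that belongs to the other graph’s time base.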

Multiplexor Queues

When a buffer is sent to multiple output pins by the inftee or smart tee filter, it is not returned to the free pool until all of the downstream filters have finished with it. Some multiplexors require a long buffer queue in order to interleave audio and video in the correct ratio, and this can starve the other output pins that use the same buffers. Typically, preview rendering freezes because all the buffers are queued at the multiplexor. In some cases this has a knock-on effect that blocks audio delivery, which in turn prevents the multiplexor from advancing, and the whole graph deadlocks.
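A small simulation shows why the queue depth matters. `TeeSim` is a hypothetical model, not DirectShow code: one allocator with a fixed number of buffers feeds both branches, the preview renderer releases each frame at once, and the multiplexor holds up to `muxQueueDepth` frames before writing out the oldest. It does not model audio, so it shows the preview stall rather than the full deadlock.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy simulation, not DirectShow code: one allocator with a fixed number of
// buffers shared by the preview and capture branches of a tee.
class TeeSim {
public:
    TeeSim(std::size_t poolSize, std::size_t muxQueueDepth)
        : poolSize_(poolSize), muxQueueDepth_(muxQueueDepth) {}

    // Returns true if the source obtained a buffer for frame `id`.
    bool deliverFrame(int id) {
        if (inUse_ == poolSize_) return false;  // GetBuffer would block here
        ++inUse_;
        // The preview renderer draws the frame and releases it straight away,
        // but the buffer stays in use while the multiplexor still holds it.
        muxQueue_.push_back(id);
        if (muxQueue_.size() > muxQueueDepth_) {
            muxQueue_.pop_front();  // mux writes out the oldest frame
            --inUse_;               // only now does that buffer return
        }
        return true;
    }

private:
    std::size_t poolSize_;
    std::size_t muxQueueDepth_;
    std::deque<int> muxQueue_;  // frames held by the multiplexor
    std::size_t inUse_ = 0;
};
```

With a pool of 3 buffers and a queue depth of 5, only the first 3 frames get buffers and every later request fails (in a real graph, the source would block in `GetBuffer` and preview would freeze); with 8 buffers, the pool never runs dry.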

Copying Frames

The simplest solution to these problems is to copy the frames whenever you use a tee filter with GMFBridge. Since the data is usually uncompressed video, this can often be done simply by inserting a Colour Converter filter between the inftee and the bridge, but any transform filter other than an in-place transform will do. The copy ensures that the timestamp modifications are not carried over to the other output pins, and that the multiplexor uses buffers from a different allocator, which should prevent a long multiplexor queue from stalling preview rendering.
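The effect of a copying transform can be modelled as producing a new sample with its own buffer and its own timestamps. `Sample` and `copySample` are hypothetical; in a real graph the copy would be made by a filter such as the Colour Converter, using buffers from its own output allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

using RefTime = std::int64_t;  // 100 ns units

// Hypothetical sample model (not IMediaSample).
struct Sample {
    std::vector<std::uint8_t> buffer;
    RefTime start = 0;
    RefTime stop = 0;
};

// Models what a copying transform gives the bridge: a new sample object with
// its own buffer (standing in for the transform's output allocator) and its
// own timestamps. The vector member makes this a deep copy of the pixels.
std::shared_ptr<Sample> copySample(const Sample& in) {
    return std::make_shared<Sample>(in);
}
```

The bridge can now rewrite the copy’s timestamps freely, and the copy outlives the original, so holding it in a long multiplexor queue does not keep the tee’s buffers out of circulation.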

The timestamp modification issue alone could also be prevented by using a different IMediaSample object that wraps the same physical buffer memory. That is, instead of copying the whole buffer, a new IMediaSample object could be allocated that points to the same buffer and holds a reference count on the original sample object. It’s possible that a future release of GMFBridge will incorporate this feature.
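A minimal sketch of the wrapper idea, assuming `std::shared_ptr` in place of COM reference counting; `WrappedSample` is a hypothetical class, not part of GMFBridge or DirectShow. It shares the original buffer without copying but carries its own timestamps. Because the memory still comes from the original allocator, it addresses only the timestamp problem, not the multiplexor queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using RefTime = std::int64_t;  // 100 ns units

// Hypothetical sample model (not IMediaSample).
struct Sample {
    std::vector<std::uint8_t> buffer;
    RefTime start = 0;
    RefTime stop = 0;
};

// Hypothetical wrapper: shares the original's buffer memory (no copy) but
// carries its own timestamps, so the bridge can rewrite them freely.
class WrappedSample {
public:
    WrappedSample(std::shared_ptr<const Sample> original, RefTime start, RefTime stop)
        : original_(std::move(original)), start_(start), stop_(stop) {}
    const std::uint8_t* data() const { return original_->buffer.data(); }
    std::size_t size() const { return original_->buffer.size(); }
    RefTime start() const { return start_; }
    RefTime stop() const { return stop_; }
    void setTime(RefTime s, RefTime e) { start_ = s; stop_ = e; }
private:
    std::shared_ptr<const Sample> original_;  // keeps the original buffer alive
    RefTime start_;
    RefTime stop_;
};
```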

Looser Coupling

As I mentioned earlier, GMFBridge was designed to show that you can divide tasks into multiple graphs without any loss of efficiency. To achieve that, it keeps the graphs very tightly coupled, without thread switches or buffer copies.

There are many cases where this is not the best route. For example, if you need to send the video frames to multiple outputs, you may need to copy the samples as I have discussed above, and as a result you will get the complexity of using a single thread in multiple graphs without most of the efficiency gains.

In these cases, a more loosely coupled approach would be more appropriate. For example, the bridge sink filters can place the frames into a pool of buffers, from which multiple render or recording graphs can copy the frames. This gives you the benefits of a multiple-graph architecture, together with one-to-many delivery of frames, and avoids many of the complications of GMFBridge. The cost is a copy of the frame for each output, and a separate thread for each output.
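A minimal sketch of the looser coupling, with hypothetical names throughout (`FrameStore`, `publish`, `waitNext`): the bridge sink copies each frame into a shared store, and every output graph’s thread copies it out again. For simplicity the pool here is a single most-recent-frame slot, so a slow output simply skips frames; a recording graph that must not drop frames would need a real queue of buffers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

using RefTime = std::int64_t;  // 100 ns units

struct Frame {
    std::vector<std::uint8_t> pixels;
    RefTime captureTime = 0;
    std::uint64_t sequence = 0;  // 0 means "no frame yet"
};

// Hypothetical shared store written by the bridge sink in the capture graph
// and read by any number of render/recording graphs, each on its own thread.
class FrameStore {
public:
    // Called on the capture graph's streaming thread. The data is copied in,
    // so the capture sample can go straight back to its allocator.
    void publish(const std::vector<std::uint8_t>& pixels, RefTime t) {
        {
            std::lock_guard<std::mutex> lock(m_);
            latest_.pixels = pixels;
            latest_.captureTime = t;
            ++latest_.sequence;
        }
        cv_.notify_all();
    }

    // Wakes all readers; each still receives the last frame if it is new.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Called on each output's own thread: waits for a frame newer than
    // `lastSeen` and copies it out. Returns false once closed with nothing new.
    bool waitNext(std::uint64_t lastSeen, Frame& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return closed_ || latest_.sequence > lastSeen; });
        if (latest_.sequence <= lastSeen) return false;
        out = latest_;  // one copy per output
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    Frame latest_;
    bool closed_ = false;
};
```

Since each output owns its copy, neither a long multiplexor queue nor a timestamp rewrite in one output graph can affect the others; the capture graph only ever blocks for the duration of one copy under the lock.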

I’ve developed solutions like this for clients and found that the greater flexibility and simplicity easily outweigh the downside of a few memory copies.