The Unofficial DVD-Audio Specification

AOB (Audio OBject) Files

AOB files are MPEG Program streams (similar to VOB files) containing a Private Stream with LPCM audio. However, unlike VOB files, there are no system packets containing navigation information, and an AOB file just contains a single audio stream.

The following listing is the output of (a slightly modified) mpeg2desc (part of the dvdauthor package) on an AOB file.

00000000: pack hdr, 0.000 sec (scr=0 scrext=0, scr*300+scrext=0)
0000000e: system header; length=12
00000020: pes private1, lpcm 0; length=2010; hdr=11; pts 0.006 sec (585); (pext); pstd=10240
00000800: pack hdr, 0.001 sec (scr=146 scrext=85, scr*300+scrext=43885)
0000080e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.018 sec (1646)
00001000: pack hdr, 0.003 sec (scr=292 scrext=171, scr*300+scrext=87771)
0000100e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.029 sec (2625)
00001800: pack hdr, 0.004 sec (scr=438 scrext=257, scr*300+scrext=131657)
0000180e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.040 sec (3687)
00002000: pack hdr, 0.017 sec (scr=1597 scrext=73, scr*300+scrext=479173)
0000200e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.051 sec (4666)
00002800: pack hdr, 0.029 sec (scr=2617 scrext=195, scr*300+scrext=785295)
0000280e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.063 sec (5727)
00003000: pack hdr, 0.040 sec (scr=3638 scrext=18, scr*300+scrext=1091418)
0000300e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.074 sec (6707)
00003800: pack hdr, 0.051 sec (scr=4658 scrext=140, scr*300+scrext=1397540)
0000380e: pes private1, lpcm 0; length=2028; hdr=8; pts 0.086 sec (7768)
00004000: pack hdr, 0.063 sec (scr=5678 scrext=263, scr*300+scrext=1703663)

The actual data looks like this:

0000000: 0000 01ba 4400 0400 0401 0189 c3f8 0000  ....D...........
0000010: 01bb 000c 80c4 e104 a07f b8c0 40bd e00a  ............@...
0000020: 0000 01bd 07da 8181 0821 0001 0493 1e60  .........!.....`
0000030: 0aa0 0000 0b00 0a10 0f8f 0001 8000 0000  ................
0000040: 0000 0000 ffff ffff 0000 0000 ffff ffff  ................
0000050: ffff ffff 0000 0000 0000 0000 ffff ffff  ................

In this example, the actual PCM data (stored as big-endian signed 16-bit samples) starts at offset 0x40 into the file.

In the above example, the SCR starts at zero and the PTS starts at 585. These values are reset at the start of each title contained in the titleset.

MPEG Syntax

An Audio Object (AOB) is an MPEG Program Stream (as defined by ISO-13818-1) with a pack size of 2048 bytes (the size of a DVD sector). Each sector contains a Pack Header, followed by a PES Packet of type 0xbd (Private Stream 1) containing the LPCM data. If the audio format is identical, multiple tracks can be stored in a single audio object. Alternatively, multiple AOBs can be stored in the same ATS_XX_Y.AOB file.
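The pack layout described above can be walked with a few lines of Python. This is a minimal sketch rather than a reference parser: the function name find_private_payload is hypothetical, and it assumes the ISO-13818-1 pack header (14 bytes plus optional stuffing), an optional system header, and a single Private Stream 1 PES packet in each 2048-byte sector.

```python
PACK_START = b"\x00\x00\x01\xba"        # pack header start code
SYSTEM_HEADER = b"\x00\x00\x01\xbb"     # system header start code
PRIVATE_STREAM_1 = b"\x00\x00\x01\xbd"  # PES packet carrying the audio


def find_private_payload(pack):
    """Return (offset, length) of the Private Stream 1 payload in one sector.

    Hypothetical helper: assumes the sector holds exactly one PES packet.
    """
    if pack[0:4] != PACK_START:
        raise ValueError("sector does not start with a pack header")
    # The low 3 bits of the last pack header byte give the stuffing length.
    pos = 14 + (pack[13] & 0x07)
    # The first pack of a stream may carry a system header; skip it.
    if pack[pos:pos + 4] == SYSTEM_HEADER:
        pos += 6 + int.from_bytes(pack[pos + 4:pos + 6], "big")
    if pack[pos:pos + 4] != PRIVATE_STREAM_1:
        raise ValueError("expected a Private Stream 1 PES packet")
    pes_length = int.from_bytes(pack[pos + 4:pos + 6], "big")
    header_data_length = pack[pos + 8]
    start = pos + 9 + header_data_length  # first byte after the PES header
    end = pos + 6 + pes_length            # PES length counts from after its own field
    return start, end - start
```

Applied to the first sector of the example dump, this gives a payload offset of 0x31, which is where the LPCM header (beginning with 0xa0) is found.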

The LPCM data is stored as contiguous "frames" of LPCM data. There is an LPCM header at the start of the PES payload, followed by a block of LPCM frames of a constant size (see below). The frame size varies between 160 and 2000 bytes depending on the samplerate and sample size.

Each pack has a System Clock Reference (SCR) value stored in the pack header, and a Presentation Timestamp (PTS) value stored in the PES header. An AOB file containing an MLP stream also has a Decoding Timestamp (DTS) value in the PES header, which is very slightly (about 1ms) ahead of the PTS.

The PTS value in a PES header refers to the PTS of the start of the first complete audio frame in that packet (and hence doesn't always increase in equal steps), whereas the SCR refers to the pack itself and does increment in equal steps.
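As an illustration of how these timestamps are stored, the sketch below decodes the SCR field of a pack header and the PTS field of a PES header. The function names are hypothetical; the bit layouts are those of ISO-13818-1 (a 33-bit value split into 3+15+15 bits separated by marker bits, with a 9-bit extension for the SCR).

```python
def decode_scr(field):
    """Decode the 6 bytes following the 00 00 01 BA start code.

    Returns (scr, scrext): the 33-bit 90kHz base and the 9-bit extension.
    """
    v = int.from_bytes(field[:6], "big")
    scr = ((((v >> 43) & 0x7) << 30)
           | (((v >> 27) & 0x7FFF) << 15)
           | ((v >> 11) & 0x7FFF))
    scrext = (v >> 1) & 0x1FF
    return scr, scrext


def decode_pts(field):
    """Decode a 5-byte PTS field from a PES header (90kHz ticks)."""
    v = int.from_bytes(field[:5], "big")
    return ((((v >> 33) & 0x7) << 30)
            | (((v >> 17) & 0x7FFF) << 15)
            | ((v >> 1) & 0x7FFF))


def scr_27mhz(scr, scrext):
    """Combine base and extension into 27MHz ticks, as in the mpeg2desc listing."""
    return scr * 300 + scrext


# Fields taken from the example hex dump above.
print(decode_scr(bytes.fromhex("440004000401")))   # SCR of the first pack
print(decode_pts(bytes.fromhex("2100010493")))     # PTS of the first PES packet
print(decode_pts(bytes.fromhex("2100010493")) / 90000.0, "seconds")
```

The PTS bytes 21 00 01 04 93 decode to 585, matching the "pts 0.006 sec (585)" entry in the listing.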

System Clock Reference (SCR)

The System Clock Reference (SCR) is defined in ISO-13818-1 and refers to the time at which data enters a Program Stream decoder. This is related to, but different from, the Presentation Timestamp (PTS) and Decoding Timestamp (DTS).

[Help needed documenting SCR]. In example streams, the SCR initially increments as follows:

SCR values are given in 27MHz ticks (scr*300+scrext).

Pack #   44.1kHz   48kHz     96kHz
0        0         0         0
1        43885     43885     43885
2        87771     87771     87771
3        131657    131657    131657
4        479173    454500    245250

The SCR is then incremented based on the amount of audio data in each pack. For example, a 48kHz 16-bit stereo stream has a data rate of 192000 bytes per second, so a pack containing 2000 bytes of audio holds (2000/192000) seconds of audio. This represents (2000/192000)*300*90000 = 281250 SCR ticks (exactly). Not all audio formats give a whole number of SCR ticks per pack, so the values are rounded, but the rounding errors do not accumulate over the file.
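The arithmetic above can be checked with exact fractions. The sketch below is only an illustration: the function names are hypothetical, and rounding the running total (rather than adding up rounded increments) is an assumption about how a muxer might avoid accumulating error, not something confirmed from real streams.

```python
from fractions import Fraction

SCR_HZ = 300 * 90000  # 27MHz system clock


def scr_ticks(payload_bytes, sample_rate, bits_per_sample, channels):
    """Exact number of 27MHz ticks represented by payload_bytes of audio."""
    bytes_per_second = Fraction(sample_rate * channels * bits_per_sample, 8)
    return Fraction(payload_bytes) / bytes_per_second * SCR_HZ


def pack_scr(n, ticks_per_pack, start=0):
    """SCR of pack n, rounding the exact total (assumed rounding strategy)."""
    return start + round(n * ticks_per_pack)


print(scr_ticks(2000, 48000, 16, 2))   # whole number of ticks
print(scr_ticks(2000, 44100, 16, 2))   # not a whole number of ticks
```

For 44.1kHz 16-bit stereo the per-pack increment is not an integer, which is why rounding is needed in the first place.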

LPCM Header

The audio frame size for LPCM packets in a VOB file is always 150 PTS ticks. For LPCM packets in an AOB file, the audio frame size for a stereo stream is as follows:

Samplerate            Size in bytes               Size in samples
                      16-bit   20-bit   24-bit    (per channel)
44.1kHz or 48kHz      160      200      240       40
88.2kHz or 96kHz      320      400      480       80
176.4kHz or 192kHz    640      800      960       160
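The byte sizes in the table follow from the number of samples per frame. The sketch below reproduces them, assuming that the last column of the table is the number of samples per channel in each frame (40, 80 or 160 depending on the samplerate family). The function name is hypothetical.

```python
# Samples per channel in one audio frame (assumed meaning of the last column).
SAMPLES_PER_FRAME = {
    44100: 40, 48000: 40,
    88200: 80, 96000: 80,
    176400: 160, 192000: 160,
}


def stereo_frame_size(sample_rate, bits_per_sample):
    """Size in bytes of one LPCM audio frame of a stereo AOB stream."""
    samples = SAMPLES_PER_FRAME[sample_rate]
    channels = 2
    return samples * channels * bits_per_sample // 8


for rate in (48000, 96000, 192000):
    print(rate, [stereo_frame_size(rate, bits) for bits in (16, 20, 24)])
```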

The LPCM header is as follows:

Offset  Size (bytes)  Description
0       1             Sub-stream ID. Unconfirmed, but possibly 0xa0 for LPCM and 0xa1 for MLP.
1       1             Continuity Counter - counts from 0x00 to 0x1f and then wraps to 0x00.

The following applies to LPCM packets (not MLP):

Offset  Size (bytes)  Description
4       2             Byte pointer to start of first audio frame.
6       1             Unknown - e.g. 0x10 for stereo, 0x00 for surround.
7       1             Sample size (high nibble - Channel Group 1, low nibble - Channel Group 2).
                      0x0=16-bit, 0x1=20-bit, 0x2=24-bit, 0xf=Channel Group not used.
8       1             Samplerate (high nibble - Channel Group 1, low nibble - Channel Group 2).
                      0x0=48kHz, 0x1=96kHz, 0x2=192kHz, 0x8=44.1kHz, 0x9=88.2kHz, 0xa=176.4kHz, 0xf=Channel Group not used.
                      e.g. 0x0f=48kHz Stereo, 0x88=44.1kHz Surround.
9       1             Unknown - e.g. 0x00.
10      1             Channel Group Assignment (see below).
11      1             Unknown - e.g. 0x80.
12      var           Padding - zero.
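The header fields above can be read directly from the start of the PES payload. The following is a minimal sketch covering only the documented LPCM fields; the function and key names are hypothetical, and the sample bytes are the PES payload bytes at offset 0x31 of the example hex dump.

```python
SAMPLE_SIZES = {0x0: 16, 0x1: 20, 0x2: 24}          # 0xf = group not used
SAMPLE_RATES = {0x0: 48000, 0x1: 96000, 0x2: 192000,
                0x8: 44100, 0x9: 88200, 0xA: 176400}  # 0xf = group not used


def parse_lpcm_header(payload):
    """Parse the documented fields of an AOB LPCM header.

    Unused channel groups (nibble 0xf) are reported as None.
    """
    size, rate = payload[7], payload[8]
    return {
        "substream_id": payload[0],
        "continuity": payload[1],
        "first_frame_pointer": int.from_bytes(payload[4:6], "big"),
        "group1_bits": SAMPLE_SIZES.get(size >> 4),
        "group2_bits": SAMPLE_SIZES.get(size & 0x0F),
        "group1_rate": SAMPLE_RATES.get(rate >> 4),
        "group2_rate": SAMPLE_RATES.get(rate & 0x0F),
        "channel_assignment": payload[10],
    }


header = parse_lpcm_header(bytes.fromhex("a000000b000a100f8f000180"))
for key, value in header.items():
    print(key, value)
```

For the example stream this reports 16-bit samples at 44.1kHz in Channel Group 1, no Channel Group 2, and channel assignment 1 (stereo).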

Channel Group Assignments

The DVD-Audio specification supports splitting a multi-channel track into two groups of channels, where the channels assigned to group 2 have a lower samplerate/bit-depth than the channels in group 1. The valid assignments are as follows:

ID   Group 1       Group 2
0    C
1    L R
2    L R           S
3    L R           Ls Rs
4    L R           Lfe
5    L R           Lfe S
6    L R           Lfe Ls Rs
7    L R           C
8    L R           C S
9    L R           C Ls Rs
10   L R           C Lfe
11   L R           C Lfe S
12   L R           C Lfe Ls Rs
13   L R C         S
14   L R C         Ls Rs
15   L R C         Lfe
16   L R C         Lfe S
17   L R C         Lfe Ls Rs
18   L R Ls Rs     Lfe
19   L R Ls Rs     C
20   L R Ls Rs     C Lfe

Channels are listed in stream order (Chan. 0 first), with the Group 1 channels preceding the Group 2 channels.

where L=Front Left, R=Front Right, C=Centre, Ls=Left Surround, Rs=Right Surround, S=Surround and Lfe=Low Frequency Effects.
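A decoder would typically turn the assignment ID from the LPCM header into a channel list. The sketch below is a plain lookup of the channel order given in the table, using the abbreviations defined above; the names CHANNEL_ASSIGNMENTS and channel_count are hypothetical, and the split between Group 1 and Group 2 is deliberately not modelled.

```python
# Channel order (Chan. 0 first) for each Channel Group Assignment ID.
CHANNEL_ASSIGNMENTS = (
    ("C",),                                   # 0
    ("L", "R"),                               # 1
    ("L", "R", "S"),                          # 2
    ("L", "R", "Ls", "Rs"),                   # 3
    ("L", "R", "Lfe"),                        # 4
    ("L", "R", "Lfe", "S"),                   # 5
    ("L", "R", "Lfe", "Ls", "Rs"),            # 6
    ("L", "R", "C"),                          # 7
    ("L", "R", "C", "S"),                     # 8
    ("L", "R", "C", "Ls", "Rs"),              # 9
    ("L", "R", "C", "Lfe"),                   # 10
    ("L", "R", "C", "Lfe", "S"),              # 11
    ("L", "R", "C", "Lfe", "Ls", "Rs"),       # 12
    ("L", "R", "C", "S"),                     # 13
    ("L", "R", "C", "Ls", "Rs"),              # 14
    ("L", "R", "C", "Lfe"),                   # 15
    ("L", "R", "C", "Lfe", "S"),              # 16
    ("L", "R", "C", "Lfe", "Ls", "Rs"),       # 17
    ("L", "R", "Ls", "Rs", "Lfe"),            # 18
    ("L", "R", "Ls", "Rs", "C"),              # 19
    ("L", "R", "Ls", "Rs", "C", "Lfe"),       # 20
)


def channel_count(assignment_id):
    """Total number of channels for a Channel Group Assignment ID."""
    return len(CHANNEL_ASSIGNMENTS[assignment_id])


print(CHANNEL_ASSIGNMENTS[1], channel_count(1))
print(CHANNEL_ASSIGNMENTS[20], channel_count(20))
```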