QuickTime AAC encoders: qaac and qtaacenc

  • Command-line encoders for AAC that use QuickTime 7.6.2+

    The Pro version is not required; Apple only restricts the features of its own QuickTime Player, while the API is fully usable.

    qaac: http://sites.google.com/site/qaacpage/cabinet (the somewhat more comprehensive encoder; also supports multichannel audio)

    Current versions: 0.99 and 1.09 (the latter also as a 64-bit build, hence the doubled archive size).

    Changelog: http://sites.google.com/site/qaacpage/news

    Note: the web server reports *.zip files with a MIME type for StuffIt archives, so a web browser may decide to rename the extension to *.sit on download.

    qtaacenc: http://tmkk.pv.land.to/qtaacenc/ (somewhat simpler, no support for cover art or lyrics; I have not got it working with EAC 1.0 beta yet: "input file not found")

    Current version: 2011-08-16

    Changelog: at the bottom of the website

    Overview of quality levels and bitrates (reposted by IgorC in the HydrogenAudio thread on qtaacenc):

  • qaac has meanwhile reached version 1.19; among other things, multichannel audio handling has been improved.

    Versions from 1.00 on use CoreAudioToolbox instead of the QuickTime API, so they no longer depend exclusively on a QuickTime installation; in return, the "Apple Application Support" libraries must be present (available, among others, in QuickTime 7.7.1 or later and iTunes 10.5 or later; only these versions also enable use of the HE-AAC encoder).

    qtaacenc apparently is no longer being developed or maintained; questions in the HydrogenAudio thread no longer get answers.

  • qaac is already at version 1.37.

    It now also seems to accept CD cue sheets as input; individual tracks are written with complete tags (except with the --concat-cuesheet option), and the generated file name is the two-digit track number followed by the track title.

    Incidentally, the version information in QuickTime's CoreAudioToolbox.dll is a hopeless mess, which is why qaac now always reads the English version resource, presumably the most up to date (a quick test is possible with "qaac --check").

    QuickTime 7.7.2 contains the current CoreAudioToolbox version (DLL dated 20 February 2012, 4880232 bytes; the version resources for other languages inside the DLL list other, or even completely different, values as the version number).

  • By now we have qaac 2.13; here is a collection of what has happened since 2.00:


    • Gracefully shutdown on console interrupt event (such as Ctrl+C, Ctrl+Break or closing console window).

      Gracefully means that it stops encoding immediately, as if it were the end of input, and properly finalizes the container, so the resulting file will be playable (up to that point).

      Of course, qaac cannot terminate gracefully in every possible situation. You can always forcefully kill qaac using Task Manager or something.


    • Add --no-delay option. (Read the discussion at HA thread from here).

      --no-delay compensates for the encoder delay (2112 samples) by prepending 960 samples of silence before sending input to the encoder, then trimming 3 AAC frames at the beginning (2112 + 960 = 3072 = 1024 * 3, where 1024 is the frame length of AAC, so the total delay is exactly equal to the length of 3 AAC frames). Note that these numbers are doubled in case of SBR.

      This option is meant for video, as a means to resolve A/V sync issues. The resulting AAC will have exactly zero delay, but might have pops/clicks at the beginning. Use with care.
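The arithmetic behind --no-delay can be checked with a short sketch. The function name and the `sbr` flag are illustrative and not part of qaac; only the numbers 2112 and 1024 and the doubling for SBR come from the text above, and doubling the frame length along with the delay is my reading of "these numbers are doubled".

```python
AAC_FRAME_LENGTH = 1024   # samples per AAC frame (from the changelog text)
ENCODER_DELAY = 2112      # encoder delay in samples (from the changelog text)


def no_delay_plan(sbr: bool = False) -> dict:
    """Return how much silence to prepend and how many frames to trim so that
    the total delay is a whole number of frames.

    Hypothetical helper for illustration; this is not qaac's code.
    """
    scale = 2 if sbr else 1           # assumption: delay and frame length both double with SBR
    delay = ENCODER_DELAY * scale
    frame = AAC_FRAME_LENGTH * scale
    # Smallest amount of silence that rounds the delay up to a frame boundary.
    prepend = (-delay) % frame
    total = delay + prepend
    return {"prepend": prepend, "total": total, "frames_to_trim": total // frame}


if __name__ == "__main__":
    print(no_delay_plan())           # plain AAC-LC case
    print(no_delay_plan(sbr=True))   # SBR case
```

For the plain case this reproduces the 960 samples of silence and the 3 trimmed frames quoted above.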


    • Changed --tag option behavior to be strict. Formerly, when fourcc passed by --tag is unknown, qaac accepted it and wrote it as UTF8 string tag. Now --tag accepts only known tags. This is considered to be more foolproof, since iTunes is known to refuse editing tags when a file contains unknown tag atoms.

    • Read vorbis comment "WAVEFORMATEXTENSIBLE_CHANNEL_MASK" of FLAC and treat as channel layout.

    • Fixed a bug: mono AIFF/CAF file with kAudioChannelLabel_Mono in chan chunk could not be read.


    • Changed --delay option spec. --delay now accepts either a duration in time or a number of samples. If you are used to the timespec of sox, you should already be familiar with it. The format is as follows:

      [[hh:]mm:]ss[.sss..]
      ns

      In the first case, parts surrounded by brackets can be omitted. So, --delay=100 means 100 seconds, --delay=-10.72 means -10.72 seconds, --delay=02:53.1 means 2 minutes and 53.1 seconds, and so on.

      Second case is for number of samples. You just put an integer followed by "s" (means "samples"). --delay=-2112s or something.
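The two --delay forms can be sketched as a small parser. This is only an illustration of the syntax described above: `parse_delay` and its `sample_rate` parameter are made-up names, and rounding to the nearest sample is an assumption, since the text does not say how qaac rounds.

```python
import re


def parse_delay(spec: str, sample_rate: int) -> int:
    """Convert a --delay style value into a signed number of samples.

    Illustrative only; accepts "[[hh:]mm:]ss[.sss]" and "<integer>s".
    """
    spec = spec.strip()
    # Second form: an integer followed by "s" is a number of samples.
    m = re.fullmatch(r"([+-]?\d+)s", spec)
    if m:
        return int(m.group(1))
    # First form: hours and minutes are optional, seconds may have a fraction.
    m = re.fullmatch(r"([+-]?)(?:(?:(\d+):)?(\d+):)?(\d+(?:\.\d*)?)", spec)
    if not m:
        raise ValueError(f"unrecognized delay spec: {spec!r}")
    sign, hh, mm, ss = m.groups()
    seconds = int(hh or 0) * 3600 + int(mm or 0) * 60 + float(ss)
    samples = round(seconds * sample_rate)  # assumption: round to nearest sample
    return -samples if sign == "-" else samples


if __name__ == "__main__":
    for example in ("100", "-10.72", "02:53.1", "-2112s"):
        print(example, "->", parse_delay(example, 44100))
```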

    • HTOA support. Now index 00 of first track in cue sheet is encoded into track 0.

    • Fixed a bug of cue sheet parser. Last line of cue was ignored if the last line is ending with a white space character other than LF.


    • Fixed a regression on 2.06, which resulted in failure when non-canonical path was passed by -o option (reported by this post at HA).

    • Added --fname-from-tag option to generate output file names based on the tags of input files. You can configure output file name more precisely by additionally using --fname-format (which has been an option for cuesheet input only).


    • Now copy chapters from ALAC/m4a input (when available).

    • Delay Nero style chapter points by as much as the encoder delay (2112 samples). It seems that the Nero AAC encoder was previously using Nero style chapters to signal encoder delay this way, and fb2k honors it.

      Note that Nero style chapter is a list of <title, start time> pairs, therefore first chapter can start at arbitrary point, while last chapter goes until the end of the track.

      On the other hand, QuickTime style chapter is a list of <title, duration> pairs, therefore first chapter always starts from the beginning of the track, while last chapter can end at arbitrary point.

      qaac will write both chapter styles (for the sake of compatibility), but the two have subtle differences and incompatibilities.
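The difference between the two chapter styles is easier to see in code. The following is a minimal sketch with made-up function names, using integer time units (for example samples); how qaac itself converts between the two is not described in the text, and turning a lead-in into an untitled chapter is purely my choice for the example.

```python
def nero_to_quicktime(chapters, track_length):
    """chapters: list of (title, start) pairs sorted by start.
    Returns a list of (title, duration) pairs."""
    result = []
    # QuickTime style always starts at 0, so a lead-in before the first
    # chapter becomes an untitled chapter in this sketch.
    if chapters and chapters[0][1] > 0:
        result.append(("", chapters[0][1]))
    for i, (title, start) in enumerate(chapters):
        # A Nero chapter runs until the next start, or to the end of the track.
        end = chapters[i + 1][1] if i + 1 < len(chapters) else track_length
        result.append((title, end - start))
    return result


def quicktime_to_nero(chapters):
    """chapters: list of (title, duration) pairs.
    Returns a list of (title, start) pairs."""
    result, position = [], 0
    for title, duration in chapters:
        result.append((title, position))
        position += duration
    return result


if __name__ == "__main__":
    print(nero_to_quicktime([("A", 5), ("B", 30)], track_length=100))
    # The end point of the last QuickTime chapter is lost in Nero style.
    print(quicktime_to_nero([("A", 30), ("B", 50)]))
```

Converting in either direction can lose information: Nero style cannot express where the last chapter ends, and QuickTime style cannot express a first chapter that starts later than the beginning.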

    • Now writes actual duration into edts. This is done mainly for QuickTime, which doesn't look at the iTunSMPB thingy. Now QuickTime can trim zero-padding and decode sample-accurately (whole song / each sub-chapter).

      Technically, there's no way to tell the value of encoder delay to QuickTime player. It just silently assumes implicit AAC delay of 2112 samples, and automatically crops that amount from the beginning --- it just works with qaac because qaac is using their encoder. edts is used here in order to just let them trim the trailing zero paddings.

    Fixes for 2.00 regression again.

    • WAV parser was ignoring data chunk length even if --ignorelength was not specified.

    • Bogus total length was printed on libsndfile input due to int64_t -> int8_t typo.


    • Fixed a bug: when opening non-supported input file, there was a chance that ridiculously too much memory gets allocated and OS hangs (refalac only). This is regression on 2.00, but is basically coming from a weakness of libmp4v2 which can allocate HUGE memory when mp4 box structure is corrupt.

    • Rewritten 24bit PCM bit packing/unpacking code.

      qaac -D 24bit.wav -o - >NUL

      is about 3 times faster than before.


    • Sorry, 2.04 fix was flawed. Re-fixed it.

      BTW, the problem on 2.00 was usually quite audible. If you are anxious about it, the apparent evidence of the bug is a smaller number of samples compared to the original.

      If you were using simply 16/32bit 2ch input, you might not have met any troubles so far (like me). In this case, sample size (in bytes) is power of 2, and probably there's less possibility of partial read breaking in the middle of sample boundary.


    • Fixed broken pipe input (regression on 2.00).

      When feeding from a pipe, there was always a chance that the output turned into something like white noise from some arbitrary point on.

      This was due to switch to lower level I/O routine on 2.00, which can result in "partial read" in case of pipe input. When it is still aligned to sample size boundary, it does no harm. However, when it is not aligned, the succeeding samples get completely out of sync, and result in white noise or something.

      The possibility of this problem depends on how sender pushes audio to pipe, and sample size (16bit, 24bit, etc). I didn't notice it until today, but I could reproduce this using cat command as feeder.
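The partial-read problem described above can be illustrated with a reader that keeps reading until it has whole sample frames. This is a sketch of the general technique, not qaac's actual fix; `read_aligned` and the `ShortReads` test double are made-up names.

```python
import io


def read_aligned(stream, want_bytes: int, frame_size: int) -> bytes:
    """Read up to want_bytes from stream and return only whole frames.

    A pipe may hand back fewer bytes than requested, so read() is called
    repeatedly instead of trusting a single call.
    """
    want_bytes -= want_bytes % frame_size       # never request a partial frame
    chunks, got = [], 0
    while got < want_bytes:
        chunk = stream.read(want_bytes - got)
        if not chunk:                           # end of input
            break
        chunks.append(chunk)
        got += len(chunk)
    data = b"".join(chunks)
    # At end of input a trailing partial frame may remain; drop it.
    return data[: len(data) - len(data) % frame_size]


class ShortReads:
    """Test double that returns at most `step` bytes per read(), as a pipe can."""

    def __init__(self, data: bytes, step: int):
        self._buf = io.BytesIO(data)
        self._step = step

    def read(self, n: int) -> bytes:
        return self._buf.read(min(n, self._step))


if __name__ == "__main__":
    # 24-bit stereo: one frame is 6 bytes, but the source delivers 5 at a time.
    src = ShortReads(bytes(range(20)), step=5)
    print(read_aligned(src, 12, frame_size=6))
```

With 16-bit or 32-bit stereo the frame size is a power of two, which makes a misaligned short read less likely; with 24-bit input (6 bytes per stereo frame) it is much easier to hit.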


    • Fixed box layout of iTunes custom metadata (long tag). It was written as name-> mean -> data (should be mean -> name -> data).

      This was a long-standing bug, and I am somewhat surprised that no one has ever reported it to me. This should fix the interoperability problem with TagLib.


    • Fixed refalac to properly finalize the MP4 container: regression on 2.00.


    • Fixed a regression on 2.00: --threading was broken.

    This is an experimental (might be unstable) release with many updates, so version was bumped up to 2.00.

    • Enabled MP3 decoding.

    • --concat + --adts now accepts multiple inputs with different sample format. Explained later.

    • Removed --concat-cuesheet, since it's mostly similar to --concat.

    • Added --no-dither, which turns off automatic dither on quantization.

    • -b now accepts arbitrary value in 2-32 range. -b32 for WAV output means float format. All other cases are integer.

    • -N(--normalize) now doesn't use temporary file if the input is seekable.

    • FLAC file with ID3v2 tag is now accepted (ID3 tag is just skipped and ignored).

    • Fix crash on reading TAK file with binary tag.

    • Improve ID3v2 tag handling.

    • Much of the source code has been refactored.

    Multiple format stream generated by --concat and --adts

    Since this requires complete reset of the encoder, zero padding is added at the stream change point.

    As far as I know, almost no software player on PC can continue to play such file after the stream format change. In my environment, Windows Media Player 12 is the only exception I know of.

    Downloads: https://sites.google.com/site/qaacpage/cabinet
    News: https://sites.google.com/site/qaacpage/news

  • 2.15:

    • Fixed an awful bug in refalac of the 2.xx branch. It wasn't encoding with the correct frame length (4096 samples) in some cases.
      I noticed it when I encoded directly from lossyFLAC (not piped input), which resulted in an ALAC file with 512 samples per frame. It seemed playable, but apparently is not a normal/sane ALAC file.
      WAV input will be fine (including piped input). Direct input from FLAC or other formats might be affected, and re-encoding is recommended. Only refalac of the 2.xx branch is affected. qaac is fine.
    • Use a stricter sharing mode when opening files. Now qaac/refalac doesn't allow other processes to open the output file while qaac/refalac is writing to it. Reading can be shared, but now qaac/refalac cannot open a file for reading when another process is writing to it.


    • Add --cue-track option to limit tracks to extract from cuesheet, and fixed several minor bugs.
  • 2.16:

    • Read and handle multichannel layout of TAK files.
    • Write fact chunk when decoding into WAVEFORMATEXTENSIBLE format. As far as I can see, even WMP does not honor the fact chunk, so this would be pretty much useless. However, since it looks like the RIFF/WAV spec requires it in WAVEFORMATEXTENSIBLE, this was implemented to be more spec compliant. The fact chunk is not written on piped output or in WAVEFORMATEX format.
    • Automatically kill progress message when stderr is connected to nothing.
  • 2.17:

    • Fixed github issue 27 (regression in 2.16); --decode was writing invalid wav file.
    • Added --gapless-mode option (same as fdkaac).

    Quote from nu774

    Interestingly, iTunes seems to support both of iTunSMPB and ISO standard gapless mode. QuickTime supports only the latter. In the past, I thought QT silently assumes 2112 samples of delay. However, it turned out that QT actually looks elst media_time when sbgp and sgpd are present, so it can be used generally (as described in QTFF spec).

    As far as I know, iTunes is the only music player that supports gapless playback in both way.

  • 2.18:

    • Added --sort-args option by request, which sorts filenames given by command line before processing them. Sorting is done simply by Unicode character code, in ascending order.
    • Fixed to test seekability before estimating input length of raw input. Seems like _filelengthi64() doesn't return an error on a named pipe.
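Sorting "simply by Unicode character code" has consequences worth seeing once. The sketch below uses Python's built-in string ordering, which compares code points; whether qaac compares code points or UTF-16 code units is not stated in the text, and `sort_args` is a made-up name.

```python
def sort_args(filenames):
    """Sort names by Unicode code point, ascending.

    No locale rules and no natural-number ordering are applied.
    """
    return sorted(filenames)


if __name__ == "__main__":
    names = ["track2.wav", "track10.wav", "Track1.wav", "ärger.wav"]
    for name in sort_args(names):
        print(name)
```

Upper-case letters sort before lower-case ones, and "track10" sorts before "track2", so zero-padded track numbers are the safer choice when relying on this kind of ordering.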
  • 2.19


    Fixed: an attempt to set one of the stik, rtng, akID, sfID tags caused qaac to hang.

    Well, actually not hanging but waiting for console input in vain... due to a silly bug calling scanf() instead of sscanf().

    Fixed: --tag akID:fra was writing USA country code (not France).

  • 2.25

    Slightly modified for the link


    • Fix crash on reading unsigned 8bit PCM through libsndfile (for example, Wave64 format).
    • Fix bogus (non-compliant) sgpd box written on gapless mode 1 or 2. However, I still don't recommend using it. As far as I know, only iTunes is known to support it well. VLC also supports edts, but it seems VLC decodes the first few frames of HE-AAC without SBR when edts is being used.
    • Support float16 and float24 WAV and WavPack files. float16 is assumed to be normalized to the range [-65536, 65536], which is different from the normal [-1,1] for floating point PCM. For details, read this thread on HA.
    • Show PCM sample format (int8 or something) when --verbose is specified. Both input format and resulting format are shown, the latter might be different due to DSP chain.
    • Disabled automatic quantization to integer when sample format is converted to float by DSP chain and encoding to ALAC.
    • Repackaged 64bit libsoxr.dll as libsoxr64.dll. Now refalac64 supports both names (of course it cannot use 32bit version of DLL, so be careful).
    • Show more meaningful message on write error (MSVCRT assigns EINVAL for broken pipe error, resulting in "invalid parameter" message, which is not quite helpful).
    • Some code clean up.



    • Switched from libsoxrate to libsoxr and new libsoxconvolver. Like libsoxr, libsoxconvolver uses SIMD optimized DFT/convolution routine when SSE is available. This library is used for --lowpass, --matrix-preset and --matrix-file. Unlike libsoxrate, libsoxconvolver is 32bit float based.
    • Add --peak and --play option.
      Neither produces output to a file, and neither can be used with encoding options such as -V, -v, -a, -c, -A, and -D. However, DSP options such as --rate or --lowpass can be used.
      --peak just scans the input and prints the peak. Might be useful when you apply some DSP (especially mixing), and want to know the resulting peak value before encoding.
      --play does what its name implies (plays files using the Wave Mapper device). Since qaac is an encoder and not a music player, don't expect much from it. It's just intended for cases when you want to test new custom matrix coefficients or something. --play doesn't automatically convert the sample format, nor does it remix.
    • Changed random number generator (used for TPDF dither) to LCG, which is known to be poor in randomness but quite fast, and is enough for just generating white noise for dither.
    • Don't flush immediately after writing WAV header when writing WAV file to a pipe. This makes pipe rewinding hack of SoX happier, but it seems not perfect. Basically speaking, SoX's pipe rewinding on win32 is nothing but a hack, so don't expect automatic format detecting of SoX to always work. Just use -t wav - or something to avoid unnecessary pipe rewinding.
    • Some code cleanup.
    • Fix help messages.
    • Updated taglib (again).
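The LCG-based TPDF dither mentioned in the list above can be sketched as follows. The LCG constants are the well-known Numerical Recipes ones and were chosen only for illustration; the text does not give qaac's constants, and all names here are hypothetical.

```python
class Lcg:
    """32-bit linear congruential generator.

    Constants are the Numerical Recipes ones, used here as an assumption.
    """

    def __init__(self, seed: int = 1):
        self.state = seed & 0xFFFFFFFF

    def uniform(self) -> float:
        """Return the next value in [0, 1)."""
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state / 2**32


def tpdf_dither(rng: Lcg) -> float:
    """Triangular noise in (-1, 1) LSB: the difference of two uniform values."""
    return rng.uniform() - rng.uniform()


def quantize(sample: float, bits: int, rng: Lcg) -> int:
    """Quantize a float in [-1, 1] to a signed integer, adding TPDF dither."""
    scale = 2 ** (bits - 1)
    value = round(sample * scale + tpdf_dither(rng))
    return max(-scale, min(scale - 1, value))   # clip to the integer range


if __name__ == "__main__":
    rng = Lcg(seed=1)
    print([quantize(0.25, 16, rng) for _ in range(5)])
```

An LCG is statistically weak, but dither only needs cheap noise with the right amplitude distribution, which is the trade-off the changelog entry describes.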



    • Fixed not to write tag when value of tag is empty.
    • Support loading of libFLAC_dynamic.dll (this name is used by v1.3.0 DLL distributed at www.rarewares.org). Currently, qaac searches libFLAC dll in the following order.
      libFLAC_dynamic.dll -> libFLAC.dll -> libFLAC-8.dll
      Since 1.3.0 and 1.2.1 DLL are binary compatible within range of use by qaac, you can use any of them.
    • Tags given by command line option now take precedence over default tool tag written by qaac. As a result, you can override tool tag if you want to (with --tag too:value).
    • Updated TagLib to github current HEAD.




    Fixed an issue of --threading option. There was a possibility of non sample aligned read on the pipe, similar to the problem that was fixed on 2.04 and 2.05.



  • 2.36


    2.36 includes some minor fixes:

    Improved accuracy of seeking on MP3 files by increasing the amount of preroll. Still doesn't count how many frames are required due to the bit reservoir, but prerolling 9 frames should be enough...
    Fixed bitrate formatting on --format. Has been printing in decimals for 3ch only.
    Fixed --stat. Incorrect values were written at the beginning (regression introduced by --num-priming or something).
    Updated taglib.


    2.34 build fix


    2.34 was built targeting Vista or later by accident (toolset v120 was used instead of v120_xp). If you are using Windows XP SP3, please download qaac_2.34.1.zip.


    Update on qaac_2.33.zip again (v3)


    Broken incremental build seems to have also affected refalac.exe (although it doesn't crash). Updated qaac_2.33.zip again (shown as v3).



    Implemented smart padding (same as fdkaac) that minimizes the possibility of gapless playback issue.
    You can disable this feature by new option --no-smart-padding. However, --no-smart-padding also disables additional padding at the end of HE-AAC stream that has been implemented as a workaround for CoreAudio encoder bug.
    Although I don't recommend using --no-smart-padding, it is mandatory when you want bitstream output that is bit-identical to iTunes (including its bugs).
    Fixed fallback sample rate conversion when libsoxr is not present (was not working exactly as intended).
    Improved error messages for the attempt to encode non supported PCM format to ALAC.
    Minor fixes and rewriting.



    Fixed: --tag apID and --tag akID were written in the long tag format.



    Switched to MSVC12 (Visual Studio 2013). Now you need msvc[r|p]120.dll instead (included in the zip archive). For the time being you are still able to build qaac with MSVC10, but I may drop MSVC10 support in the future in favor of improved C++11 support in MSVC12.
    Fixed crash when unknown fourcc was passed to --tag option.
    Support --tag purd (purchase date), which seems to have been missing recently (perhaps around 2.28 or so).
    Don't write raw ID3v1 genre number to CAF.
    Don't write iTunes specific metadata to CAF.
    Other minor code rewrites regarding metadata handling.



    Fixed regression on 2.29, that was causing miserable failure (ERROR: invalid vector<T> subscript) when reading TAK files



    Fixed regression on 2.28: Tags were not properly copied when --concat was specified on cuesheet input.
    Fixed not to exit with failure requesting output filename when --concat was specified with --peak or --play (in which case "output filename" is nonsense).
    Some minor improvement and code refactoring.



    Add new option: --caf. As the name implies, --caf tells qaac to output to CAF container. (HE-)AAC, ALAC, PCM (-D) are supported. Pipe streaming is supported in case of PCM, that can be used to pass audio as well as tags to fdkaac through pipeline.
    (Hopefully) better handling of metadata. Non-standard tags such as performer or ISRC are now copied from input (However, some tags such as replaygain related metadata, ripping log, and cuesheet are blacklisted and not copied).
    Support ALAC in CAF input from libsndfile. This will only be used by refalac + very recent libsndfile. qaac has already been supporting input of ALAC in CAF through CoreAudio API.
    Fix: take care of zero byte text file input, which resulted in MLang text encoding detection failure.



    Now you can set --drc option twice or more, with different parameters. This can be used to obtain more complex effect.

    For example, you can use --drc for normal compression, then as a limiter (--drc with high thresh + high ratio + zero attack/release will work something like a limiter that effectively kills remaining peaks).


  • 2.37


    Fixed a bug: AAC in CAF generated by qaac --caf was not playable due to bogus kuki chunk (Descriptors inside of esds box are expected, but qaac was writing bare AudioSpecificConfig).

  • [qaac] release 2.38 (refalac 1.38)


    Allow nesting of ${} in --fname-format.
    Now you can write something like ${albumartist|${artist}} for example, which means:

    When album artist tag is present and non-empty, it evaluates to album artist tag's value.
    Otherwise, it evaluates to artist tag's value.
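The fallback rule for nested ${} can be sketched as a small recursive expander. Only the `${name}` and `${name|fallback}` forms described above are handled; `expand` is a made-up name, and qaac's real --fname-format may support more syntax than this sketch.

```python
def expand(template: str, tags: dict) -> str:
    """Expand ${name} and ${name|fallback}; the fallback may contain nested ${}."""
    out, i = [], 0
    while i < len(template):
        if not template.startswith("${", i):
            out.append(template[i])             # ordinary character, copy as-is
            i += 1
            continue
        # Find the matching closing brace, counting nested "${" openers.
        depth, j = 1, i + 2
        while j < len(template) and depth:
            if template.startswith("${", j):
                depth += 1
                j += 2
                continue
            if template[j] == "}":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced ${ in template")
        body = template[i + 2 : j - 1]
        name, sep, fallback = body.partition("|")
        value = tags.get(name.strip(), "")
        if value:
            out.append(value)                   # tag present and non-empty
        elif sep:
            out.append(expand(fallback, tags))  # otherwise evaluate the fallback
        i = j
    return "".join(out)


if __name__ == "__main__":
    fmt = "${albumartist|${artist}} - ${title}"
    print(expand(fmt, {"artist": "Some Artist", "title": "Some Title"}))
```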

  • [qaac] release 2.39 (refalac 1.39)


    Added dedicated MP4 reader, which supports:

    Decoding AAC-LC, MP1/2/3, ALAC (qaac), ALAC (refalac).
    Perfectly support files with iTunSMPB and multiple edits. In files with multiple edits, there exist multiple valid spans to be played back. In other words, there are multiple gaps to be skipped. As far as I know, there is NO such software that supports it properly. If you are interested, try Multiple-edits.zip.

    This file contains 3 edits in it. When properly decoded/played, it contains exactly 30 seconds music. However, in this file there are very short 2 gaps to be skipped in the middle as well as first delay and end padding. Therefore, no software other than qaac should play it correctly without pops/clicks exactly in 30 seconds.

  • [qaac] release 2.40 (refalac 1.40)


    Fixed new MP4 decoder introduced at 2.39 (found seek related bug, and ALAC has been trimmed too much).

  • [qaac] release 2.41 (refalac 1.41)


    Add --limiter. --limiter applies smart limiter that softly clips portions where peak exceeds (near) 0dBFS. Softly means that it applies non-linear filter to surrounding half cycles (nearest zero crossing point to zero crossing point) so that the result fits in under 0dBFS but still is smoothly connected to other parts, resulting in much smaller audible distortion than dumb hard clips.

    For CVBR/ABR/CBR mode, bitrate value less than 8 is now treated as "bits per sample". Bitrate is computed as the following:
    Bitrate = bits_per_sample * number_of_channels * sample_rate
    For example, --cvbr 2 is now equivalent to --cvbr 192 (=2*2*48000) for the 2ch, 48kHz case. This can be useful when you want to use CVBR/ABR/CBR and want a constant quality setting for a varying number of channels or sample rate.
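The "bits per sample" rule is simple enough to check by hand, but a short sketch shows how the same setting scales with the channel count. `effective_bitrate_kbps` is a made-up name; the threshold of 8 and the formula are taken from the text above.

```python
def effective_bitrate_kbps(value: float, channels: int, sample_rate: int) -> float:
    """Values below 8 are read as bits per sample, everything else as kbps."""
    if value < 8:
        # Bitrate = bits_per_sample * number_of_channels * sample_rate (in bps)
        return value * channels * sample_rate / 1000
    return value


if __name__ == "__main__":
    for channels in (1, 2, 6):
        print(channels, "ch:", effective_bitrate_kbps(2, channels, 48000), "kbps")
```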

    Other minor changes.

  • [qaac] release 2.42 (refalac 1.42)


    Add --start and --end options to specify the start and end point of the input for partial encoding. --start, --end (and --delay) support 3 ways to describe the point.
    [[hh:]mm:]ss[.sss..] : Timestamp described in hours, minutes, and seconds. Parts enclosed by brackets can be omitted. Seconds are parsed as double precision number (64bit float), and you can place arbitrary numbers of digits under the decimal point. You will need enough digits to achieve sample accuracy, depending on the sample rate.
    ns : Number of samples, followed by 's'.
    mm:ss:fff : Cuepoint in minutes, seconds, and frames(1/75 second), followed by 'f'.
    Re-linked 32bit libsoxr.dll not to depend on libgcc_s_sjlj-1.dll. Now it is not included in the archive and you don't need it anymore
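The cuepoint form of --start/--end described above (minutes, seconds and 1/75-second frames, followed by 'f') maps to samples with integer arithmetic. This is a sketch with made-up names; rejecting seconds of 60 or more and frames of 75 or more is my assumption, not documented qaac behavior.

```python
import re

CUE_FRAMES_PER_SECOND = 75  # from the text: one frame is 1/75 second


def cuepoint_to_samples(spec: str, sample_rate: int) -> int:
    """Convert "mm:ss:ff" followed by "f" into a sample position."""
    m = re.fullmatch(r"(\d+):(\d{1,2}):(\d{1,2})f", spec.strip())
    if not m:
        raise ValueError(f"not a cuepoint: {spec!r}")
    minutes, seconds, frames = map(int, m.groups())
    if seconds >= 60 or frames >= CUE_FRAMES_PER_SECOND:
        raise ValueError(f"field out of range in {spec!r}")  # assumption, see lead-in
    total_frames = (minutes * 60 + seconds) * CUE_FRAMES_PER_SECOND + frames
    # Exact whenever sample_rate is a multiple of 75 (44100 and 48000 both are).
    return total_frames * sample_rate // CUE_FRAMES_PER_SECOND


if __name__ == "__main__":
    print(cuepoint_to_samples("02:53:10f", 44100))
```

At 44100 Hz one cue frame is exactly 588 samples, so cuepoints are always sample-accurate there, unlike decimal timestamps, which need enough digits after the decimal point.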