Todo

-- For other people
- Multithread vp8 or vc1.
- Multithread an intra codec like mjpeg.
- Fix mpeg1 (see below).
- Try the first three items under Optimization.
- Fix h264 (see below).
- Try mpeg4 (see below).

-- Bug fixes

General critical:
- 'make test' fails.
There seems to be some problem related to draw_edges/the mpeg4 encoder.
- 'make fate' fails due to unknown H264 bugs.
This works on mainline_patches, so diffing h264*.c and reverting hunks
until it works should find the problem.
- Error resilience has to run before ff_report_frame_progress()
is called. Otherwise there will be race conditions. (This might already
work.) In general, error paths need more testing.

mpeg*:
- ARM asm depends on specific offsets into MpegEncContext
which are different here.

h264:
- Files split at the wrong NAL unit don't (and can't)
be decoded with threads (e.g. TS split so PPS is after
the frame, PAFF with two fields in a packet). Scan the
packet at the start of decode and don't finish setup
until all PPS/SPS have been encountered.
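
A rough sketch of the packet scan suggested above, assuming the
packet is an Annex B byte stream with 00 00 01 start codes. The
helper names are made up for illustration and are not libavcodec
functions. The only spec facts used are that nal_unit_type is the
low 5 bits of the byte after the start code, with SPS = 7 and PPS = 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 NAL unit types of interest. */
enum { NAL_SPS = 7, NAL_PPS = 8 };

/*
 * Hypothetical helper: scan an Annex B buffer for 00 00 01 start codes
 * and return a bitmask with bit N set for every nal_unit_type N found.
 */
uint32_t scan_nal_types(const uint8_t *buf, size_t size)
{
    uint32_t seen = 0;

    assert(buf != NULL || size == 0);

    /* i + 3 < size keeps the NAL header byte at buf[i + 3] in bounds. */
    for (size_t i = 0; i + 3 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            seen |= UINT32_C(1) << (buf[i + 3] & 0x1F);
            i += 2; /* with the loop increment, resume at the header byte */
        }
    }
    return seen;
}

/* Nonzero if the packet carries at least one SPS or PPS. */
int packet_has_parameter_sets(const uint8_t *buf, size_t size)
{
    uint32_t mask = (UINT32_C(1) << NAL_SPS) | (UINT32_C(1) << NAL_PPS);

    return (scan_nal_types(buf, size) & mask) != 0;
}
```

A decoder could run something like this at the start of decode and
delay finishing setup while parameter sets are still pending. How
that interacts with the existing setup path is left open here.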

mpeg4:
- Packed B-frames need to be explicitly split up
when frame threading is on. It's not very fast
without this.
- The buffer age optimization is disabled due to
the way buffers are allocated across threads. The
branch 'fix_buffer_age' has an attempt to fix it
which breaks ffplay.
- Support interlaced.

mpeg1/2:
- Seeking always prints "first frame not a keyframe"
with threads on. Currently disabled for this reason.

-- Prove correct

- decode_update_progress() in h264.c
h264_race_checking branch has some work on h264,
but not that function. It might be worth putting
the branch under #ifdef DEBUG in mainline, but
the code would have to be cleaner.
- MPV_lowest_referenced_row() and co in mpegvideo.c
- Same in vp3.

-- Optimization

- EMU_EDGE is always set for h264 PAFF+MT
because draw_edges() writes into the other field's
thread's pixels.
- Check update_thread_context() functions and make
sure they only copy what they need to.
- Try some more optimization of the "ref < 48; ref++"
loop in h264.c await_references(). Also try turning the list0/list1
check above it into a loop without making it slower.
- Support frame+slice threading at the same time
by assigning slice_count threads for frame threads
to use with execute(). This is simpler but unbalanced
if only one frame thread uses any.

-- Features

- Support streams with width/height changing. This
requires flushing all current frames (and buffering
the input in the meantime), closing the codec and
reopening it. Or don't support it.
- Support encoding. Might need more threading primitives
for good ratecontrol; would be nice for audio and libavfilter too.
- After merging to mainline, deprecate avcodec_thread_init
and just set thread_count.
- Async decoding part 1: instead of trying to
start every thread at the beginning, return a picture
if the earliest thread is already done, but don't wait
for it. Not sure what effect this would have.
- Part 2: have an API that doesn't wait for the decoding
thread and instead returns EAGAIN if it's not ready. What will
it do with the next input packet if it returns that?
- Have an API that returns finished pictures but doesn't
require sending new ones. Maybe allow NULL avpkt when
not at the end of the stream.
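
One possible shape for the non-waiting API ideas above, as a toy
model only. Everything here (ToyDecoder, toy_send, toy_tick,
toy_receive) is hypothetical and is not an existing libavcodec
interface. toy_tick() stands in for progress made by a worker
thread. Input and output are separate calls, so a rejected packet
simply stays with the caller, who retries later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy decoder with a single packet in flight. */
typedef struct ToyDecoder {
    int busy;        /* 1 while a packet is being "decoded" */
    int ticks_left;  /* remaining units of background work */
    int value;       /* stands in for the decoded picture */
} ToyDecoder;

/* Submit a packet. Returns -EAGAIN if the decoder is still busy. */
int toy_send(ToyDecoder *d, int packet, int cost)
{
    assert(d != NULL && cost >= 0);
    if (d->busy)
        return -EAGAIN; /* caller keeps the packet and retries */
    d->busy       = 1;
    d->ticks_left = cost;
    d->value      = packet;
    return 0;
}

/* One unit of background work, as a worker thread would do. */
void toy_tick(ToyDecoder *d)
{
    assert(d != NULL);
    if (d->busy && d->ticks_left > 0)
        d->ticks_left--;
}

/* Fetch a finished picture without waiting and without new input. */
int toy_receive(ToyDecoder *d, int *picture)
{
    assert(d != NULL && picture != NULL);
    if (!d->busy || d->ticks_left > 0)
        return -EAGAIN; /* nothing finished yet */
    *picture = d->value;
    d->busy  = 0;
    return 0;
}
```

In this model the question of what happens to the next input packet
goes away, because receiving never consumes input. Whether that
split is acceptable for the real API is undecided.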

-- Samples

http://astrange.ithinksw.net/ffmpeg/mt-samples/

See yuvcmp.c in this directory to compare decoded samples.
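
For reference, the kind of comparison involved can be sketched as
below. This is not the contents of yuvcmp.c. It assumes raw 8-bit
yuv420p dumps of equal length already loaded into memory, and the
function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two
 * chroma planes at half resolution in each direction (rounded up).
 */
size_t yuv420p_frame_size(size_t width, size_t height)
{
    size_t cw = (width + 1) / 2;
    size_t ch = (height + 1) / 2;

    return width * height + 2 * cw * ch;
}

/*
 * Return the index of the first frame that differs between two dumps
 * of n_frames frames each, or -1 if they are identical.
 */
long first_mismatched_frame(const uint8_t *a, const uint8_t *b,
                            size_t frame_size, size_t n_frames)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < n_frames; i++) {
        if (memcmp(a + i * frame_size, b + i * frame_size, frame_size) != 0)
            return (long)i;
    }
    return -1;
}
```

Running this on a single-threaded dump and a threaded dump of the
same sample gives the first frame where threading changed the output.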

For debugging, try commenting out ff_thread_finish_setup() calls so
that only one thread runs at once, then binary search and
scatter printfs to look for differences in codec contexts.
