Update: The code for this notebook is available on GitHub.

Aligning video recordings with Julia

Due to the restrictions imposed by COVID-19, dance teachers around the world are taking their classes online. As a dance student, it would be helpful to watch yourself against a recording of your teacher.

Something like this:

It is very unlikely that the song starts at exactly the same moment in both videos, and syncing them by hand is not easy.

On the other hand, computers can compare the sound waves of each video file and quickly determine how long to wait before playing one video or the other so they are both aligned.
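
One common way to make such a comparison is cross-correlation: slide one signal along the other and keep the shift where they agree most. The sketch below is a minimal illustration under that assumption, using synthetic signals and a hypothetical helper named best_lag; it is not necessarily the method this notebook uses.

```julia
# Hypothetical helper: find the lag (in samples) at which `b` best matches `a`,
# using a brute-force cross-correlation over the lags 0:maxlag.
function best_lag(a::AbstractVector{<:Real}, b::AbstractVector{<:Real}, maxlag::Integer)
    best, best_score = 0, -Inf
    for lag in 0:maxlag
        n = min(length(a), length(b) - lag)   # overlapping samples at this lag
        n <= 0 && break
        # Dot product of `a` with `b` shifted left by `lag` samples.
        score = sum(a[i] * b[i + lag] for i in 1:n)
        if score > best_score
            best, best_score = lag, score
        end
    end
    return best
end

samplerate = 16_000                           # samples per second (assumed)
t = (0:samplerate - 1) ./ samplerate          # one second of time stamps
song = sin.(2π * 440 .* t) .* exp.(-3 .* t)   # synthetic decaying tone
delay = 800                                   # samples of silence before the song
recording = vcat(zeros(delay), song)          # same song, starting later

lag = best_lag(song, recording, 2_000)
lag_seconds = lag / samplerate                # how long to wait before playing `song`
```

A brute-force search like this is fine for short clips; for longer recordings an FFT-based correlation is the usual choice.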

Sound waves

Sound waves are represented in digital audio by very rapidly sampling the distortions of the medium (e.g. air) through a microphone and storing the resulting data in a vector.
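
To make the idea of sampling concrete, here is a minimal sketch that "records" a pure tone by evaluating it at evenly spaced instants. The tone, its frequency, and the variable names are illustrative assumptions, not taken from the notebook.

```julia
samplerate = 16_000     # samples per second (assumed)
duration = 0.01         # seconds of signal to generate
frequency = 440.0       # hypothetical pure tone, in Hz

# Time stamp of each sample, then the value "read" at that instant.
times = (0:round(Int, duration * samplerate) - 1) ./ samplerate
samples = Float32.(sin.(2π * frequency .* times))

length(samples)         # 160 samples for 0.01 s at 16,000 samples per second
```

A real recording works the same way, except that the values come from the microphone rather than from a formula.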

Jazmine, the teacher, sent Julia a recording of a jazz song (cc0), sampled 16,000 times per second:

The recording is stored in x, with x_duration_seconds = 7.00875 and (SampleBuf{Float32,2}, 16000.0, 112140) as its type, sample rate, and length in samples.

Each sample in x corresponds to a microphone reading. The first 5 samples at the 1-second mark are:

Plotting the audio wave:

Julia then danced to the same song and recorded herself near the sea (cc-by-nc):

Her recording is stored in x2, with x2_duration_seconds = 9.52 and (SampleBuf{Float32,2}, 16000.0, 152320) as its type, sample rate, and length in samples.
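
The reported durations follow directly from the sample counts and the sample rate of x and x2. A small check using those numbers (the variable names here are illustrative, not the notebook's):

```julia
samplerate = 16_000.0        # samples per second, as reported for both recordings

teacher_samples = 112_140    # length reported for x
student_samples = 152_320    # length reported for x2

# Duration in seconds = number of samples / samples per second.
teacher_seconds = teacher_samples / samplerate   # 7.00875
student_seconds = student_samples / samplerate   # 9.52

# How much longer Julia's recording is than the teacher's (about 2.51 s).
extra_seconds = student_seconds - teacher_seconds
```

The difference in length alone does not say where the song starts in each recording; it only shows that the two files cannot line up sample for sample.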

Let's look at Julia's audio wave:
