Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a Russian chemist and photographer, best known for his cutting-edge methods of color photography. On trips sponsored by Tsar Nicholas II, Prokudin-Gorskii documented the Russian Empire with thousands of images between the years 1905 and 1915. The resulting collection was purchased by the US Library of Congress in 1948.
Prokudin-Gorskii's method involved capturing three separate negatives, each with a different filter, either red, blue, or green. This set of negatives can be combined to produce a full color image.
Images can be downloaded from the collection in the form shown; the three sections correspond to the blue, green, and red filters respectively, from top to bottom.
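Splitting such a scan into its three channels is just a matter of cutting it into equal-height strips. A minimal sketch, assuming the scan is a grayscale numpy array (the function name is my own):

```python
import numpy as np

def split_channels(plate):
    """Split a stacked glass-plate scan into (blue, green, red),
    top to bottom, as three equal-height strips."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
```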
My task is to align these three color channels. Simply dividing the image into three equal parts and overlaying them results in a misaligned, fuzzy mess.
Clearly, we need to align these channels. We'll search for the correct alignment by leaving the blue channel alone and trying to match the red and green channels on top of it. A candidate alignment for the red or green channel is just a translation by some number of pixels in each direction, horizontal and vertical. We need a metric for comparing candidate alignments.
That is, if r and b are two-dimensional arrays of brightness values, each for a different color channel, I propose minimizing how much the differences between neighboring pixels in one channel disagree with the corresponding differences in the other:

$$\sum_{x,y} \left((r_{x+1,y} - r_{x,y}) - (b_{x+1,y} - b_{x,y})\right)^2 + \left((r_{x,y+1} - r_{x,y}) - (b_{x,y+1} - b_{x,y})\right)^2$$
I'll compute these differences for both vertical neighbors and horizontal neighbors, adding both to the sum, as shown in the expression above. Here is a Python implementation of the described algorithm.
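A minimal sketch of the idea (function names are of my own choosing, and I square the neighbor differences):

```python
import numpy as np

def neighbor_diff_metric(a, b):
    """Sum of squared disagreements between neighboring-pixel differences
    of two channels, over both vertical and horizontal neighbors."""
    dv = (a[1:, :] - a[:-1, :]) - (b[1:, :] - b[:-1, :])
    dh = (a[:, 1:] - a[:, :-1]) - (b[:, 1:] - b[:, :-1])
    return np.sum(dv ** 2) + np.sum(dh ** 2)

def align(channel, base, max_shift):
    """Try every translation within +/- max_shift pixels in each direction
    and return the (dy, dx) shift minimizing the metric against base."""
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = neighbor_diff_metric(
                np.roll(channel, (dy, dx), axis=(0, 1)), base)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

For example, `align(red, blue, max_shift)` returns the translation to apply to the red channel so that it best matches the blue one.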
Here's what happens when we try all translations of distance at most 5% of the image's longer side. The metric described above is minimized by the following alignment.
As described so far, this alignment algorithm is quite slow even for small images (~300px). For larger images (~3000px), it runs for minutes before I give up and move on. To make it faster, let's try a coarse-to-fine search. An image scaled down to a small size requires examining many fewer candidate alignments, since each pixel covers a larger proportion of the image. So we scale images down to just 60px on their shorter side, perform the search there, and then inch the image back toward full scale. At each step in between, we only bother searching within the pixels corresponding to the previous solution, plus or minus a bit. Here is a Python implementation of the coarse-to-fine search. As seen, I played around with some parameters to minimize how much of the search space we consider at each step, landing on values that worked quickly but accurately on all the images in the dataset. Stepping the scaling factor by 3/4 each time seemed, based on limited testing, to give a nice balance between search-space size and number of searches.
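A sketch of the coarse-to-fine search follows. For simplicity it subsamples by powers of two rather than stepping the scale by 3/4, and the window sizes are illustrative rather than the tuned values:

```python
import numpy as np

def neighbor_diff_metric(a, b):
    """Sum of squared disagreements between neighboring-pixel differences."""
    dv = (a[1:, :] - a[:-1, :]) - (b[1:, :] - b[:-1, :])
    dh = (a[:, 1:] - a[:, :-1]) - (b[:, 1:] - b[:, :-1])
    return np.sum(dv ** 2) + np.sum(dh ** 2)

def search(channel, base, center, radius):
    """Exhaustively try shifts within +/- radius of a center shift."""
    best_score, best_shift = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = neighbor_diff_metric(
                np.roll(channel, (dy, dx), axis=(0, 1)), base)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def coarse_to_fine(channel, base, min_size=60, radius=2):
    """Align at the coarsest scale first, then refine the estimate
    within a small window at each finer scale."""
    # Coarsest power-of-two subsampling whose shorter side stays >= min_size.
    scales = [1]
    while min(base.shape) // (scales[-1] * 2) >= min_size:
        scales.append(scales[-1] * 2)
    shift = (0, 0)
    for i, k in enumerate(reversed(scales)):
        small_c, small_b = channel[::k, ::k], base[::k, ::k]
        # Wide window (~5% of the longer side) only at the coarsest scale.
        r = max(small_b.shape) // 20 if i == 0 else radius
        shift = search(small_c, small_b, shift, r)
        if k > 1:
            shift = (shift[0] * 2, shift[1] * 2)  # next scale is twice as fine
    return shift
```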
Now we can start to align higher resolution images in seconds instead of minutes.
At this point, the algorithm can align all the images in the assignment dataset in under a minute. It is interesting to think about what the "best" alignment really is. What metric is best? One can imagine the images being taken at a half-pixel offset from each other, such that there are two equally correct alignments. Images like the one shown made me wonder whether the best alignment could ever be agreed on. As you can see, a difference of one pixel in one channel gives very similar results for a sufficiently high-resolution image.
That said, I am happy with my metric, and with any tweaking that I do by hand, I can't seem to get any more satisfied by an alignment than I do with the algorithm's alignment. I do look forward to seeing classmates' alignment metrics.
Up until this point, I've been interested in making the algorithm faster. I've tweaked the parameters a bit (how coarse we go and how quickly we go back to the full image), and I've found that small patches of the original image are always sufficient for finding the alignment. Out of curiosity, at this point I plugged in the old basic image alignment metric, the sum of differences squared. I noticed that while that metric gets the correct alignment at full patch size, it does not perform as consistently when the patch size is reduced. This was a nice bit of evidence that my neighboring-pixel-differences metric works well, sometimes better than the naive version.
| Metric, Patch Size | Images Aligned |
| --- | --- |
| Sum of Differences Squared, 100% | 17/17 |
| Neighboring Pixel Differences, 100% | 17/17 |
| Sum of Differences Squared, 20% | 15/17 |
| Neighboring Pixel Differences, 20% | 17/17 |
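The naive metric from the table can be sketched as follows, along with a patch-cropping helper; both names are my own, and the centered placement of the patch is my assumption (the comparison only requires that the same patch be used for every candidate):

```python
import numpy as np

def ssd_metric(a, b):
    """Naive sum of squared per-pixel brightness differences."""
    return np.sum((a - b) ** 2)

def central_patch(a, frac):
    """Centered patch covering `frac` of each dimension, used to
    shrink the region the metric is computed over."""
    h, w = a.shape
    dh, dw = int(h * frac), int(w * frac)
    return a[(h - dh) // 2:(h + dh) // 2, (w - dw) // 2:(w + dw) // 2]
```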
Frankly, I don't want to do too much with these images. I find the way they look compelling on their own (including the funny edges), because it all says something about the history. I do recognize at least that the funky edges are in part an artifact of my using the `numpy.roll()` function, which wraps translated pixels around to the opposite side. For this reason, I concluded by removing the border in each dimension only as far as the channels were translated. That is, I kept as much of the correctly aligned image as possible, which importantly does still leave some funny-looking edges from the glass plates. I like them - quite a bit.
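That cropping step can be sketched as follows (the helper name is my own); each border is trimmed by the largest `np.roll` translation that wrapped pixels onto it:

```python
import numpy as np

def crop_to_shifts(img, shifts):
    """Trim each border by the largest channel translation toward it,
    keeping only rows/columns where every channel is genuine data.
    `shifts` is a list of (dy, dx) pairs applied via np.roll."""
    top = max([dy for dy, _ in shifts if dy > 0], default=0)
    bottom = max([-dy for dy, _ in shifts if dy < 0], default=0)
    left = max([dx for _, dx in shifts if dx > 0], default=0)
    right = max([-dx for _, dx in shifts if dx < 0], default=0)
    h, w = img.shape[:2]
    return img[top:h - bottom, left:w - right]
```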
I enjoyed this project. It is rewarding to see the images improve as you work, and it was fun to experiment with various alignment metrics. These are some pretty interesting historic images, and it was great to produce my own version of them.