A comparison of image histograms
Histograms are one way to describe the information content of an image. They help us understand the patterns in an image by showing how the values in each channel (red, green and blue) are distributed. An image can be changed by changing its histogram, and editing the image visually changes its histogram in turn. Histograms are used so frequently in design that they are part of virtually every modern image editor. Lightroom allows you to drag the histogram curve and watch the image change as you do, which is very intuitive. Clamping the histogram at both ends has long been used as a way to enhance the contrast of an image. The idea is that ranges at the ends of the histogram with near-zero occurrences can be removed, as they are unlikely to contribute much to the overall look of the image. This effectively takes the most significant values and spreads them across the interval [0, 255].
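The clamping idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not what any particular editor does: the function name stretch_contrast and the clip_fraction parameter are hypothetical, a channel is assumed to be a flat list of integers from 0 to 255, and the cut-off points are chosen by discarding a small fraction of pixels at each end.

```python
def stretch_contrast(values, clip_fraction=0.01):
    """Clamp the sparse tails of one channel and rescale the rest to [0, 255].

    `values` is a flat list of integers in 0..255 (one channel of an image).
    `clip_fraction` is the share of pixels we allow ourselves to ignore at
    each end of the histogram (an assumption made for this sketch).
    """
    # Count how often each of the 256 possible values occurs.
    hist = [0] * 256
    for v in values:
        hist[v] += 1

    # Number of pixels that may be cut off at each end.
    budget = int(len(values) * clip_fraction)

    # Walk inwards from the dark end until more than `budget` pixels are seen.
    lo, seen = 0, hist[0]
    while lo < 255 and seen <= budget:
        lo += 1
        seen += hist[lo]

    # Do the same from the bright end.
    hi, seen = 255, hist[255]
    while hi > 0 and seen <= budget:
        hi -= 1
        seen += hist[hi]

    # A flat (or empty) channel has nothing to stretch.
    if hi <= lo:
        return list(values)

    # Clamp every value to [lo, hi], then spread that range over [0, 255].
    span = hi - lo
    return [(min(max(v, lo), hi) - lo) * 255 // span for v in values]


# A dull channel: almost everything sits between 100 and 120,
# with one stray black pixel and one stray white pixel.
channel = [0] + [100] * 500 + [120] * 498 + [255]
stretched = stretch_contrast(channel, clip_fraction=0.01)
print(min(stretched), max(stretched))
```

With the two stray pixels ignored, the values 100 and 120 end up at 0 and 255, so the narrow band in the middle now covers the full range.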
Distributions are a useful tool in statistics, because they allow us to describe various phenomena. The Gaussian/normal distribution has been found to be applicable in many cases that exhibit a growth and decay pattern. The Zipf distribution describes the frequencies of terms in a document: only a few terms are very common, while the majority are rare. The power law distribution describes the “long tail”. The Weibull distribution can describe when a device may fail. The Poisson distribution describes the number of independent events occurring in a fixed interval of time or space. The chi-square distribution is used to test “the goodness of fit”. Student's t-distribution is used to determine whether the difference between the means of samples drawn from different populations is statistically significant. Although not a distribution, the ROC curve shows whether a classifier is doing better than random guessing. As you can see, distributions are everywhere, and they are an important data analysis tool.
The histogram shows how the pixel values in an image are spread. Each pixel has a red, green and blue value associated with it. If we take all the green components and count how often each value from 0 to 255 (x axis) occurs, that count tells us how high the histogram bar should be at that particular bin (y axis). The same is valid for the other two components. This allows us to observe patterns that are hard to define, especially when we don't have previous points of reference. This is why it is important to study histograms not episodically, but in context and in large numbers. Interesting questions we might ask are: “How does a small object affect the homogeneity of the image when it differs a lot from its environment?”, “What will happen if we introduce a lot of noise?”, “How important is richness of detail to the overall perception? How will this detail accumulate and where?”, “Should a human with an animal face still be recognized as a human?”, “How easy is it to detect occluded objects in a scene and what kind of data does occlusion remove from view?” Prior knowledge of such patterns can help to automatically detect the type of new images (flower, orange, house...) or to find similar ones based merely on looks, not on a user's query.
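To make the counting concrete, here is a minimal sketch in plain Python. It assumes the image is already available as a list of (red, green, blue) tuples with values from 0 to 255; the function names channel_histograms and histogram_intersection are my own and purely illustrative. The second function uses histogram intersection as one possible way to score how alike two images look; it is a swapped-in example of such a measure, not the method used for the collection described here.

```python
def channel_histograms(pixels):
    """Return one 256-bin histogram per channel for a list of (r, g, b) tuples."""
    hists = {"red": [0] * 256, "green": [0] * 256, "blue": [0] * 256}
    for r, g, b in pixels:
        # Each pixel adds one count to the matching bin of every channel.
        hists["red"][r] += 1
        hists["green"][g] += 1
        hists["blue"][b] += 1
    return hists


def histogram_intersection(hist_a, hist_b):
    """Score two histograms from 0.0 (no overlap) to 1.0 (same shape).

    Both histograms are normalized first, so images of different sizes
    can be compared.
    """
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(min(a / total_a, b / total_b) for a, b in zip(hist_a, hist_b))


# A tiny three-pixel "image".
pixels = [(255, 0, 0), (255, 128, 0), (0, 128, 64)]
hists = channel_histograms(pixels)

# The bin with the tallest bar in the green channel.
green = hists["green"]
print(green.index(max(green)))

# Compare the red histogram of this image with that of an all-black image.
black = channel_histograms([(0, 0, 0)] * 3)
print(histogram_intersection(hists["red"], black["red"]))
```

For real photos the pixel list would come from an image library, and the three channel scores could be averaged into a single similarity value.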
I have decided to take the Oxford Buildings Dataset (5k images, 1.8 GB) and see whether histograms can reveal interesting things about images. I have preselected slightly more than 300 images to make this manageable. Then I plotted their histograms and created thumbnails from the originals, linking each image to its histogram so that the collection is browsable. The original file names were preserved to make it easy for you to find the same image in the original dataset if you wish (although relabeling would have slightly reduced the page size). I hope that this collection will help you to understand histograms better, so you can use them more effectively in your own designs. I would also like to thank everyone involved in making these beautiful photos freely available.