How does image copy detection work?
The goal of image copy detection is to identify whether two images originate from the same source. It differs from other detection problems, which deal with original images: here the images are copies of a single image that have been manipulated in different ways (e.g., blurred, rotated, scaled, or edited) to “fool” detection.
Images are represented digitally as a collection of pixels. A pixel, the smallest unit of an image, is defined by a numerical value corresponding to the color or intensity of that portion of the picture. Applying transformations to an image, such as rotation or color filters, produces copies that our visual system easily perceives as similar to the original. The digital representation, however, changes dramatically, making it much harder for computer systems to identify the images as similar.
Figure 1: source https://cs231n.github.io
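To make the point about pixel values concrete, here is a minimal sketch in plain Python. The 4x4 grayscale image and the helper names (rotate_90_clockwise, fraction_of_changed_pixels) are made up for illustration; real images are far larger and usually have several color channels. It rotates the image by 90 degrees and counts how many pixel positions no longer hold the same value.

```python
# Toy 4x4 grayscale image (illustrative values): each number is one pixel's
# intensity on a 0-255 scale.
original = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]


def rotate_90_clockwise(image):
    """Return a copy of the image rotated 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def fraction_of_changed_pixels(a, b):
    """Share of pixel positions whose values differ between two same-sized images."""
    pairs = [(pa, pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]
    return sum(pa != pb for pa, pb in pairs) / len(pairs)


rotated = rotate_90_clockwise(original)
changed = fraction_of_changed_pixels(original, rotated)

# The picture content is the same, only turned, yet a position-by-position
# comparison sees a completely different grid of numbers.
print(f"{changed:.0%} of pixel positions changed after rotation")
```

In this toy case every position holds a different value after the rotation, even though the set of pixel values is unchanged. A viewer would still recognize the same picture, while a naive pixel-by-pixel comparison would not.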
Because re-uploaded content is often modified, exact copy detection, which compares the original pixel values against those of every other uploaded image, misses too many images that look similar to the human eye. This is why most image copy detection models use algorithms that map patterns in the original pixel values to representations (called embeddings), which makes it possible to catch re-uploaded content that has been modified.
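The difference between exact matching and embedding-based matching can be sketched as follows. This is only an illustration under stated assumptions: toy_embedding is a hypothetical stand-in (a normalized intensity histogram), not one of the models discussed in this article, and the 0.9 similarity threshold is an arbitrary value chosen for the example. Real systems use far richer embeddings, but the comparison step follows the same idea.

```python
import hashlib
import math


def exact_fingerprint(image):
    """Hash of the raw pixel values: equal only for pixel-identical images."""
    return hashlib.sha256(bytes(p for row in image for p in row)).hexdigest()


def toy_embedding(image, bins=4):
    """Hypothetical stand-in for a learned embedding: a normalized intensity histogram."""
    counts = [0] * bins
    for row in image:
        for p in row:
            counts[min(p * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Illustrative data: an "original", a mirrored re-upload, and an unrelated image.
original = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
mirrored = [row[::-1] for row in original]
unrelated = [[200, 210, 220, 230] for _ in range(4)]

SIMILARITY_THRESHOLD = 0.9  # assumed value, would be tuned on real data

exact_match = exact_fingerprint(original) == exact_fingerprint(mirrored)
sim_copy = cosine_similarity(toy_embedding(original), toy_embedding(mirrored))
sim_other = cosine_similarity(toy_embedding(original), toy_embedding(unrelated))

print("exact match with mirrored copy:", exact_match)
print("flagged as copy (mirrored):", sim_copy >= SIMILARITY_THRESHOLD)
print("flagged as copy (unrelated):", sim_other >= SIMILARITY_THRESHOLD)
```

Here the exact fingerprint misses the mirrored copy, while the embedding comparison still flags it and leaves the unrelated image alone. A histogram is a deliberately crude choice: it survives mirroring and rotation but would break under a color filter, which is one reason practical detectors rely on more robust representations.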
Experimental setup
Tremau researchers have run a number of experiments to test the accuracy of different copy detection methods. Using data from Facebook, the team deployed one local-detector-based model (SIFT) and three deep-learning-based models (GVRL, SSCD, and DENA).
The results indicate significant gaps between the models tested, confirming how important it is for online platforms to choose the detection tools that best suit their needs and the types of data they host. Ideally, platforms should also be able to work with multiple copy detection methods.
As new methods are continuously being developed, it is critical that companies regularly assess, and possibly replace, the tools they use. While this may be time-consuming, the gains from more accurate tools can far outweigh such costs over time.
How can Tremau help?
It is crucial for all online companies to be aware of image manipulation methods and to take action to prevent repeated posting of variations of known illegal content. If you would like more information about how to secure your platform or streamline your moderation processes, or if you are unsure which detection tool is right for you, contact us.