Image Copy Detection: A Key Problem in Online Trust & Safety

Image copy detection is the task of determining whether a target image is a copy, or an altered version, of an image in a reference dataset. The reference set may be a large collection, for example images that have previously been detected and removed from a platform.

Online platforms largely rely on content moderation to remove illegal or harmful content, such as terrorist content or child abuse images and videos. More often than not, illegal content that has been detected and removed reappears, possibly multiple times, as manipulated copies: for example, images that have been cropped, rotated, edited, or overlaid with watermarks.

In 2019, online platforms were put to the test when an attack on mosques in Christchurch was livestreamed. While Facebook removed the original video, hundreds of thousands of versions of it were almost immediately re-uploaded to Facebook, Twitter, and YouTube. After 24 hours, there were over 1.5 million uploads of the video, 80% of which were successfully moderated through image copy detection methods such as hash matching. Online platforms, especially social media, therefore have a strong incentive to automatically detect harmful content posted and reposted on their servers: doing so can greatly improve the speed and effectiveness of content moderation while protecting users from previously identified harmful content.

How does image copy detection work? 

The goal of image copy detection is to identify whether two images originate from the same source. This differs from other detection problems, which deal with original images: here the images are copies of a single image that have been manipulated in different ways (e.g., blurred, rotated, scaled, or edited) to “fool” detection.

Images are represented digitally as a collection of pixels. A pixel, the smallest unit of an image, is defined by a numerical value corresponding to the color or intensity of that portion of the picture. Applying transformations to an image, such as rotation or color filters, produces copies that our visual system easily perceives as similar to the original. However, the digital representation changes dramatically, making it much harder for computer systems to identify similar images.
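To make this concrete, here is a minimal sketch in Python using a made-up 4x4 grayscale image. The image values and the helper names (`rotate_90_clockwise`, `brighten`, `fraction_changed`, `exact_fingerprint`) are assumptions for illustration only and do not come from any real moderation system.

```python
import hashlib

# A tiny 4x4 grayscale "image": each number is one pixel's intensity (0-255).
# The values are invented for illustration.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

def rotate_90_clockwise(img):
    """Return a copy of img rotated 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def brighten(img, amount):
    """Return a copy with every pixel brightened, clipped at 255."""
    return [[min(255, p + amount) for p in row] for row in img]

def fraction_changed(a, b):
    """Share of pixel positions whose values differ between a and b."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    changed = sum(1 for x, y in zip(flat_a, flat_b) if x != y)
    return changed / len(flat_a)

def exact_fingerprint(img):
    """Cryptographic hash of the raw pixel bytes; any edit changes it."""
    return hashlib.sha256(bytes(p for row in img for p in row)).hexdigest()

rotated = rotate_90_clockwise(image)
brighter = brighten(image, 5)

# Both copies look "the same" to a person, yet every stored value differs.
print(fraction_changed(image, rotated))
print(fraction_changed(image, brighter))
print(exact_fingerprint(image) == exact_fingerprint(brighter))
```

In this toy case a simple rotation or a barely visible brightness change alters every pixel position, so any comparison based on raw values or an exact fingerprint treats the copy as a completely new image.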

Figure 1: source – https://cs231n.github.io

As re-uploaded content is often modified, exact copy detection, which compares the original pixel values against those of every other uploaded image, misses too many images that look the same to the human eye. This is why most image copy detection models use algorithms that map patterns in the original pixel values to compact representations (called embeddings), which makes it possible to catch re-uploaded content that has been modified.
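The idea of matching on embeddings rather than raw pixels can be sketched as follows. This is a toy illustration, not one of the models discussed in this article: `toy_embedding` is a hypothetical hand-written descriptor (block averages, centered and normalized) standing in for a learned embedding, and the images, the `find_copy` helper, and the 0.9 threshold are all assumptions.

```python
import math

def toy_embedding(img, grid=2):
    """Toy stand-in for a learned embedding: the mean intensity of each cell
    in a grid x grid layout, centered and scaled to unit length."""
    h, w = len(img), len(img[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            values = [img[y][x] for y in rows for x in cols]
            cells.append(sum(values) / len(values))
    mean = sum(cells) / len(cells)
    centered = [c - mean for c in cells]
    norm = math.sqrt(sum(c * c for c in centered)) or 1.0
    return [c / norm for c in centered]

def cosine_similarity(u, v):
    """Dot product of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def find_copy(query, references, threshold=0.9):
    """Return (name, score) of the best reference above threshold, else None."""
    q = toy_embedding(query)
    best_name, best_score = None, -1.0
    for name, ref in references.items():
        score = cosine_similarity(q, toy_embedding(ref))
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else None

# Hypothetical reference set of previously removed images.
references = {
    "known_bad": [[200, 200, 10, 10], [200, 200, 10, 10],
                  [10, 10, 10, 10], [10, 10, 10, 10]],
    "other": [[10, 10, 10, 10], [10, 10, 10, 10],
              [10, 10, 200, 200], [10, 10, 200, 200]],
}

# A re-upload of "known_bad": brightened, with a one-pixel "watermark".
edited = [[220, 220, 30, 30], [220, 220, 30, 30],
          [30, 30, 30, 30], [30, 30, 30, 255]]

# An image that is not a copy of anything in the reference set.
unseen = [[10, 10, 200, 200], [10, 10, 200, 200],
          [10, 10, 10, 10], [10, 10, 10, 10]]

print(edited == references["known_bad"])  # exact pixel comparison fails
print(find_copy(edited, references))      # embedding comparison still matches
print(find_copy(unseen, references))
```

The edited copy no longer matches pixel for pixel, yet its embedding stays close to that of the original. Note that this toy descriptor would not survive a rotation or a heavy crop; handling that wider range of manipulations is precisely what purpose-built copy detection models aim to do.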

Experimental setup

Tremau researchers have run a number of experiments to test the accuracy of different copy detection methods. Using data from Facebook, the team deployed one local-detector-based model (SIFT) and three deep-learning-based models (GVRL, SSCD, and DENA).

The results indicate significant gaps between the models tested, confirming how important it is for online platforms to choose the detection tools that best suit their needs and the types of data they host. Ideally, platforms should also be able to work with multiple copy detection methods.

As new methods are continuously being developed, it is critical that companies regularly assess, and possibly replace, the tools they use. While this may be time-consuming, gains in the accuracy of the tools can far outweigh such costs over time.
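One common way to carry out such an assessment is to run each candidate tool on a labeled set of queries and compare precision (how many flagged images were correct) with recall (how many true copies were caught). The sketch below assumes a simple dictionary format and uses invented labels; `tool_x` and `tool_y` are hypothetical and do not represent the models or results from the experiments described above.

```python
def precision_recall(predictions, ground_truth):
    """Compare a tool's output with labels.

    Both arguments map a query id to the id of the matched reference image,
    or to None when the query is not a copy of any reference.
    """
    tp = fp = fn = 0
    for query_id, truth in ground_truth.items():
        pred = predictions.get(query_id)
        if pred is not None and pred == truth:
            tp += 1  # correctly matched copy
        else:
            if pred is not None:
                fp += 1  # flagged, but wrong reference or not a copy
            if truth is not None:
                fn += 1  # a real copy that was missed or mismatched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels: q3 is not a copy of anything in the reference set.
ground_truth = {"q1": "ref_a", "q2": "ref_b", "q3": None, "q4": "ref_c"}

# Hypothetical outputs of two tools on the same queries.
tool_x = {"q1": "ref_a", "q2": None, "q3": None, "q4": "ref_c"}        # cautious
tool_y = {"q1": "ref_a", "q2": "ref_b", "q3": "ref_a", "q4": "ref_c"}  # aggressive

for name, output in [("tool_x", tool_x), ("tool_y", tool_y)]:
    p, r = precision_recall(output, ground_truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this made-up example the cautious tool never flags a legitimate image but misses one copy, while the aggressive tool catches every copy at the cost of one false alarm. Which trade-off is acceptable depends on the platform and the type of content it hosts.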

How can Tremau help?

It is crucial for all online companies to be aware of image manipulation methods and to take action to prevent repeated posting of variations of known illegal content. If you would like more information about how to secure your platform or streamline your moderation processes, or are unsure which detection tool is right for you, contact us.
