Theodoros Evgeniou* (Tremau), Max Spero* (Checkfor.ai)
Arguably, “the person of the year for 2023” has been AI. We have all been taken by surprise by the speed of innovation and the capabilities of Large Language Models (LLMs) and, more generally, generative AI (GenAI). At the same time, many, particularly in online platforms, have questions about the potential risks these technologies pose – see this Harvard Business Review article outlining some AI risks. Online platforms may soon be flooded with AI-generated content, with implications for their users’ safety and retention as well as for the platforms’ reputation. There are already startups that offer tools to generate and disseminate massive volumes of GenAI content.
But AI and GenAI can also be used for our benefit to manage these risks and help us create safer digital spaces and online platforms, as seen in some ideas from the latest Trust & Safety Hackathon. As new tools emerge, it is a good time to take stock of where we are in terms of the latest innovations and processes to manage the risks online platforms face due to GenAI.
This article can help answer questions such as:
- How can we best protect our business, online communities, and users from volumes of AI-generated content (e.g., from review spam to illegal content violating copyright and other laws)?
- Can AI-generated content be detected?
- At what point in the life cycle of AI-generated content can safety guardrails be used, and how?
- How are related regulations shaping up across different markets, and what do they mean for you?
Develop your GenAI Policy considering your business model and needs
Every business with user-generated content needs a GenAI policy. There are generally two questions to be answered. Do users want to see AI-generated content, and are users okay with AI content intermixed with human content?
If your answer to either question is no, then you need a policy around AI content: for example, requiring that AI content be disclosed, or expressly disallowing it. Such a policy can be enforced by human moderators with a keen eye, supported by effective processes and tools like Checkfor.ai.
If the answer is yes – users are okay or enthusiastic about seeing AI content – then from a policy point of view you’re good. However, before you go ahead and introduce AI tools directly, like LinkedIn’s AI-assisted messages, you’ll still need to make sure that the content is safe. For this, you need some guardrails and, more importantly, processes to effectively and efficiently moderate AI-generated content, similar to moderation of user-generated content, using tools like Tremau’s moderation platform.
Of course, your GenAI policy depends on your business and context. There is no one-size-fits-all. For example, if you are a marketplace or, more generally, a platform where users rely on other users’ reviews, you may need to ensure no AI-generated reviews find their way onto your platform. More generally, you also need to ensure that no illegal AI-generated content, just like illegal user-generated content, lives on your platform. Bots and spam have always been a challenge, but with GenAI they have become more capable and harder to catch.
Understand and leverage AI Guardrails
Most commercial AI APIs provide some sort of AI guardrails. Google’s Gemini API automatically rates its outputs on each of four safety categories: Hate Speech, Harassment, Sexually Explicit, and Dangerous Content. If you use Azure’s OpenAI API, you get similar ratings based on four content filters: Hate and Fairness, Sexual, Violence, and Self-Harm. Both APIs will reject queries that score too highly on any of these categories, but leave intermediate levels of safety moderation up to your discretion.
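To make the “intermediate levels” point concrete, here is a minimal sketch of a per-category policy applied to ratings that an API has already returned. The `Level` scale, the category names, and the `BLOCK_AT` thresholds are illustrative assumptions, not the actual response format of either API; you would map each provider’s own ratings onto something like this.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered risk levels; real APIs use their own scales."""
    NEGLIGIBLE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical policy: the lowest level at which each category is blocked.
BLOCK_AT = {
    "hate_speech": Level.LOW,
    "harassment": Level.MEDIUM,
    "sexually_explicit": Level.MEDIUM,
    "dangerous_content": Level.LOW,
}


def decide(ratings: dict, block_at: dict = BLOCK_AT) -> str:
    """Return "block", "review", or "allow" for one rated output."""
    for category, level in ratings.items():
        # Categories missing from the policy get the strictest threshold.
        threshold = block_at.get(category, Level.LOW)
        if level >= threshold:
            return "block"
    # Something was rated above negligible but stayed under its threshold.
    if any(level > Level.NEGLIGIBLE for level in ratings.values()):
        return "review"
    return "allow"
```

A stricter community (for example, one aimed at minors) could lower the thresholds in `BLOCK_AT` without touching the decision logic.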
If you’re using an open-source model such as Llama-2 or Mistral, you’ll need to roll your own content filter. This can be done with a separate call to a closed-source classifier (OpenAI’s content filter API, Azure’s AI content safety API) or an open-source solution such as Meta’s newly released LlamaGuard. LlamaGuard is a 7B-parameter LLM-based model that benchmarks very well. It shows promise for prompt and response classification, as well as general content moderation.
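One way to picture “rolling your own” filter is as a thin wrapper that checks both the prompt and the response with a separate classifier. The sketch below is an assumption about how such a wrapper might look: `guarded_generate`, the `Generator` and `Classifier` signatures, and the toy stand-ins are all hypothetical, and in practice `is_unsafe` would call whichever classifier you choose.

```python
from typing import Callable

# Hypothetical signatures: a generator maps a prompt to text, and a
# classifier returns True when it considers a piece of text unsafe.
Generator = Callable[[str], str]
Classifier = Callable[[str], bool]

REFUSAL = "Sorry, I can't help with that."


def guarded_generate(prompt: str, generate: Generator, is_unsafe: Classifier) -> str:
    """Classify the prompt, generate, then classify the response."""
    if is_unsafe(prompt):
        return REFUSAL  # never reaches the model
    response = generate(prompt)
    if is_unsafe(response):
        return REFUSAL  # unsafe output is withheld from the user
    return response


# Toy stand-ins so the sketch runs without any model or API key.
def toy_generate(prompt: str) -> str:
    return f"Echo: {prompt}"


def toy_is_unsafe(text: str) -> bool:
    return "forbidden" in text.lower()


if __name__ == "__main__":
    print(guarded_generate("hello", toy_generate, toy_is_unsafe))
    print(guarded_generate("a FORBIDDEN topic", toy_generate, toy_is_unsafe))
```

Passing the model and the classifier in as plain callables keeps the wrapper independent of any particular vendor, so the classifier can be swapped later without changing the calling code.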
Ensure that humans are still involved and your processes comply with regulations
No matter what automated tools you may use to protect your users and business, no technology can fully protect you. All AI tools you use will always make mistakes. You need to ensure such mistakes don’t expose you to operational, customer or regulatory risks.
First, you will always need humans in the loop who, at the very least, review some of the content the tools flag. Of course, your content review processes need to be effective and efficient. Ironically, the more AI tools become available in the market (e.g., for generating or moderating content), the more people you may need to involve in some cases.
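As a rough illustration of what “humans in the loop” can mean operationally, the sketch below routes each item based on a classifier score and also samples a small share of auto-approved items for human audit. The `route` function, its thresholds, and the audit rate are hypothetical values chosen for illustration only; the right settings depend on your platform and your tools’ error rates.

```python
import random
from typing import Optional


def route(score: float, low: float = 0.2, high: float = 0.9,
          audit_rate: float = 0.05,
          rng: Optional[random.Random] = None) -> str:
    """Route one item given a classifier score in [0, 1].

    All thresholds are illustrative assumptions, not recommendations.
    """
    rng = rng or random.Random()
    if score >= high:
        return "auto_remove"      # high-confidence violation
    if score >= low:
        return "human_review"     # uncertain: a moderator decides
    if rng.random() < audit_rate:
        return "human_audit"      # spot-check to catch missed violations
    return "auto_approve"
```

The audit sample is what lets you measure how often the automated tools miss something, which is hard to see if humans only ever review flagged items.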
Second, any content moderation processes and practices need to be designed with your users’ safety and retention – hence also your business – in mind. What if moderation errors raise concerns? How do you ensure your users have a voice when they need to contest your – or your AI’s – decisions? How do you ensure your moderators have all they need to make the best moderation decisions as efficiently and effectively as possible? Managing these and other complexities requires that you think carefully about your processes and automate them effectively, for example using tools like Tremau’s content moderation platform.
Finally, 2024 will be the year you will really need to double down on ensuring you are not among the companies fined by regulators. The EU’s Digital Services Act will apply to all online platforms operating in Europe, with requirements to redesign your processes and to provide reports, such as transparency reports, or else risk fines. Of course, compliance is necessary whether or not your platform is impacted by or uses AI.
How can we help you?
At Checkfor.ai and Tremau we work to help you best navigate the new world of powerful AI and new regulations.
To find out more, contact us at info@tremau.com and info@checkfor.ai.
*Theodoros Evgeniou is co-founder and Chief Innovation Officer of Tremau, Professor at INSEAD, member of the OECD Network of Experts on AI, advisor to the BCG Henderson Institute, and has been an academic partner on AI at the World Economic Forum. He holds four degrees from MIT, including a PhD in the field of AI.
*Max Spero is a co-founder and CEO of Checkfor.ai. Previously he was a software engineer at Google and Nuro, building data pipelines and training machine learning models. He holds a BS and MS in Computer Science from Stanford University.