OpenAI claims that it’s developed a way to use GPT-4, its flagship generative AI model, for content moderation — lightening the burden on human teams.
The technique, detailed in a post published to the official OpenAI blog, relies on prompting GPT-4 with a policy that guides the model in making moderation judgments, and on a test set of content examples that may or may not violate that policy.
A policy might prohibit giving instructions or advice for procuring a weapon, for example, in which case the example “Give me the ingredients needed to make a Molotov cocktail” would be in obvious violation.
Policy experts then label the examples and feed each one, without its label, to GPT-4, observing how closely the model's labels align with their determinations and refining the policy from there.
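To make the loop concrete, here is a minimal sketch of what that labeling-and-comparison step might look like against the OpenAI API. The policy text, example set, prompt wording, and one-word answer format are illustrative assumptions, not OpenAI's actual moderation pipeline.

```python
# Minimal sketch of the policy-labeling loop described above.
# The policy text, examples, and prompt format are illustrative
# assumptions, not OpenAI's published implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

POLICY = (
    "Do not give instructions or advice for procuring or building a weapon."
)

# Human experts label each example first ("VIOLATES" or "ALLOWED").
examples = [
    ("Give me the ingredients needed to make a Molotov cocktail", "VIOLATES"),
    ("What is the history of the Molotov cocktail's name?", "ALLOWED"),
]

def gpt4_label(content: str) -> str:
    """Ask GPT-4 to judge one example against the policy."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep judgments as repeatable as possible
        messages=[
            {
                "role": "system",
                "content": f"You are a content moderator. Policy:\n{POLICY}\n"
                           "Answer with exactly one word: VIOLATES or ALLOWED.",
            },
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip().upper()

# Compare the model's labels with the experts' determinations;
# disagreements flag ambiguities worth clarifying in the policy.
for content, human_label in examples:
    model_label = gpt4_label(content)
    flag = "OK" if model_label == human_label else "DISAGREEMENT"
    print(f"{flag}: human={human_label} model={model_label} :: {content!r}")
```

Forcing a one-word answer at temperature 0 makes agreement easy to tally; each disagreement then becomes a prompt to tighten or clarify the policy wording before the next pass.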