Introducing functional tests for hate speech detection models

At Rewire, we’re always looking for ways to help improve hate speech detection technology. Our own API offers best-in-class hate speech detection, a tool we’ve perfected through years of high-end academic research and development. But we believe that building a world where the internet is safe to use has to be a collaborative project. We believe in the power of open-source, community-driven research.

That’s why, with the support of Google’s Jigsaw team, we’ve launched a new platform. It hosts HateCheck, a suite of functional tests that provides targeted insights into the performance of hate speech detection models. By revealing model strengths and weaknesses, HateCheck supports the creation of fairer and more accurate hate speech detection models.

Functional tests reveal model weaknesses.

Functional tests are a way of assessing whether a software system meets certain functional requirements. Such tests are widely used in software engineering and, more recently, in natural language processing.

The functional tests in HateCheck were selected based on an extensive literature review and interviews with civil society organisations. Through this process, we identified key challenges for existing hate speech detection models and built them into the HateCheck test suite. For example, HateCheck tests model performance on counterspeech, which models often misclassify as hate.
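To make the idea concrete, here is a minimal sketch of what a functional test for a hate speech classifier looks like. The classifier below is a deliberately naive keyword matcher standing in for a real model, and the test names and cases are illustrative, not taken from the actual HateCheck suite:

```python
# Toy stand-in for a real hate speech model: flags any text
# containing a slur-like token, with no understanding of context.
def toy_classifier(text: str) -> str:
    return "hateful" if "[SLUR]" in text else "not hateful"

# Each functionality pairs targeted test cases with a gold label.
functional_tests = {
    "direct hate": [
        ("You are just a [SLUR].", "hateful"),
    ],
    "counterspeech": [
        ('Calling someone a "[SLUR]" is never okay.', "not hateful"),
    ],
}

def run_tests(classifier, tests):
    # Per-functionality accuracy exposes targeted weaknesses that an
    # aggregate score over a mixed test set would hide.
    results = {}
    for name, cases in tests.items():
        correct = sum(classifier(text) == gold for text, gold in cases)
        results[name] = correct / len(cases)
    return results

print(run_tests(toy_classifier, functional_tests))
```

The keyword matcher passes the direct-hate test but fails the counterspeech test, because it cannot distinguish quoting a slur to condemn it from using one. An aggregate accuracy score would average these out; the per-functionality breakdown pinpoints exactly where the model breaks.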

HateCheck can reveal these kinds of weaknesses, as well as biases—such as when models are worse at detecting hate directed at some protected groups (e.g. women) than at others (e.g. Muslims).

Get started with HateCheck.

Download the HateCheck papers and access the test suite below. Or, if you have questions or feedback, we’d love to hear from you!