ADL Collaboration Uses Artificial Intelligence to Fight Online Hate
By Yael Brown
Community members have many opportunities to promote inclusivity and engage in respectful dialogue. However, individuals have less control over online interactions. Behind the mask of the screen, cyberbullies and hate groups are able to spread their messages of intolerance and harass individuals and groups. As the online community expands, this problem becomes more and more difficult to control.
The Anti-Defamation League’s strategy for addressing this problem is the Online Hate Index, a collaboration between ADL’s Center for Technology and Society and the University of California, Berkeley’s D-Lab.
The collaboration began in April 2017 to develop the Online Hate Index. The innovative project uses artificial intelligence, machine learning and social science to determine what is and is not hate speech online. ADL and the D-Lab have created an algorithm that is learning to distinguish hate speech from non-hate speech. The project completed its first phase in February 2018 with a very promising result: the learning model identified hate speech reliably between 78 percent and 85 percent of the time, an exceptional rate by industry standards.
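The article does not describe the model's internals, but the general approach it names, learning "hate" vs. "not hate" from human-labeled examples, can be illustrated with a minimal sketch. The following is an assumption-laden toy, not ADL's or the D-Lab's actual system: a Naive Bayes text classifier with add-one smoothing, trained on a handful of invented labeled comments.

```python
# Illustrative toy classifier (NOT ADL's actual model): learns to label
# text as "hate" or "not_hate" from human-labeled training examples.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train(labeled_examples):
    """labeled_examples: list of (text, label) pairs.
    Returns per-label word counts and per-label document counts."""
    word_counts = {"hate": Counter(), "not_hate": Counter()}
    label_counts = Counter()
    for text, label in labeled_examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the higher log-posterior score."""
    vocab = set(word_counts["hate"]) | set(word_counts["not_hate"])
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A real system at the scale the article describes (55,000 labeled tweets) would use far richer features and models, but the core loop is the same: human raters supply the labels, and the statistics of the labeled text drive the predictions.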
Jonathan A. Greenblatt, ADL CEO, said, "Algorithms and artificial intelligence will be key to identifying hate online, but human experts are needed to define the problem and, at least in the initial stages, to help the systems assess sentiment and eliminate false positives.”
“Our experts created a data set of 55,000 tweets which were manually reviewed for the presence of anti-Semitism. We look forward to sharing our research and expertise with tech companies and academics as they work on this problem,” Greenblatt said.
Preliminary results from the model suggest that when searching for one kind of hate, it is easy to find hate of all kinds. In the early results, several words appeared far more frequently in hate speech than in non-hate speech. The five words most strongly associated with hate in the sample were Jew, white, hate, women and black.
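The kind of word-association result described above can be sketched with a simple smoothed frequency-ratio calculation. This is an illustrative assumption, not the D-Lab's published methodology: it ranks words by how much more often they occur in comments labeled as hate speech than in comments labeled as non-hate.

```python
# Illustrative sketch (NOT the D-Lab's actual analysis): rank words by
# their smoothed frequency ratio between a hate-labeled corpus and a
# non-hate-labeled corpus.
from collections import Counter

def hate_associated_words(hate_docs, other_docs, top_n=5):
    hate_counts = Counter(w for d in hate_docs for w in d.lower().split())
    other_counts = Counter(w for d in other_docs for w in d.lower().split())
    hate_total = sum(hate_counts.values())
    other_total = sum(other_counts.values())
    vocab = set(hate_counts) | set(other_counts)

    def ratio(word):
        # Add-one smoothing avoids division by zero for words
        # that appear in only one of the two corpora.
        hate_rate = (hate_counts[word] + 1) / (hate_total + len(vocab))
        other_rate = (other_counts[word] + 1) / (other_total + len(vocab))
        return hate_rate / other_rate

    return sorted(vocab, key=ratio, reverse=True)[:top_n]
```

On real data, the top-ranked words are the ones most disproportionately present in hateful comments, which is how a list like the article's top five could surface from a labeled sample.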
ADL’s report offers a series of specific and actionable policy recommendations for Twitter, including: ensuring a comprehensive Terms of Service that clearly prohibits hateful content and is adequately enforced; using artificial intelligence to enhance efforts to flag content for review; ensuring users have an effective filtering option to decrease the chances they will encounter hate speech; and exploring external review and input by providing access to the platform’s data to independent researchers and members of civil society.
Through the ADL Center for Technology and Society and CTS’s Cyberhate Problem-Solving Lab, ADL continues to work with Twitter, other industry leaders and engineers from the platforms to limit the speed and distance hate can spread online. ADL has been encouraged by a number of steps Twitter has taken over the past year, such as removing verification badges from some white supremacists.
ADL was an inaugural member of the Twitter Trust and Safety Council.
The next phase of this project will go beyond a hate vs. non-hate analysis and turn to looking at specific targeted populations in a more detailed manner. Additionally, the D-Lab is identifying strategies to scale the process for labeling comments.
While artificial intelligence and machine learning-based solutions still have far to go, ADL and the D-Lab believe these technologies can go a long way toward curbing online hate speech.
The ADL’s longstanding mission has been to stop the defamation of the Jewish people and secure justice and fair treatment to all. If we stand up for one person, we must stand up for all people, and the Online Hate Index will help do just that.
To learn more about online hate and cyberbullying, please visit adl.org for educator and parent resources, or call the ADL Austin office at (512) 249-7960.