Why Not All Filtering Solutions Are Created Equal

published March 14, 2018


The mission to bring digital learning into our classrooms has been decades in the making. Back in 1996, President Clinton led 20,000 volunteers in an effort to connect California public schools to the “brave new world of mouse clicking and web surfing.” Oh, how far we’ve come. According to Futuresource Consulting, nearly half of K-12 students now have access to 1:1 computing (one device per student).

Today, more and more school districts are closing the “connectivity gap,” giving students unprecedented access to technology and new ways of learning. With more freedom comes more responsibility. Now that educators and administrators are tasked with building safe and effective digital learning environments, online filtering tools are a must.

However, not all filtering is created equal. While URL-based filtering is limited in its ability to identify inappropriate content, AI has made filtering more nuanced and effective than ever before. A new method has emerged that uses artificial intelligence to analyze the actual content on web pages in real time, allowing students to see only material that educators deem safe and productive. This innovative method is called content-based filtering.

Where Traditional Web Filtering Falls Short

Conventionally, web filtering has meant keeping lists of appropriate and inappropriate websites and allowing or blocking access accordingly. However, the dynamic nature of the internet makes traditional URL-based filtering (categorizing entire websites by their domains) a largely ineffective way to control the content that students access.

When the internet was first created, web pages were static HTML documents whose content changed infrequently. This allowed internet filtering software to shape students’ online activity by blocking certain websites based on their URLs.

Today, the sheer pace at which new web pages are created makes it impossible to add every inappropriate page to a blocklist. In March 2017, there were more than 330 million registered domain names on the internet; as of August 2017, there were an estimated 1.2 billion websites worldwide. That’s an average of 236,928 new websites created each hour.

What complicates things further is that the actual content on these websites is constantly changing. Even if it were possible to keep up with every new URL, the content of a web page might be appropriate one day and inappropriate the next. What was yesterday a thought-piece on Alexander Hamilton might today be a spammy web page filled with diet pill ads and hate speech. This calls for a filtering solution that’s as responsive as the terrain it’s trying to manage. This calls for AI.
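To make the limitation concrete, here is a minimal Python sketch of URL-based filtering. The blocklist and every domain name in it are hypothetical placeholders (real products rely on much larger, vendor-maintained lists); the point is only that the verdict depends on the URL alone, never on what the page currently says.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only.
BLOCKED_DOMAINS = {"example-casino.com", "example-adult.com"}

def url_filter_allows(url: str) -> bool:
    """Return False if the URL's host is a blocked domain or one of its subdomains.

    The page's content is never fetched or inspected: the verdict depends
    only on the URL, so a page whose content changes keeps the same verdict.
    """
    host = (urlparse(url).hostname or "").lower()
    for domain in BLOCKED_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return False
    return True

# A listed domain (and its subdomains) is blocked.
print(url_filter_allows("https://www.example-casino.com/slots"))  # False
# A domain registered yesterday is not on the list yet, so it gets through.
print(url_filter_allows("https://new-casino-site.example/slots"))  # True
```

A newly registered domain passes simply because nobody has listed it yet, and any page keeps its verdict no matter how its content changes.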

Why “Keyword Flagging” Doesn’t Cut It

Many products try to address the shortcomings of URL-based filtering with a technique known as keyword flagging. The software looks for preselected words or phrases that are likely to indicate inappropriate content. The presence of these keywords on a web page or in a search query triggers a response, such as blocking the page and/or notifying an administrator.

But keyword flagging is troublesome because it can’t account for the context in which a word or phrase appears. This results in a large number of false positives: web pages that are flagged as inappropriate when they are not. And that creates an additional hassle for K–12 leaders and technology administrators.

For example, a program might be set up to flag a student’s use of the word “bomb” online. But depending on the context, the use of this word can mean very different things. A search query such as “what kind of bombs were used in World War I?” might indicate that the student is researching a history project, while a search for “how to make a bomb” might suggest a potential threat.
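A minimal Python sketch of keyword flagging shows the false positive. The keyword set and the whole-word, case-insensitive matching rule are illustrative assumptions, not any particular product’s configuration.

```python
import re

# Hypothetical flag list; a real product would let administrators configure it.
FLAGGED_KEYWORDS = {"bomb", "bombs"}

def keyword_flag(text: str) -> bool:
    """Flag text if any configured keyword appears as a whole word (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in FLAGGED_KEYWORDS for word in words)

for query in ["what kind of bombs were used in World War I?", "how to make a bomb"]:
    print(keyword_flag(query), "|", query)
# Both lines print True: the filter cannot tell history research from a possible threat.
```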

Enter AI-Powered Content Filtering

A content-based filter powered by artificial intelligence (AI) technology can solve this problem. Instead of looking for certain keywords, or blocking a website based on its URL, the solution understands the context of the material on the page to determine what it’s about, and whether it’s appropriate for the student.

Here’s how it works: developers show the software hundreds of thousands of examples of web page content that is or is not appropriate for students at different age levels, and the software “learns” to distinguish between them.
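The article doesn’t say which model a content-based filter uses. As a stand-in, the sketch below trains a multinomial naive Bayes classifier, a classic and very simple supervised text-classification technique, on six invented examples. The class `NaiveBayesFilter`, the labels `allow` and `block`, and the training data are all hypothetical; the sketch illustrates the learn-from-labeled-examples idea, not the scale or accuracy of a production system.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter with add-one (Laplace) smoothing."""

    LABELS = ("allow", "block")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record one labeled example page or query."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def predict(self, text: str) -> str:
        """Return the label with the highest log-posterior (assumes both labels were trained)."""
        vocab_size = len(set(self.word_counts["allow"]) | set(self.word_counts["block"]))
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# A toy labeled set; a real filter would learn from a vastly larger one.
examples = [
    ("bombs used in world war i history lesson", "allow"),
    ("history of world war i battles and weapons", "allow"),
    ("what kind of tanks were used in world war ii", "allow"),
    ("how to make a bomb at home", "block"),
    ("how to make explosives step by step", "block"),
    ("instructions to build a bomb", "block"),
]
clf = NaiveBayesFilter()
for text, label in examples:
    clf.train(text, label)

print(clf.predict("what kind of bombs were used in World War I?"))  # allow
print(clf.predict("how to make a bomb"))  # block
```

Unlike the keyword filter, no single word decides the outcome: the surrounding words (“world”, “war”, “used”) pull the history question toward `allow`, while “how”, “make”, and “bomb” together pull the other query toward `block`.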

Because AI understands the context in which certain words or phrases are used, web filtering becomes more accurate. And because it classifies each page in real time, when a student requests it, a page whose content has changed since yesterday is judged on what it contains today.

What’s more, with content-based filtering, the software is self-correcting: it learns from feedback. When K–12 administrators and educators indicate whether a web page was flagged or blocked correctly or incorrectly, the software learns from that incident, so it keeps getting better at categorizing web content as educators provide more feedback.
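The feedback loop can be sketched with a mistake-driven perceptron, a swapped-in learner that is simpler than whatever a commercial product uses. Everything here (the `FeedbackFilter` class, word and word-pair features, starting weights that mimic a keyword rule) is a hypothetical illustration. When an administrator marks a verdict as wrong, the weights of that page’s features shift toward the correct answer.

```python
import re
from typing import Dict, List, Optional

def features(text: str) -> List[str]:
    """Lowercase words plus adjacent word pairs, so a little context is captured."""
    words = re.findall(r"[a-z']+", text.lower())
    return words + [f"{a}_{b}" for a, b in zip(words, words[1:])]

class FeedbackFilter:
    """Hypothetical mistake-driven perceptron; a positive score means block."""

    def __init__(self, seed_weights: Optional[Dict[str, float]] = None) -> None:
        self.weights: Dict[str, float] = dict(seed_weights or {})

    def score(self, text: str) -> float:
        return sum(self.weights.get(f, 0.0) for f in features(text))

    def should_block(self, text: str) -> bool:
        return self.score(text) > 0

    def record_feedback(self, text: str, should_have_blocked: bool) -> bool:
        """Apply an administrator's verdict; return True if the weights changed.

        A perceptron updates only when its prediction was wrong, so confirming
        a correct verdict leaves the weights unchanged.
        """
        if self.should_block(text) == should_have_blocked:
            return False
        step = 1.0 if should_have_blocked else -1.0
        for f in features(text):
            self.weights[f] = self.weights.get(f, 0.0) + step
        return True

# Start from a keyword-style rule: "bomb" or "bombs" pushes a page toward blocking.
flt = FeedbackFilter({"bomb": 1.0, "bombs": 1.0})
history = "what kind of bombs were used in World War I?"
print(flt.should_block(history))  # True: a false positive
flt.record_feedback(history, should_have_blocked=False)  # administrator corrects it
print(flt.should_block(history))  # False
print(flt.should_block("how to make a bomb"))  # True: still blocked
print(flt.should_block("which bombs were used in World War II?"))  # False: unseen, similar query
```

A single correction also changes the verdict for a similar question the filter has never seen, while the genuinely worrying query stays blocked. A perceptron learns only from mistakes; a system that also learns from confirmed verdicts would need a different update rule.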

Because of the dynamic nature of the internet, keyword and URL-based filtering solutions are outdated methods of safeguarding students’ digital Learning.

By contrast, AI-based filtering systems are more accurate than traditional approaches, producing fewer false positives and more actionable insights. This not only reduces the likelihood of over-blocking or under-blocking but also eases the burden on administrators by streamlining how they monitor internet activity on school-issued devices. Solutions like GoGuardian’s Admin product also incorporate feedback from administrators, so the systems are continually learning and improving.