Simple Swear/Badword Filtering tutorial

In this tutorial you’ll learn how to censor the swear words and profanity on your website.

You could even use this tutorial to prevent people making posts about your competitors.

* Contains strong language *


First things first: the functions we will be using in this tutorial.

This tutorial will teach you how to load a file as an array using the file() function. It also caches the word list in a global variable, so the file is only read once per page load. Finally, it replaces each censored word with a masked counterpart (a five-letter word, for example, becomes either w***d or *****, both 5 characters long).
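To make the two masking styles concrete, here is a minimal sketch of the masking step on its own. The helper name mask_word() is made up for illustration and is not part of the snippet in this tutorial. It also assumes single-byte (ASCII) words, since strlen() counts bytes rather than characters.

```php
<?php
// Hypothetical helper for illustration only; not part of the original snippet.
// Assumes single-byte (ASCII) words, because strlen() counts bytes.
function mask_word(string $word, bool $all = false): string {
    $len = strlen($word);
    // Mask everything when asked to, or when the word is too short
    // to keep a separate first and last letter.
    if ($all || $len < 3) {
        return str_repeat('*', $len);
    }
    // Keep the first and last letter, star out everything in between.
    return $word[0] . str_repeat('*', $len - 2) . $word[$len - 1];
}

echo mask_word('world'), "\n";       // w***d
echo mask_word('world', true), "\n"; // *****
```

Either way the masked word keeps its original length, so the surrounding text does not shift.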

To start things out, we're going to create a file with the swear words.
Alternatively, you could create a string, but we’ll talk about that later.
If you would like a list of many common swear words, check out my Swear Word Text File.

swearwords.txt

shit
bitch
arkinex
google
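As a rough sketch of how a list like this can be turned into a PHP array, the example below loads it once from a file and once from a comma-separated string. The function names load_words_from_file() and load_words_from_string() are assumptions made up for this example, and the temporary file only stands in for swearwords.txt. One thing to watch: by default file() keeps the line ending on every line, so the flags (or a trim()) are needed before the words will match anything.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original snippet.

// Read one word per line, dropping line endings and blank lines.
function load_words_from_file(string $path): array {
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array(); // missing or unreadable file
    }
    // trim() also removes stray spaces around each word.
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Build the same kind of list from a comma-separated string.
function load_words_from_string(string $list): array {
    return array_values(array_filter(array_map('trim', explode(',', $list)), 'strlen'));
}

// Demo: write a temporary word file (standing in for swearwords.txt),
// then load the list both ways.
$path = tempnam(sys_get_temp_dir(), 'swear');
file_put_contents($path, "arkinex\ngoogle\n\n");
$from_file = load_words_from_file($path);
$from_string = load_words_from_string('arkinex, google');
unlink($path);

var_dump($from_file === $from_string); // bool(true)
```

A string is handy for a short list kept in a config value, while a file is easier to maintain once the list grows.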

Then, the PHP part of the censoring.

/* arkinex-censor.php v1.0 - www.arkinex.com */
/* don't forget to create swearwords.txt */
function censor($string,$all=false) {
	// $censor_words is global, so it acts as a cache between calls:
	// the word list is only loaded once per page load.
	global $censor_words;
	// If there is no cache (censor_words is empty or is not an array),
	// then load swearwords.txt.
	// The flags strip the line ending from each line and skip blank lines;
	// without them the words keep their newline and will not match.
	if (empty($censor_words)||!is_array($censor_words)) {
		$censor_words = @file('swearwords.txt', FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES);
	}
	$censors = array();
	// Check if the censor words are empty (missing file or empty list).
	// Returns an invisible error (an HTML comment) and the string if they are.
	if (empty($censor_words)) return '<!--error-->'.$string;
	// This foreach loops through each censor word
	foreach ($censor_words as $word) {
		// This checks if you want all of the word censored or only part of it.
		// str_repeat repeats * for the number of letters (minus 2);
		// max() keeps the count from going negative for one-letter words.
		// substr grabs the first and last letter (the 2)
		if (!$all) $censors[] = substr($word,0,1).str_repeat('*', max(0, strlen($word)-2)).substr($word,-1);
		// If you want to censor the full word,
		// str_repeat runs for the length of the entire word
		else $censors[] = str_repeat('*', strlen($word));
	}
	// str_replace replaces all the words with their censors (case-sensitive)
	$string = str_replace($censor_words, $censors, $string);
	// This returns the string for display
	return $string;
}

echo censor("The cat sat on the arkinex",true);
// Outputs: The cat sat on the *******

Now to the example:

echo censor("The cat sat on the arkinex");
// Outputs: The cat sat on the a*****x
echo censor("You should all go to arkinex!", true);
// Outputs: You should all go to *******!
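Note that str_replace() is case-sensitive and also matches inside longer words. As one possible variation, here is a sketch that ignores case and only censors whole words, using preg_replace_callback(). The function name censor_ci() is hypothetical, and it takes the word list as a parameter (instead of reading swearwords.txt) only so that the example runs on its own. It is a different matching technique from the one used above, not a drop-in copy of it.

```php
<?php
// Hypothetical variant for illustration; the name censor_ci() and the
// $words parameter are assumptions, not part of the original snippet.
function censor_ci(string $text, array $words, bool $all = false): string {
    if (empty($words)) {
        return $text;
    }
    // Escape each word so it is safe inside a regular expression.
    $escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    // \b limits matches to whole words; the i modifier ignores case.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    return preg_replace_callback($pattern, function ($m) use ($all) {
        $word = $m[0];          // the text exactly as it was typed
        $len = strlen($word);
        if ($all || $len < 3) {
            return str_repeat('*', $len);
        }
        // Keep the first and last letter of the matched text.
        return $word[0] . str_repeat('*', $len - 2) . $word[$len - 1];
    }, $text);
}

echo censor_ci('Go to ARKINEX or Arkinex!', array('arkinex')), "\n";
// Outputs: Go to A*****X or A*****x!
echo censor_ci('I googled it', array('google'), true), "\n";
// Outputs: I googled it (no whole-word match, so nothing is censored)
```

Whole-word matching avoids censoring harmless words that happen to contain a listed word, at the cost of missing deliberate variations, so which behaviour you want depends on your site.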

You can download the full snippet, swearword-censor.phps.
Reply to this article if you need additional help or have any ideas.


Comments

There seems to be a problem in that code… you try to replace the undefined variable $words with $censor… you should write $censor_words instead of $words…

@1: You are correct. I rewrote this tutorial to include file caching in case our users were using the censor function multiple times within one page load. In doing so I renamed some of the variables to make more sense and forgot to update all of them.

Thank you PDesign. This has been fixed.

Wow! Good resources here, Enjoyed the visit!

Great tut, I intend to expand on this a bit, but a great start. I am working on some forum software, and this is a great resource.

Very nice tutorial, thanks

Is it possible to convert this to read from a database instead of a txt file?

@6: I emailed you this response also:

It is possible to convert this to run from a mysql database/table instead of a text file by simply loading the mysql table into the swear words array instead of the text file.

To make it run from a mysql DB, change the code in the 1st if statement to something like…

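$sql = "SELECT words FROM censor_table"; // Or whatever your table/field is called
// The old mysql_* functions were removed in PHP 7, so this uses mysqli.
// $db is assumed to be an already open mysqli connection.
$result = mysqli_query($db, $sql);
$censor_words = array();
while ($row = mysqli_fetch_assoc($result)) $censor_words[] = $row['words']; // one word per row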

Not tested.

I have been working on a comprehensive swear filter for quite some time. This is a great start to eliminating some words based off a list, but there are so many more ways to cheat the filter using special characters.

I have made a filter that scrubs the string several times and compares based on filters and punctuation. It is working quite well now: it is almost impossible to get anything through, and the word list is only about 20 words long.

I tried the filter. It's not that great because you can cheat it by simply playing around with the capitalization.
