Function to Generate a Slug (URL-friendly String) [Helpers] [Updated]
Update #1: With the help of my esteemed colleague Zach, I've provided a PHP function as well and fixed some bugs with the .NET version.
Update #2: Added the "-" character to the list of valid characters so that passing in an already-generated slug doesn't break it.
Update #3 (3/17/2010): After some unit testing, fixed a bug where it would keep multiple hyphens instead of condensing them into one hyphen. Added a parameter to adjust the max length.
The term "slug" was popularized on the web, I believe, by WordPress to mean a URL-friendly version of a post title (the word itself is older and comes from newspaper publishing). From their codex:
A slug is a few words that describe a post or a page. Slugs are usually a URL friendly version of the post title (which has been automatically generated by WordPress), but a slug can be anything you like. Slugs are meant to be used with permalinks as they help describe what the content at the URL is.
How do you go about converting an unfriendly title into a friendly one?
Here's a helper function that does just that:
In C#:
// Requires: using System.Text.RegularExpressions;
public string GenerateSlug(string phrase, int maxLength)
{
    string str = phrase.ToLower();
    // remove invalid chars (anything other than letters, digits, whitespace and hyphens)
    str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
    // convert multiple spaces/hyphens into one space
    str = Regex.Replace(str, @"[\s-]+", " ").Trim();
    // cut to maxLength and trim it
    str = str.Substring(0, str.Length <= maxLength ? str.Length : maxLength).Trim();
    // turn the remaining spaces into hyphens
    str = Regex.Replace(str, @"\s", "-");
    return str;
}
In PHP:
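function generateSlug($phrase, $maxLength)
{
    $result = strtolower($phrase);
    // remove invalid chars (anything other than letters, digits, whitespace and hyphens)
    $result = preg_replace('/[^a-z0-9\s-]/', '', $result);
    // convert multiple spaces/hyphens into one space
    $result = trim(preg_replace('/[\s-]+/', ' ', $result));
    // cut to $maxLength and trim it
    $result = trim(substr($result, 0, $maxLength));
    // turn the remaining spaces into hyphens
    $result = preg_replace('/\s/', '-', $result);
    return $result;
}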
What is it doing?
- Transform string into lowercase
- Remove invalid characters (not alphanumeric or spaces or hyphens)
- Convert runs of spaces/hyphens into a single space, then trim
- Extract the first maxLength characters (45 in my case), then trim spaces (in case the last character kept is a space)
- Transform spaces into hyphens
You can add on features, like changing common symbols into words, like "." into "dot" and so on.
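As a minimal sketch of that idea, the helper below swaps a few symbols for words before the phrase is handed to GenerateSlug. The class name SlugSymbols, the method ExpandSymbols and the symbol table itself are all hypothetical names made up for illustration, not part of the original helper.

```csharp
using System.Collections.Generic;

public static class SlugSymbols
{
    // Hypothetical symbol-to-word table (an assumption); extend or change to taste.
    private static readonly Dictionary<string, string> Words = new Dictionary<string, string>
    {
        { ".", "dot" },
        { "&", "and" },
        { "@", "at" },
        { "+", "plus" },
        { "%", "percent" }
    };

    // Replaces each known symbol with its word, padded with spaces so the word
    // does not fuse with neighbouring text. Meant to run before GenerateSlug,
    // which later condenses the extra spaces.
    public static string ExpandSymbols(string phrase)
    {
        foreach (KeyValuePair<string, string> pair in Words)
        {
            phrase = phrase.Replace(pair.Key, " " + pair.Value + " ");
        }
        return phrase;
    }
}
```

With this in place, GenerateSlug(SlugSymbols.ExpandSymbols("Salt & Pepper"), 45) would give salt-and-pepper. One caveat: a plain sentence-ending period also becomes "dot", so you may want a narrower rule for that symbol.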
Why a max of 45? In my DB I have the column set to a max length of 50 and in the future I may implement some sort of duplicate checking and append numbers at the end of the title. Also, I think 50 characters is a good enough length for someone to type in.
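The duplicate checking mentioned above could look roughly like the sketch below. It is only one possible approach: the class SlugUniqueness, the method MakeUnique and the in-memory collection standing in for a database lookup are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SlugUniqueness
{
    // Returns baseSlug if it is unused, otherwise baseSlug-2, baseSlug-3, ...
    // 'existing' stands in for a database lookup (an assumption of this sketch).
    // baseSlug is assumed to already fit within columnLength.
    public static string MakeUnique(string baseSlug, ICollection<string> existing, int columnLength)
    {
        if (!existing.Contains(baseSlug))
        {
            return baseSlug;
        }

        for (int n = 2; ; n++)
        {
            string suffix = "-" + n;
            // Shorten the base if needed so base + suffix still fits the column.
            int keep = Math.Max(0, Math.Min(baseSlug.Length, columnLength - suffix.Length));
            string candidate = baseSlug.Substring(0, keep).TrimEnd('-') + suffix;
            if (!existing.Contains(candidate))
            {
                return candidate;
            }
        }
    }
}
```

With a 45-character slug and a 50-character column, suffixes up to "-9999" fit without shortening the slug at all, which is the headroom the 45/50 split buys.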
Here's how to use it:
In C#:
// slug
string title = @"A bunch of ()/*++\'#@$&*^!% invalid URL characters ";
lblSlug.Text = GenerateSlug(title, 45);
// outputs a-bunch-of-invalid-url-characters
In PHP:
$title = "A bunch of ()/*++\'#@$&*^!% invalid URL characters ";
echo(generateSlug($title, 45));
// outputs a-bunch-of-invalid-url-characters
Enjoy!