Text to SEO
In our last release, our development team implemented SEO-friendly URLs. This allowed us to change our URLs from something like http://fyindout.com/research/Research.php?Id=361 to http://www.fyindout.com/research/title/Study_Cash_Flow_Margins_Declining. This is huge for search engine optimization and for getting FYIndOut’s vendors listed higher on search engines like Google and Yahoo. It was no easy task: we ran into a few roadblocks before we finally (hopefully) got it right. This post outlines how we convert any text string to an SEO-friendly representation.
Replace spaces with underscores
Initially we thought “Okay, we’ll just replace all spaces with underscores”:
<?php
$research_title = "Awesome piece of research!";
$seo_research_title = str_replace(" ", "_", $research_title);
echo $seo_research_title; // 'Awesome_piece_of_research!'
?>
This worked for about 50% of our text research titles. The first error we ran into was a title that had an ampersand (&) in it:
<?php
$research_title = "Erberts & Gerberts have great sandwiches.";
$seo_research_title = str_replace(" ", "_", $research_title);
echo $seo_research_title; // 'Erberts_&_Gerberts_have_great_sandwiches.'
?>
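Here is a minimal sketch of how PHP reads that value if it arrives in a query string. The parameter name title is an assumption for illustration (the post doesn’t show the real one), and parse_str() stands in for the parsing PHP does when it fills $_GET.

```php
<?php
// Hypothetical query string: the parameter name "title" is an assumption.
$query = "title=Erberts_&_Gerberts_have_great_sandwiches.";

// parse_str() splits the string into parameters at every "&".
parse_str($query, $params);

echo $params["title"]; // Erberts_
```

The value stops at the ampersand, and everything after it is treated as a separate, empty parameter.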
The problem here is that ampersands delimit request variables in a URL’s query string. So in our code, that research title would show up as “Erberts_”, which is not what we’re looking for. So we started replacing all ampersands with underscores:
<?php
$research_title = "Erberts & Gerberts have great sandwiches.";
$seo_research_title = str_replace("&", "_", str_replace(" ", "_", $research_title));
echo $seo_research_title; // 'Erberts___Gerberts_have_great_sandwiches.'
?>
This fixed all occurrences of ampersands in research titles, but it still didn’t solve all of our problems. We’d still get database matching errors for colons, semicolons, and a slew of other symbols. We considered percent-encoding the symbols (URL escape codes such as %3A for a colon), but that would defeat the purpose of SEO-friendly URLs in the first place. So the final fix was to throw out anything that wasn’t a letter, number, or space. Regular expressions to the rescue:
Get rid of all the symbols
<?php
$research_title = "Microsoft launches new search engine 'Bing'!!!";
// Matches anything that’s not a word character (letter, digit, underscore) or whitespace
$symbols = "/[^\d\s\w]/";
// Matches 1 or more whitespace characters
$spaces = "/[\s]+/";
// Replace all symbols with spaces (each symbol becomes one space)
$research_title = preg_replace($symbols, " ", $research_title);
echo $research_title; // 'Microsoft launches new search engine  Bing    '
// Trim whitespace from the beginning and end
$research_title = trim($research_title);
echo $research_title; // 'Microsoft launches new search engine  Bing'
// Replace all occurrences of one or more spaces with a single underscore
$research_title = preg_replace($spaces, "_", $research_title);
echo $research_title; // 'Microsoft_launches_new_search_engine_Bing'
?>
This is our final product. Let me walk you through it.
The first regex pattern, $symbols, uses a negated character class. ‘\d’ matches any digit, ‘\s’ matches any whitespace character, and ‘\w’ matches any word character (a letter, a digit, or an underscore, so ‘\d’ is technically redundant and underscores already in a title are kept). The caret (^) at the start of the class negates everything in it. Wrap that all in square brackets and then again in forward slashes and you have a regex that will match anything that’s not a letter, number, underscore, or whitespace.
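A small sketch to check how ‘\w’ behaves here, using a made-up title (not one from our database). It compares the pattern above with a shorter one that drops ‘\d’; the variable names $shorter, $a, and $b are just for this example.

```php
<?php
// Made-up title for illustration.
$title = "Q3_results: up 5%!";

// The pattern from the post, and a shorter variant without \d.
$symbols = "/[^\d\s\w]/";
$shorter = "/[^\s\w]/";

$a = preg_replace($symbols, " ", $title);
$b = preg_replace($shorter, " ", $title);

var_dump($a === $b); // bool(true): \w already covers digits
var_dump($a);        // the underscore is kept; ":", "%" and "!" become spaces
```

Both patterns give the same result, and the underscore in the input survives because ‘\w’ treats it as a word character.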
The second regex pattern, $spaces, again uses a character class, this time to match one or more whitespace characters. ‘\s’ matches a single whitespace character, so you wrap that in square brackets and add a plus sign (+) at the end so it matches one or more of them. Wrap that in forward slashes and you’re done.
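Because ‘\s’ covers tabs and newlines as well as the space character, this pattern collapses mixed whitespace too. The square brackets are optional as far as matching goes: [\s]+ and \s+ behave the same. A quick sketch with a made-up input:

```php
<?php
// Made-up input with a space, a tab, and a newline in a single run.
$text = "cash \t\n flow";

$bracketed = preg_replace("/[\s]+/", "_", $text);
$plain     = preg_replace("/\s+/", "_", $text);

echo $bracketed, "\n"; // cash_flow
echo $plain, "\n";     // cash_flow
```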
Finally, we return the string after replacing the symbols with spaces, trimming whitespace from the front and back, and then replacing one or more spaces with a single underscore.
Why trim whitespace?
We trim whitespace to keep underscores from appearing at the beginning and end of the text. If we ran the title ‘Awesome piece of research!!!’ through both replacements without trimming, the result would be ‘Awesome_piece_of_research_’, because the trailing exclamation marks become spaces and those spaces become an underscore. Trimming in between the two regexes prevents that.
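To make the difference concrete, here is a short sketch that runs the same title with and without the trim() step. The variable names $stripped, $without_trim, and $with_trim are only for this example.

```php
<?php
$title   = "Awesome piece of research!!!";
$symbols = "/[^\d\s\w]/";
$spaces  = "/[\s]+/";

// "!!!" becomes three trailing spaces.
$stripped = preg_replace($symbols, " ", $title);

$without_trim = preg_replace($spaces, "_", $stripped);
$with_trim    = preg_replace($spaces, "_", trim($stripped));

echo $without_trim, "\n"; // Awesome_piece_of_research_
echo $with_trim, "\n";    // Awesome_piece_of_research
```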
Make it a function call
<?php
function stringToSEO($string) {
    // Matches anything that’s not a word character (letter, digit, underscore) or whitespace
    $symbols = "/[^\d\s\w]/";
    // Matches 1 or more whitespace characters
    $spaces = "/[\s]+/";
    // Replace, trim, replace, return
    return preg_replace($spaces, "_", trim(preg_replace($symbols, " ", $string)));
}
?>
This is in fact the same function we use to generate SEO friendly titles. Feel free to use it in your web development, and if you have any suggestions or optimizations, post a comment or shoot me an email at ralph.holzmann@fyindout.com or a tweet at twitter.com/ralphholzmann.
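A short usage sketch, with the function repeated so the example runs on its own. The first title is an assumption (a guess at what could sit behind the example URL at the top of this post), and the last two are made up. They show one thing to keep in mind: different titles can produce the same result, which matters if the converted text is later used to look a record up.

```php
<?php
// Same function as above, repeated so this example is self-contained.
function stringToSEO($string) {
    $symbols = "/[^\d\s\w]/";
    $spaces = "/[\s]+/";
    return preg_replace($spaces, "_", trim(preg_replace($symbols, " ", $string)));
}

// Assumed title for illustration.
echo stringToSEO("Study: Cash Flow Margins Declining"), "\n";
// Study_Cash_Flow_Margins_Declining

echo stringToSEO("Erberts & Gerberts have great sandwiches."), "\n";
// Erberts_Gerberts_have_great_sandwiches

// Made-up titles: both reduce to "C_tips".
var_dump(stringToSEO("C++ tips") === stringToSEO("C# tips")); // bool(true)
```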
Edit Bonus: JavaScript Version
function stringToSEO (text) {
    var symbols = /[^\d\s\w]/gi;
    var trim = /^\s+|\s+$/g;
    var spaces = /[\s]+/gi;
    return text.replace(symbols, " ").replace(trim, "").replace(spaces, "_");
}
Tags: Google, PHP, Regex, Regular Expressions, Search Engine Optimization, SEO, URLs, Yahoo
Posted by Ralph on Jun. 11, 2009 @ 8:34 am
4 Responses to “Text to SEO”
June 18th, 2009 at 3:05 pm
Good info, I follow what you are doing, but I have one question. How do you translate that url back into a database call to get the right page?
June 20th, 2009 at 10:22 am
Hey man, great question. Stay tuned as that will be answered in my next blog post! Thanks for reading. – Ralph
June 11th, 2009 at 10:14 am
Good post by @ralphholzmann on text to seo @ http://bit.ly/GrC6E
This comment was originally posted on Twitter
June 11th, 2009 at 2:37 pm
How to make any text string SEO friendly using PHP: http://tinyurl.com/m4vn7b
This comment was originally posted on Twitter