FYIndOut

Text to SEO

In our last release, our development team implemented SEO-friendly URLs. This allowed us to change our URLs from something like http://fyindout.com/research/Research.php?Id=361 to http://www.fyindout.com/research/title/Study_Cash_Flow_Margins_Declining . This is huge for search engine optimization and for getting FYIndOut’s vendors listed higher on search engines like Google and Yahoo. It was no easy task: we ran into a few roadblocks before we finally (hopefully) got it right. This post outlines how we convert any text string to an SEO-friendly representation.

Replace spaces with underscores

Initially we thought “Okay, we’ll just replace all spaces with underscores”:

<?php
   $research_title = "Awesome piece of research!";
   $seo_research_title = str_replace(" ", "_", $research_title);
   echo $seo_research_title;
   // 'Awesome_piece_of_research!'
?>

This worked for about 50% of our research titles. The first error we ran into was a title that had an ampersand (&) in it:

<?php
   $research_title = "Erberts & Gerberts have great sandwiches.";
   $seo_research_title = str_replace(" ", "_", $research_title);
   echo $seo_research_title;
   // 'Erberts_&_Gerberts_have_great_sandwiches.'
?>

The problem here is that ampersands are used as delimiters between request variables in a URL’s query string. So in our code, that research title would show up as “Erberts_”. That is not what we’re looking for, so we started replacing all ampersands with underscores:

<?php
   $research_title = "Erberts & Gerberts have great sandwiches.";
   $seo_research_title = str_replace("&", "_", str_replace(" ", "_", $research_title));
   echo $seo_research_title;
   // 'Erberts___Gerberts_have_great_sandwiches.'
?>
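
To make the truncation concrete, here is a minimal sketch using PHP’s parse_str(). It assumes the title reaches our code as a query-string parameter; the parameter name "title" is made up for illustration and is not necessarily what our rewrite rules use.

```php
<?php
// Hypothetical query strings: the parameter name "title" is an assumption.
$broken = "title=Erberts_&_Gerberts_have_great_sandwiches.";
$fixed  = "title=Erberts___Gerberts_have_great_sandwiches.";

// parse_str() splits a query string on "&", the same way incoming
// request variables are split.
parse_str($broken, $broken_params);
parse_str($fixed, $fixed_params);

echo $broken_params["title"], "\n"; // Erberts_
echo $fixed_params["title"], "\n";  // Erberts___Gerberts_have_great_sandwiches.
```

With the raw ampersand, everything after it is treated as a second variable, so the title is cut short. With the ampersand replaced, the whole title survives.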

This fixed all occurrences of ampersands in research titles, but it didn’t solve all of our problems. We’d still get database matching errors for colons, semicolons, and a slew of other symbols. We considered percent-encoding the symbols (URL escape codes such as %26 for an ampersand), but that would defeat the purpose of SEO in the first place. So the final fix was to throw out anything that wasn’t a letter, number or space. Regular expressions to the rescue:

Get rid of all the symbols

<?php
   $research_title = "Microsoft launches new search engine 'Bing'!!!";

   // Matches anything that's not a word character (letter, digit, underscore) or whitespace
   $symbols = "/[^\d\s\w]/";

   // Matches 1 or more whitespace characters
   $spaces = "/[\s]+/";

   // Replace all symbols with spaces
   $research_title = preg_replace($symbols, " ", $research_title);

   echo $research_title;
   // 'Microsoft launches new search engine  Bing    '

   // Trim whitespace from the beginning and end
   $research_title = trim($research_title);

   echo $research_title;
   // 'Microsoft launches new search engine  Bing'

   // Replace all occurrences of one or more spaces with a single underscore
   $research_title = preg_replace($spaces, "_", $research_title);

   echo $research_title;
   // 'Microsoft_launches_new_search_engine_Bing'
?>

This is our final product. Let me walk you through it.

The first regex pattern, $symbols, uses a negated character class. ‘\d’ matches any digit, ‘\s’ matches any whitespace character, and ‘\w’ matches any word character (a letter, a digit, or an underscore). The caret (^) preceding these character classes negates them all. Wrap that all in square brackets and then again in forward slashes and there you have a regex that will match anything that’s not a letter, number, underscore, or whitespace character.

The second regex pattern, $spaces, again uses a character class, this time to match one or more whitespace characters. ‘\s’ matches a single whitespace character, so you wrap that in square brackets and add a plus sign (+) at the end so it matches one or more of them. Wrap that in forward slashes and you’re done.

Finally, we return the string after replacing the symbols with spaces, trimming whitespace from the front and back, and then replacing one or more spaces with a single underscore.
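
If you want to check what the $symbols pattern keeps and what it replaces, a quick sketch like the following tests a handful of sample characters one at a time. The sample characters are arbitrary picks for illustration.

```php
<?php
$symbols = "/[^\d\s\w]/";

// Check each sample character on its own against the pattern.
// A match means preg_replace() would swap that character for a space.
foreach (["a", "Z", "7", "_", " ", "&", ":", "!"] as $char) {
    $verdict = preg_match($symbols, $char) ? "replaced" : "kept";
    printf("'%s' is %s\n", $char, $verdict);
}
```

Letters, digits, the underscore and the space are kept, while ‘&’, ‘:’ and ‘!’ are replaced. Since ‘\w’ already covers digits, the ‘\d’ in the pattern is redundant but harmless.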

Why trim whitespace?

We trim whitespace to prevent underscores from appearing at the beginning and end of the text. If we used the title ‘Awesome piece of research!!!’ without trimming whitespace, it would result in ‘Awesome_piece_of_research_’. Trimming between the two regex replacements prevents those stray underscores.
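
Here is a small side-by-side sketch of that difference, using the same two patterns as above. The variable names $without_trim and $with_trim are just for illustration.

```php
<?php
$title   = "Awesome piece of research!!!";
$symbols = "/[^\d\s\w]/";
$spaces  = "/[\s]+/";

// Same pipeline twice: once skipping trim(), once with it.
$without_trim = preg_replace($spaces, "_", preg_replace($symbols, " ", $title));
$with_trim    = preg_replace($spaces, "_", trim(preg_replace($symbols, " ", $title)));

echo $without_trim, "\n"; // Awesome_piece_of_research_
echo $with_trim, "\n";    // Awesome_piece_of_research
```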

Make it a function call

<?php
   function stringToSEO($string) {

      // Matches anything that's not a word character (letter, digit, underscore) or whitespace
      $symbols = "/[^\d\s\w]/";

      // Matches 1 or more spaces
      $spaces = "/[\s]+/";

      // Replace, trim, replace, return
      return preg_replace($spaces, "_", trim(preg_replace($symbols, " ", $string)));
   }
?>

This is in fact the same function we use to generate SEO-friendly titles. Feel free to use it in your web development, and if you have any suggestions or optimizations, post a comment or shoot me an email at ralph.holzmann@fyindout.com or a tweet at twitter.com/ralphholzmann.
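
For a quick try-out, here is a usage sketch that runs the function over the example titles from this post. Gluing the result onto the URL prefix this way is an assumption for illustration, not necessarily how our pages build their links.

```php
<?php
function stringToSEO($string) {
    $symbols = "/[^\d\s\w]/";
    $spaces  = "/[\s]+/";
    return preg_replace($spaces, "_", trim(preg_replace($symbols, " ", $string)));
}

// Example titles taken from earlier in this post.
$titles = [
    "Awesome piece of research!",
    "Erberts & Gerberts have great sandwiches.",
    "Microsoft launches new search engine 'Bing'!!!",
];

foreach ($titles as $title) {
    // Assumed URL layout, based on the example URL at the top of the post.
    echo "http://www.fyindout.com/research/title/", stringToSEO($title), "\n";
}
// http://www.fyindout.com/research/title/Awesome_piece_of_research
// http://www.fyindout.com/research/title/Erberts_Gerberts_have_great_sandwiches
// http://www.fyindout.com/research/title/Microsoft_launches_new_search_engine_Bing
```

One thing to keep in mind: titles that differ only in their symbols, such as ‘Cash flow!’ and ‘Cash flow?’, come out as the same string.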

Edit Bonus: JavaScript Version

function stringToSEO (text) {

   var symbols = /[^\d\s\w]/gi;
   var trim = /^\s+|\s+$/g;
   var spaces = /[\s]+/gi;

   return text.replace(symbols, " ").replace(trim, "").replace(spaces, "_");
}

Posted by Ralph on Jun. 11, 2009 @ 8:34 am

4 Responses to “Text to SEO”

  1. Mike Says:

    Good info, I follow what you are doing, but I have one question. How do you translate that url back into a database call to get the right page?

  2. Ralph Says:

    Hey man, great question. Stay tuned as that will be answered in my next blog post! Thanks for reading. – Ralph

  3. tcolon Says:

    Good post by @ralphholzmann on text to seo @ http://bit.ly/GrC6E

    This comment was originally posted on Twitter

  4. ralphholzmann Says:

    How to make any text string SEO friendly using PHP: http://tinyurl.com/m4vn7b

    This comment was originally posted on Twitter
