Google crawl errors for posts with special characters in title
I recently discovered quite a few Google crawl errors for a bbPress installation (1.0.2), and the common denominator was special characters in the title. The Google crawler was changing the hex digits in the percent-encoded URL to uppercase, and this was causing a 302 redirect.
I tracked the 302 redirect to bb_repermalink(), which detects the uppercase hex as a discrepancy with the “correct” permalink. I made a simple plugin that works around the issue (see below).
Has anyone else seen this issue? How did you deal with it?
I’ve described this in a little more detail at http://theblogeasy.com/2009/12/26/bbpress-and-encoded-urls-with-uppercase-hex/.
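To make the mismatch concrete, here is a minimal sketch. It assumes the discrepancy check amounts to a plain string comparison between the generated permalink and the requested URI; the paths are made up for illustration. RFC 3986 treats uppercase and lowercase hex digits in a percent-encoding as equivalent, but a byte-for-byte comparison does not.

```php
<?php
// Hypothetical values for illustration only; not taken from a real forum.
$stored_permalink = '/topic/caf%c3%a9';   // lowercase hex, as the forum is assumed to generate it
$crawler_request  = '/topic/caf%C3%A9';   // same URL with uppercase hex, as the crawler requests it

// A byte-for-byte comparison treats the two as different URLs...
var_dump( $stored_permalink === $crawler_request );                                 // bool(false)

// ...even though both decode to exactly the same path.
var_dump( rawurldecode( $stored_permalink ) === rawurldecode( $crawler_request ) ); // bool(true)
```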
function _permalink_fix( $permalink, $location )
{
    /* Are there any URL-encoded hex characters with uppercase digits in the request URI? */
    if ( preg_match( '#%([0-9][A-F]|[A-F][0-9]|[A-F][A-F])#', $_SERVER['REQUEST_URI'] ) )
    {
        /* Uppercase ALL URL-encoded hex escapes in the permalink so that it matches
           the request. A callback is used because the /e modifier no longer exists in PHP 7. */
        $permalink = preg_replace_callback( '#%[0-9a-fA-F]{2}#', '_permalink_fix_upper', $permalink );
    }
    return $permalink;
}

/* Callback for _permalink_fix(): uppercase a single matched %xx escape. */
function _permalink_fix_upper( $matches )
{
    return strtoupper( $matches[0] );
}

add_filter( 'bb_repermalink_result', '_permalink_fix', 10, 2 );
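The uppercasing step can also be checked outside bbPress. The following is only a sketch: `uppercase_percent_escapes()` is a hypothetical helper name, the sample path is made up, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: uppercase the hex digits of every %xx escape in a URL,
// leaving all other characters untouched.
function uppercase_percent_escapes( $url )
{
    return preg_replace_callback(
        '#%[0-9a-fA-F]{2}#',
        function ( $m ) { return strtoupper( $m[0] ); },
        $url
    );
}

$permalink = '/topic/caf%c3%a9-and-%e2%82%ac';   // made-up slug with lowercase escapes
echo uppercase_percent_escapes( $permalink ), "\n";
// prints /topic/caf%C3%A9-and-%E2%82%AC
```

Only the two hex digits after each `%` are changed, so the rest of the slug keeps its original case.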