Tuesday, December 9, 2008

Part 1 of the SEO for PHP Series

Finally, an article that will explain the problem and answer the question that has perplexed and eluded web developers and SEO experts for years. The question: How do I get my PHP pages indexed on Google and other search engines?

The Myth Exposed
First, I need to point out that a myth has long surrounded Google’s indexing (or non-indexing) of PHP pages, specifically dynamically generated pages. I suppose my mysterious title for this entry doesn’t help matters, because Google is easily able to index both dynamically generated PHP pages and PHP pages with dynamically generated content (read further down for an explanation of the difference). There are some good practices webmasters should follow to make sure their site’s pages are clearly readable, but there’s no big mystery. I’ll explain in later posts in this series how you can make your PHP site Google-ready. In the meantime, Google (as always) is smart enough to sort through most of the garbage that people post to the web. So why write an article, you ask? Because so many people still believe this myth, and it’s about time someone pointed out that Google has already set the record straight.

It’s easy to see how such a myth could persist over time. Back in 2006, Google’s Webmaster Guide did contain a passage saying that Google had a harder time indexing pages with a query string in the URL (if you’re not familiar with the parts of a URL, read this entry by Google’s Matt Cutts). Google once warned:

“Don’t use ‘&id=’ as a parameter in your URLs, as we don’t include these pages in our index.”

Aha! The spark that started the fire. So Google really did have a hard time indexing PHP pages at some point? Well, not so fast, Mr. or Ms. Webmaster Pants. They didn’t say anything about dynamically generated pages, or even about PHP for that matter. They simply said it’s best not to use certain parameters in your URLs. Over the years, that advice has been exaggerated and expanded into the idea that Google is wholly incapable of indexing pages processed by PHP. That notion was false then, and it is false now. In fact, Google has since removed that line from the Webmaster Guide and has stated clearly that it’s OK to have dynamically generated pages.
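For readers who want to see exactly which part of a URL that old warning was about, here is a small sketch using Python’s standard urllib.parse module (Python is used purely for illustration, and the example URL is hypothetical). It splits a URL into its scheme, host, path, and query string, and decodes the query string into its name-value pairs, including the “id” parameter the guideline mentioned.

```python
from urllib.parse import urlsplit, parse_qs


def url_parts(url):
    """Split a URL into scheme, host, path, and decoded query parameters."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        # parse_qs maps each parameter name to a list of its values
        "query": parse_qs(parts.query),
    }


if __name__ == "__main__":
    # Hypothetical URL carrying an "id" parameter, the kind the old guideline warned about
    print(url_parts("http://www.mysite.com/index.php?page=faqs&id=42"))
```

Note that nothing here depends on the server being written in PHP: the “.php” is just part of the path, and the name-value pairs live entirely in the query string.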

People who actually still believe that PHP cannot be indexed by Google have obviously never performed this query:

inurl:.php

What? That’s an outrage! Google doesn’t index PHP pages! Especially not that first one that is dynamically generated and contains “?id=” in the query string! OK, now I’ll remove my tongue from my cheek so I can finish up here.


A Technical Explanation
Alright, here’s where we’ve got to get a little technical. Don’t be afraid, it’s not too bad. After all, it was the people who weren’t willing to explore a little bit of the technical stuff that perpetuated this myth in the first place. Be a part of the solution and keep reading!

First, there is a real difference between dynamically generated pages and pages with dynamically generated content. Dynamically generated pages often also have dynamically generated content, but a page with dynamically generated content need not itself be dynamically generated. Confused? Let me explain.

Pages With Dynamic Content

Let’s say I have a file called faqs.php that actually exists on my server’s disk and contains PHP code that performs a specific function. Maybe the code queries a database, retrieves a list of FAQs, and prints them to the screen. This is an example of a page that is not itself dynamically generated but that dynamically generates its content. In this case, the faqs.php script will consistently deliver FAQs to the user. It will probably never show a terms of use, contact, or about page. It’s for FAQs.

Google has never had a problem indexing these types of pages, even though they are created using PHP. The fact that PHP is involved is moot, because the content is assembled on the server before it is delivered to the browser or search engine spider. The spiders don’t care what happens behind the scenes to deliver my page, especially when it has a simple URL like http://www.mysite.com/faqs.php. For all intents and purposes, this page is the same thing as a static HTML page delivered from the server to the spider or browser (the only difference is the .php extension in the filename). Even the file extension is a non-issue, since a webmaster can set up Apache to execute PHP in files with the .html extension, which would completely mask the fact that the content was generated dynamically.
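To make the “spiders only see the finished page” point concrete, here is a rough Python stand-in for what a script like faqs.php does. The fetch_faqs function and its data are hypothetical placeholders for the database query described above. Whatever code runs on the server, the only thing that leaves it is the finished HTML string.

```python
from html import escape


def fetch_faqs():
    """Hypothetical stand-in for the database query faqs.php would run."""
    return [
        ("Do you ship overseas?", "Yes, to most countries."),
        ("How do I reset my password?", "Use the reset link on the login page."),
    ]


def render_faqs_page(faqs):
    """Assemble the complete HTML document that is sent to a browser or spider."""
    items = "\n".join(
        f"  <dt>{escape(question)}</dt>\n  <dd>{escape(answer)}</dd>"
        for question, answer in faqs
    )
    return f"<html><body>\n<h1>FAQs</h1>\n<dl>\n{items}\n</dl>\n</body></html>"


if __name__ == "__main__":
    # This printed markup is all a search engine spider ever receives.
    print(render_faqs_page(fetch_faqs()))
```

From the spider’s side, the result is indistinguishable from a hand-written static HTML file with the same markup.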

Dynamically Generated Pages

Now, let’s imagine I have a file called index.php with a query string appended to it, so the URL looks like this: index.php?page=faqs. This script might produce what people would call a dynamically generated page. In this case it, too, delivers content to the screen, showing information about FAQs just like our previous example. The difference is that this index.php page is probably not specific to FAQs. It’s probably also used to deliver the terms of use, contact, and about pages. In theory (though not in good practice), you could serve an entire website of thousands of pages from one single PHP script.

These are the types of pages that Google warned about in the past. While they’ve never had problems with the types of pages described previously, you can see how Google may once have had a hard time with these types of pages, or simply refused to keep track of and index them, especially when the query string contains multiple name-value pairs, since every combination of parameters produces a different URL that may point to the same content.
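Here is a minimal Python sketch of the single-script idea behind index.php?page=faqs, written as a simple dispatcher on the “page” parameter. The page names, the PAGES table, and the extra “sort” parameter are all hypothetical. It also shows one plausible reason such URLs were troublesome for crawlers: several different URLs can return exactly the same page.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical page bodies; a real site would likely pull these from a database.
PAGES = {
    "faqs": "<h1>FAQs</h1>",
    "terms": "<h1>Terms of Use</h1>",
    "contact": "<h1>Contact Us</h1>",
    "about": "<h1>About</h1>",
}


def front_controller(url):
    """Render whichever page the 'page' query parameter names, like index.php?page=faqs."""
    params = parse_qs(urlsplit(url).query)
    page = params.get("page", [""])[0]  # missing parameter falls through to Not Found
    return PAGES.get(page, "<h1>Not Found</h1>")


if __name__ == "__main__":
    urls = [
        "http://www.mysite.com/index.php?page=faqs",
        "http://www.mysite.com/index.php?page=faqs&sort=asc",
        "http://www.mysite.com/index.php?sort=asc&page=faqs",
    ]
    # Three different URLs, one identical page: nothing in the URLs themselves
    # tells a crawler that these are duplicates.
    for url in urls:
        print(url, "->", front_controller(url))
```

The dispatcher ignores the unknown “sort” parameter and the parameter order, so each extra name-value pair multiplies the number of distinct URLs without adding any new content.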

Short and Simple Answers
OK, your head may be spinning after all that rigmarole. Let’s look at how Google answers the question “Does Google index dynamic pages?” The short and simple answer is yes.

Well, would you look at that! Google themselves set the record straight once and for all. I don’t necessarily like their suggestion of making static pages with the same content as your dynamic pages (if that were an option, there’d be little reason to use dynamic pages in the first place). Regardless, Google essentially says to all you wild and wacky webmasters with equally wacky URLs, “Go ahead and make them dynamic pages! We’ll round up and corral your content free of charge.”

So Google has caught up to the Wild Wild Web in this regard, and they’ve done so just as a new frontier is emerging: one of pretty URLs and simple-to-index PHP sites and pages, where dynamic pages and dynamic content are still the staple, but with a new, Web 2.0-friendly flair. Check back soon for the next article, where I’ll explain how to create a simple PHP site that implements some of these newfangled things and follows some suggestions from Google Webmaster Central.

Posted by: Peter Ehat
