Google Sitemap Generator
February 26th, 2008 by pyrat
Google sitemaps are a handy way of telling Google what is where on your site. Clients often want one for SEO, or you may have a site with new content all the time that you want Google to keep up with.
Whatever your reason for being interested in these little XML files, the following code lets you generate a sitemap for a dynamic site in Ruby.
First, the class:
require 'net/http'
require 'uri'

# A class specific to the application which generates a google sitemap from
# the contents of the database.
# Author: Alastair Brunton
class GoogleSitemapGenerator

  def initialize(base_url, sources)
    @base_url = base_url
    @sources = sources
  end

  # The main generator method which in turn adds to the path_array from the different
  # sources.
  # Sources are: pages, events, properties
  def generate
    path_ar = Array.new
    @sources.each do |source|
      # initialize the class and call the get_paths method on it.
      path_ar = path_ar + eval("#{source}.get_paths")
    end
    xml = generate_xml(path_ar)
    save_file(xml)
    update_google
  end

  # This creates the xml document.
  def generate_xml(path_ar)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    xml.urlset(:xmlns=>'http://www.google.com/schemas/sitemap/0.84') {
      path_ar.each do |path|
        xml.url {
          xml.loc(@base_url + path[:url])
          xml.lastmod(path[:last_mod])
          xml.changefreq('weekly')
        }
      end
    }
    xml_str
  end

  # Saves the xml file to disc. This could also be used to ping the webmaster tools
  def save_file(xml)
    File.open(RAILS_ROOT + '/public/sitemap.xml', "w+") do |f|
      f.write(xml)
    end
  end

  # Notify google of the new sitemap
  def update_google
    sitemap_uri = @base_url + '/sitemap.xml'
    escaped_sitemap_uri = URI.escape(sitemap_uri)
    Net::HTTP.get('www.google.com', '/webmasters/sitemaps/ping?sitemap=' + escaped_sitemap_uri)
  end

end
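The generate method resolves each source name with eval. If you would rather not evaluate strings, the same lookup can probably be done with Object.const_get. A minimal sketch follows; the Page and Event classes and the collect_paths helper are hypothetical stand-ins, not part of the generator above:

```ruby
# Hypothetical stand-ins for the application's models. Each one only
# needs to respond to get_paths.
class Page
  def self.get_paths
    [{ :url => '/about', :last_mod => '2008-02-26' }]
  end
end

class Event
  def self.get_paths
    [{ :url => '/events/1', :last_mod => '2008-02-20' }]
  end
end

# Resolve each source name to its constant and collect the paths,
# without handing a string to eval.
def collect_paths(sources)
  sources.inject([]) do |paths, source|
    paths + Object.const_get(source).get_paths
  end
end

paths = collect_paths(['Page', 'Event'])
paths.each { |path| puts "#{path[:url]} #{path[:last_mod]}" }
```

An unknown source name raises a NameError here instead of running whatever the string happens to contain.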
You will notice that an array of strings is passed when creating the generator. These are the names of classes which implement a get_paths class method. An example get_paths class method is as follows:
# for the google sitemap
def self.get_paths
  path_ar = Array.new
  Property.live_properties.each do |property|
    path_ar << {:url => "/property/#{property.to_param}", :last_mod => property.updated_at.strftime('%Y-%m-%d')}
  end
  path_ar
end
Basically, each source needs to return an array of hashes, each containing the :url (a path relative to the base URL) and the :last_mod date for one page.
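To try that contract outside Rails, here is a rough sketch with a Struct standing in for the ActiveRecord model. The Property fields, the sample records and the live_properties data are all made up for illustration:

```ruby
# A Struct stands in for the real Property model (an assumption for this
# sketch; the real class would be an ActiveRecord model).
Property = Struct.new(:id, :slug, :updated_at) do
  # Mimics what Rails' to_param might return for a friendly URL.
  def to_param
    "#{id}-#{slug}"
  end

  # Hypothetical sample data in place of a database query.
  def self.live_properties
    [
      new(1, 'sea-view', Time.utc(2008, 2, 20)),
      new(2, 'old-mill', Time.utc(2008, 2, 25))
    ]
  end

  # Returns one hash per page, with the :url and :last_mod keys the
  # generator reads.
  def self.get_paths
    live_properties.map do |property|
      { :url => "/property/#{property.to_param}",
        :last_mod => property.updated_at.strftime('%Y-%m-%d') }
    end
  end
end

Property.get_paths.each do |path|
  puts "#{path[:url]} (last modified #{path[:last_mod]})"
end
```

Using map here is just a shorter way of building the same array as the Array.new and << version.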
This little beastie is best called from a cron job on the production server. An example rake task to do this is as follows:
namespace :google_sitemap do
  desc "Generate a google sitemap from the site."
  task(:generate => :environment) do
    sources = ['Page', 'Event', 'Property']
    sitemap = GoogleSitemapGenerator.new('http://www.your_url.com', sources)
    sitemap.generate
  end
end
Remember to pass RAILS_ENV when you call it from a cron job. The generator does rely on Rails, but you could convert it to rely only on Ruby by modifying the rake task and changing the RAILS_ROOT reference in the save_file method. It can probably be made to work with Merb too, but I am unsure how Merb and rake work together. Hopefully I will get my hands dirty with Merb sometime soon. An example cron command:
cd /var/www/apps/site/current && /usr/bin/rake RAILS_ENV=production google_sitemap:generate
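As a rough idea of what the Ruby-only conversion mentioned above might look like, here is a sketch that takes an output directory instead of reading RAILS_ROOT. The PlainSitemapWriter name, the output_dir parameter and the ping_path helper are my own inventions for illustration. The XML is assembled by hand rather than with Builder so the sketch has no gem dependency, and the ping path is built with URI.encode_www_form_component because URI.escape no longer exists in Ruby 3.0 and later:

```ruby
require 'cgi'
require 'uri'
require 'tmpdir'

# Rails-free sketch. output_dir replaces RAILS_ROOT + '/public', and the
# XML is built from plain strings instead of Builder (both are assumptions).
class PlainSitemapWriter
  def initialize(base_url, output_dir)
    @base_url = base_url
    @output_dir = output_dir
  end

  # Build the sitemap XML from an array of {:url, :last_mod} hashes.
  def generate_xml(path_ar)
    entries = path_ar.map do |path|
      [
        '  <url>',
        "    <loc>#{CGI.escapeHTML(@base_url + path[:url])}</loc>",
        "    <lastmod>#{path[:last_mod]}</lastmod>",
        '    <changefreq>weekly</changefreq>',
        '  </url>'
      ].join("\n")
    end
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">']
    (lines + entries + ['</urlset>']).join("\n") + "\n"
  end

  # Write sitemap.xml into the output directory and return its full path.
  def save_file(xml)
    path = File.join(@output_dir, 'sitemap.xml')
    File.open(path, 'w') { |f| f.write(xml) }
    path
  end

  # The request path for the ping, with the sitemap URL percent-encoded.
  # No network request is made in this sketch.
  def ping_path
    '/webmasters/sitemaps/ping?sitemap=' +
      URI.encode_www_form_component(@base_url + '/sitemap.xml')
  end
end

writer = PlainSitemapWriter.new('http://www.your_url.com', Dir.mktmpdir)
xml = writer.generate_xml([{ :url => '/property/1-a&b', :last_mod => '2008-02-26' }])
saved_path = writer.save_file(xml)
puts File.read(saved_path)
puts writer.ping_path
```

A rake task (or a plain script run from cron) would then pass in the real public directory and the paths collected from the sources.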








