This post is a guide on how to extract your blog posts information from Drupal 7 to other systems. And also automatically create a redirect files from the old blog to the new one. In this case, I migrated to Jerkyll/Octopress blog but from the data extracted in with my script you can migrate any other blog system. Hopefully, this will save you a lot of time if you need to do the same task. If you run into troubles go to last section of the post it has some suggestions.

Extract data from Drupal 7 site

SQL extraction

You need to extract the data from your Drupal 7, there are several ways. You can connect to your web host via ssh and generate SQL dump.

1
mysqldump –uUSERNAME –pPASSWORD DATABASE > FILENAME.sql

(replace the UPPERCASE letters with your settings)

You can download the file *.sql to your computer and run the following command to install to upload the data in your local database.

1
mysql –uUSERNAME –pPASSWORD DATABASE < FILENAME.sql

If you have a access to you phpmyadmin in your host server you can download your sql dump file through that also. Other method is to use a local port fordwarding using SSH… anyways, get access to your database.

Run the script

The 2nd and final step is to run the script that does all the magic. Below I will explain how it works in case that you want to customize.

#
# Jekyll migrator for Drupal 7+
# adrianmejia.com
#
require 'rubygems'
require 'sequel'
require 'fileutils'
require 'yaml'
# NOTE: This converter requires Sequel and the MySQL gems.
# The MySQL gem can be difficult to install on OS X. Once you have MySQL
# installed, running the following commands should work:
# $ sudo gem install sequel
# $ sudo gem install mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
#
# UPDATE:
# Install mysql2 as well you might need it as well.
# $ gem install mysql2 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
#
OLD_DOMAIN = "http://adrianmejiarosario.com" ## TODO replace this out
NEW_DOMAIN = "http://adrianmejia.com" ## TODO replace this out
module Jekyll
module Drupal
# Reads a MySQL database via Sequel and creates a post file for each post
# in wp_posts that has post_status = 'publish'. This restriction is made
# because 'draft' posts are not guaranteed to have valid dates.
QUERY =<<SQL
SELECT
n.nid,
n.title,
n.created,
n.changed,
b.body_value AS 'body',
b.body_summary,
b.body_format,
n.status,
l.alias AS 'slug',
GROUP_CONCAT( d.name SEPARATOR ', ' ) AS 'tags'
FROM dr_url_alias l, dr_node n
JOIN dr_field_data_body b ON b.entity_id = n.nid
JOIN dr_taxonomy_index t ON t.nid = n.nid
JOIN dr_taxonomy_term_data d ON t.tid = d.tid
WHERE n.type = 'blog'
AND b.revision_id = n.vid
AND l.source = CONCAT( 'node/', n.nid )
GROUP BY n.nid
SQL
#
# Main script
#
def self.process(dbname, user, pass, host = 'localhost')
db = Sequel.mysql(dbname, :user => user, :password => pass, :host => host, :encoding => 'utf8')
drupal_path = "drupal_redirect"
FileUtils.mkdir_p drupal_path
File.open("#{drupal_path}/index.php", "w") { |f| f.puts permanent_redirect_to ("") }
FileUtils.mkdir_p "_posts"
FileUtils.mkdir_p "_drafts"
db[QUERY].each do |post|
# Get required fields and construct Jekyll compatible name
node_id = post[:nid]
title = post[:title]
# complete relatives URLs
content = post[:body].gsub("\"/sites/default/files/", "\"#{OLD_DOMAIN}/sites/default/files/")
created = post[:created]
tags = post[:tags].downcase.strip
drupal_slug = post[:slug]
time = Time.at(created)
is_published = post[:status] == 1
dir = is_published ? "_posts" : "_drafts"
slug = title.strip.downcase.gsub(/(&|&amp;)/, ' and ').gsub(/[\s\.\/\\]/, '-').gsub(/[^\w-]/, '').gsub(/[-_]{2,}/, '-').gsub(/^[-_]/, '').gsub(/[-_]$/, '')
name = time.strftime("%Y-%m-%d-") + slug + '.md'
# Get the relevant fields as a hash, delete empty fields and convert
# to YAML for the header
data = {
'layout' => 'post',
'title' => title.to_s,
'created' => created,
'comments' => true,
'categories' => tags
}.delete_if { |k,v| v.nil? || v == ''}.to_yaml
# Write out the data and content to file
File.open("#{dir}/#{name}", "w") do |f|
f.puts data.gsub /categories:\W+(.+)/, 'categories: [\1]'
f.puts "---"
f.puts content
end
#
# Make a file to redirect from the old Drupal URL
# Make refresh dirs & files according to entries in url_alias table in drupal 7+
# Copy the content to your drupal_redirect to the root of your all blog and that's it. It will redirect to your new site.
#
if is_published
ddir = "#{drupal_path}/#{drupal_slug}"
FileUtils.mkdir_p ddir
File.open("#{ddir}/index.php", "w") do |f|
puts "#{ddir}/index.php"
f.puts permanent_redirect_to "blog/#{time.strftime("%Y/%m/%d/") + slug}"
end
end
end
# TODO: Make dirs & files for nodes of type 'page'
# Make refresh pages for these as well
end
end
end
#
# PHP code to redirect permanently your users from your old website to your new
#
def permanent_redirect_to(location)
"<?php\nheader(\"HTTP/1.1 301 Moved Permanently\");\nheader(\"Location: #{NEW_DOMAIN}/#{location}\");\necho \"redirecting...\"\n ?>"
end
Jekyll::Drupal.process(ENV['DRUPAL_DATABASE'],ENV['DB_USER'],ENV['DB_PASSWORD'])

Replace the place holders with your actual values:

  • OLD_DOMAIN
  • NEW_DOMAIN
  • ENV[‘DRUPAL_DATABASE’]
  • ENV[‘DB_USER’]
  • ENV[‘DB_PASSWORD’]

After you run it, it will generate 3 folders:

  • _post: has all your post in the Jekyll style (categories and tags and everything)
  • _draft: not published posts if any
  • drupal_redirect: for each url of your posts it has a folder with a redirect index.php file to your new domain.

Copy each of this folder to their respective places. Copy the content to your drupal_redirect to the root of your old blog and that’s it. It will redirect all your all blog URLs to your new site.

Behind the scenes…

First, you need to extract the data from your Drupal site. I reversed engineer the database in order to extract the post, title, url alias (slug), tags, publish info, format and the last version of the post. The query that does all the magic is the following one:

Drupal 7 Query to extract all the post info
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT
n.nid,
n.title,
n.created,
n.changed,
b.body_value AS 'body',
b.body_summary,
b.body_format,
n.status,
l.alias AS 'slug',
GROUP_CONCAT( d.name SEPARATOR ', ' ) AS 'tags'

FROM url_alias l, node n
JOIN field_data_body b ON b.entity_id = n.nid
JOIN taxonomy_index t ON t.nid = n.nid
JOIN taxonomy_term_data d ON t.tid = d.tid

WHERE n.type = 'blog'
AND b.revision_id = n.vid
AND l.source = CONCAT( 'node/', n.nid )

GROUP BY n.nid

As might notice, it concatenates all the tags separated by comma and also finds the alias of the url if is called node. Also you can also find the url alias for other pages such as terms or taxonomies. But let’s keep it simple and get the posts urls.

Finally, the script will use the data from this query to generate the new posts files and also to create the redirect files.

As might notice, it concatenates all the tags separated by comma and also finds the alias of the url if is called node. Also you can also find the url alias for other pages such as terms or taxonomies. But let’s keep it simple and get the posts urls.

Finally, the script will use the data from this query to generate the new posts files and also to create the redirect files.

Troubleshooting

I had a hard time having the mysql gem work with seqel in my Mac OS X 10.7 (Lion) and ruby 1.9.2.

I got the following errors:

  • Library not loaded: libmysqlclient.18.dylib (LoadError) Sequel::DatabaseConnectionError: Mysql::ClientError::ServerGoneError: The MySQL server has gone away mysql2 ruby
  • “LoadError: require ‘mysql’ did not define Mysql::CLIENT_MULTI_RESULTS!”
  • “You are probably using the pure ruby mysql.rb driver, which Sequel does not support. You need to install the C based adapter, and make sure that the mysql.so file is loaded instead of the mysql.rb file.”
  • Sequel::AdapterNotFound: LoadError: require ‘mysql’ did not define Mysql::CLIENT_MULTI_RESULTS! You are probably using the pure ruby mysql.rb driver, which Sequel does not support. You need to install the C based adapter, and make sure that the mysql.so file is loaded instead of the mysql.rb file.
  • And others…

Solution:

The mysql gem have been abandoned, so you also need mysql2 to work propery with sequel

bash Install MySQL gems
1
2
3
$ sudo gem install sequel
$ sudo gem install mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ sudo gem install mysql2 -- --with-mysql-config=/usr/local/mysql/bin/mysql_config

also you need to copy the following lib:

Reference needed libs
1
$ sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

That should work.

Just if you are courious there is another gem called ruby-mysql, with which you can connect to mysql. But it doesn’t work with sequel

Alternative gem to connect to mysql (ruby-mysql)
1
2
3
4
$ gem install ruby-mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
$ irb
> require 'mysql'
> db = Mysql.real_connect("SERVER","USER","PASSWORD","DATABASE")