How to populate a database
Overview
In the process of application development, developers are often faced to the problem of database population. If a few specific solutions exist for some database system, none can be used on top of the object relational mapping. Thanks to YAML and the sfPropelData object, symfony facilitates the batch processes that take data from a text source to a database.
Introduction
Imagine a project where registered users are stored in a User table with the following columns:
| User |
| id |
| name |
| login |
| hashed_password |
| email |
| created_at |
The User Propel object contains standard accessors. For this example project, a ->setPassword() method that sets the value of the hashed_password field from a password in clear is added to the myproject/lib/model/User.class.php class file:
class User extends BaseUser
{
...
public function setPassword($password)
{
$this->setHashedPassword(md5(password));
}
}
This method stores a MD5 hash of its argument, to avoid storing in clear a secret piece of data.
The id column is auto incremented.
If you want to enter some records in this table, for unit tests or in order to reinitialize the database of the test environment every day, you need to be able to use a text data file and to import it into the database.
Data file syntax
Symfony can read data files that follow a very simple YAML syntax. Under a header that defines the class of the records to be put in the database, a list of record is written, each record labeled by a unique string. For each record, a set set of fieldname:value is listed. For instance:
User:
bob_walter:
name: Bob Walter
login: bwalter
password: sarah34
email: bob.walter@foomail.com
peter_clive:
name: Peter Clive
login: pclive
password: gurzikso
email: peter.clive@foomail.com
The keys used in this file will be Camelized to find the appropriate setter (->setName(), ->setLogin(), ->setPassword(), ->setEmail()). This means that it is possible to define a password key even if the actual table doesn't have a password field because the ->setPassword() method exists in the User object.
The column id doesn't need to be defined, since it is an auto-increment field: the database layer knows how to determine it.
The created_at column isn't defined either, because symfony knows that fields named created_at have to be set to the current system time when created. This would be the same with an updated_at field.
Save this data in an import_data.yml file, in the myproject/data/fixtures/ directory.
Import Batch
Initial declarations
An import batch is a PHP script that reads the data from YAML files and transfers it to the database. Since it has to access the Propel classes and use the symfony configuration, such a script must start just like any front controller:
<?php
define('SF_ROOT_DIR', realpath(dirname(__FILE__).'/..'));
define('SF_APP', 'myapp');
define('SF_ENVIRONMENT', 'dev');
define('SF_DEBUG', true);
require_once(SF_ROOT_DIR.DIRECTORY_SEPARATOR.'apps'.DIRECTORY_SEPARATOR.SF_APP.DIRECTORY_SEPARATOR.'config'.DIRECTORY_SEPARATOR.'config.php');
sfContext::getInstance();
The definition of an application (myapp) and of an environment (dev) allows symfony to determine a configuration to be loaded. Once executed, this code allows the batch to access Propel objects, and to autoload any symfony class.
sfPropelData call
The sfPropelData is an object who knows how to read a YAML data file and use it to add records to a Propel-enabled database. It is very easy to use: you just need to call its ->loadData() method, passing as an argument a file or a directory that contains the data to be loaded into the database.
Create a load_data.php script in the myproject/batch/ directory starting with the previous lines, plus the following ones:
$data = new sfPropelData();
$data->loadData(sfConfig::get('sf_data_dir').DIRECTORY_SEPARATOR.'fixtures');
?>
To launch the script, just type in a command line:
$ cd myproject/batch
$ php load_data.php
This batch will put the two records labeled bob_walter and peter_clive in the database - provided that the database connection settings are properly entered in the myproject/apps/myapp/config/databases.yml configuration file.
Note: If you added a database connection or changed the default connection name ('propel') in your databases.yml, you will need to specify this connection name when you call the ->loadData() method. Use the second argument for that. For instance, if you want to populate a database using a connection named symfony, write:
[php]
$data->loadData(sfConfig::get('sf_data_dir').DIRECTORY_SEPARATOR.'fixtures', 'symfony');
Add or replace
By default, the ->loadData() method will not only add the records from the YAML files given as parameters, it will also erase the existing records in the tables that are modified. To change this behavior, you need to set the deleteCurrentData property to false before calling the ->loadData() method:
$data->setDeleteCurrentData(false);
To get the current value of the deleteCurrentData property, use the associated getter:
$status = $data->getDeleteCurrentData(false);
Linked tables
Adding records in one table is easy, but the things could get tricky with linked tables. Imagine that the users of your website can write posts in a blog-like interface. Your model would also have a Post table with more or less the following fields:
| Post |
| id |
| user_id |
| title |
| body |
| created_at |
| updated_at |
The problem is: how can you define the value of the user_id field if you have no idea of the id given automatically to the records of the User table?
This is where the labels given to the records become really useful. To add a post written by Bob Walter, you simply need to add to the myproject/data/fixtures/import_data.yml data file:
Post:
post01:
user_id: bob_walter
title: Today is the first day of the rest of my life
body: I have plenty to say, but I prefer to stay short. Enjoy.
The sfPropelData object will recognize the label that you gave to a user previously in the import_data.yml, and grab the primary key of the corresponding User record to set the user_id field. You don't even see the ids of the records, you just link them by their labels: it can't be simpler.
The only constraint to a link file is that the objects called in a foreign key have to be defined earlier in the file - that is, as you would do if you defined them one by one. The data files are parsed from the top to the bottom, and the order in which the records are written is then important.
Flat file vs. separated files
One data file can contain declarations of several classes. For instance, consider the following data file:
User:
bob_walter:
name: Bob Walter
login: bwalter
password: sarah34
email: bob.walter@foomail.com
peter_clive:
name: Peter Clive
login: pclive
password: gurzikso
email: peter.clive@foomail.com
Post:
test_post:
user_id: bob_walter
title: Today is the first day of the rest of my life
body: I have plenty to say, but I prefer to stay short. Enjoy.
another_test_post:
user_id: bob_walter
title: I don't know what to say today
body: As my previous post was so briliant, I find myself a little dry.
Comment:
enthusiast_comment:
user_id: peter_clive
post_id: test_post
body: Hey Bob, I'm so glad you finally found something interesting to say.
disappointed_comment:
user_id: peter_clive
post_id: another_test_post
body: Mate, you really disappoint me. I expected much more or your great potential.
It is getting quite long. Symfony allows you to cut the file in pieces, that will be parsed in the alphabetical order of their file names. For instance, you could separate this file into three ones, one for each class. To make sure that they are parsed in the correct order, prefix them with an ordinal number:
100_user_import_data.yml
200_post_import_data.yml
300_comment_import_data.yml
The same ->loadData() method call still works, since the parameter was a directory:
$data->loadData(sfConfig::get('sf_data_dir').DIRECTORY_SEPARATOR.'fixtures');
You can, if you wish, use only one of these files for the batch:
$data->loadData(sfConfig::get('sf_data_dir').DIRECTORY_SEPARATOR.'fixtures'.DIRECTORY_SEPARATOR.'100_user_import_data.yml');
Alternate YAML syntax
YAML has an alternate syntax for associative arrays that may make your data files more readable, especially for the tables that contain many foreign keys:
Comments:
c123:
user_id: u65
post_id: p23
body: first blabla
c456:
user_id: u97
post_id: p64
body: second blabla
The same can be written in a shorter way:
Comments:
c123: { user_id: u65, post_id: p23, body: first blabla }
c456: { user_id: u97, post_id: p64, body: second blabla }
For more information about the YAML syntax, refer to the YAML website
|