Perl and Databases

This discussion will focus on Perl's DBI, or Database Interface. The DBI has been in development for over five years and just recently achieved v. 1.0. Its version number, however, belies its maturity. Perl developers have used the DBI for small applications to enormous ones.

A necessary tool for the DBI is one or more DBDs, or Database Drivers. In fact, when I refer to DBI throughout this discussion, I will actually be referring to the two in tandem, since you cannot do much with just the DBI.

The DBI defines the interface through which you can talk to the individual DBDs, which are often implemented in C using the client libraries distributed with a SQL database. In theory, an application written using one DBD can be ported to another DBD with minimal effort, sometimes as little as changing one line.

For the Win32 folk: The relationship between the DBI and DBD is somewhat like that between the ODBC Driver Manager and the ODBC Drivers.

If you already know how to use the DBI, this probably won't teach you anything new. If you're just starting to use the DBI, this might be helpful to have while you're reading the DBI documentation, which is fairly straightforward and very comprehensive. Hopefully this will help flesh out some of the information found in the DBI documentation.


Terminology

Let's define a few terms here:

Installing

The DBI is fairly simple to install. I have never had a problem installing it. Same steps as every other module from CPAN: install the CPAN module and use its shell, or ftp it from the site and do:

  perl Makefile.PL
  make
  make test
  make install

Installing DBDs can be a mixed bag. While the DBI defines an interface, a DBD interacts directly with a database so all the difficulties that come with different platforms and different levels of application support exist in full force.

Generally, you'll need to know where your database was installed, along with its libraries and include files. I have installed the following DBDs without much difficulty:

 DBD::mSQL
 DBD::mysql
 DBD::Solid
 DBD::CSV
 DBD::XBase
 
(I also worked briefly with DBD::ODBC on Win32, but that came
with the GS port of 5.004 so it's in a different league.)

There are also some helper applications you'll need:

 Data::ShowTable
 SQL::Statement (for CSV and XBase)

Extremely helpful are:

 Term::ReadLine
 Term::ReadKey

which allow you to use the dbimon much more effectively.

Using

Now we get to the fun part -- using the DBI.

Connect

The first step is connecting to the database. All actions going to the database are done through a database handle.

Create a database handle like this:

   my $dbh = DBI->connect( DBI:mysql:test',
                           'username',
                           'password',
                           { RaiseError => 1 } );

where:

  DBI:mysql:test - Data Source Name (DSN)
  DBI            - DBI (necessary first part of the DSN)
  mysql          - Name of DBD
  test           - Name of database we are connecting to.
  username       - User we are connecting to this database as
  password       - Password for the user

The final argument to the connect call is a hashref of options. An example is:

  { RaiseError => 1, AutoCommit => 0 }

The DBI does not really care what statements you send through this database handle. You can think of DBI as a pass-through module: it defines an interface for sending requests to and receiving data from a database, but it does not really care what the statements or data are.

So if you're having problems with your SQL, the DBI is not the place tO look. As long as the data is getting to the database and you're getting messages back from it, the DBI is doing its job.

Now that you have a database handle, what can you do with it? You can use it directly, as we discuss below. But, like all objects in Perl, the $dbh you've now defined is very versatile. You can pass it around just like a reference:

 my @userlist = get_all_users( $dbh, $table_name );
 
 ...

sub get_all_users {
  my $dbh = shift;
  my $table_name = shift;
  ...
 }

You can also embed it in other objects:

 my $user = User->new( dbh => $dbh, username => $username );
 my $full_name = $user->full_name();

 package User;

 sub new {
  my $pkg = shift;
  my $class = ref( $pkg ) || $pkg;
  my %params = @_;
  my $self = {};
  $self->{dbh} = $params{dbh};
  $self->{username} = $params{username};
  return bless( $self, $class );
 }

 sub full_name {
  my $self = shift;
  if ( ! $self->{full_name} ) {
     my $sql = qq/
       SELECT FirstName, LastName
       FROM Users
       WHERE Username = $self->{usename}
     /;
     my $sth = $self->{dbh}->prepare( $sql );
     ...
  }
  return $self->{full_name};
 }

Selecting (part 1)

Now that we are connected, what can we do? Here is a typical operation, where we retrieve some characteristics of a User who has a lastname beginning with 'smith':

 # Define our SQL statement
 my $sql = qq/
   SELECT Username, FirstName, LastName, Phone
   FROM Users
   WHERE LastName LIKE 'smith%'
 /;

 # Create a statement handle so we can
 # retrieve the results
 my $sth = $dbh->prepare( $sql );

 # Execute the statement
 $sth->execute;

 # $sth->fetchrow_hashref returns a hashref for
 # every record that meets our criteria.
 #
 # When there are no more records, the assignment
 # to $row is undef and the while loop is done.
 while ( my $row = $sth->fetchrow_hashref ) {
    print "New User:\n";
    foreach my $field ( sort keys %{ $row } ) {
        print "$field: $row->{ $field }\n";
    }
 }

Why prepare and execute?

The idea of having separate prepare and execute statements become obvious when you get into binding parameters. Binding parameters allows you to prepare a statement once and execute it many times with different parameters. For instance, you have a text file formatted:

  ProductCode::Product::Price

and you want to put it into a table in a database.

 # Create the SQL statement with '?' where your 
 # values would normally be
 my $sql = qq/
   INSERT INTO Products
   ( ProductCode, ProductName, Price )
   VALUES
   ( ?, ?, ? )
 /;
 my $sth = $dbh->prepare( $sql );
 open( PRODUCTFILE, "products.txt" ) || die "Cannot open product file: $!";
 while ( <PRODUCTFILE> ) {
    chomp;
    my ( $code, $name, $price ) = split /::/;
    $sth->execute( $code, $name, $price );
 }
 close(PRODUCTFILE);

Inserting data this way can be several times faster than creating a separate SQL statement for each set of values. It is also elegant.

Simple statements

The DBI also allows the execution of simple statements very quickly:

 # Create the SQL statement
 my $sql = qq/
   UPDATE Users
   SET Phone = '202-456-1111'
   WHERE LastName = 'Smith'
 /;

 # Prepare and execute in one step
 my $rv = $dbh->do( $sql );

The return value from a do indicates whether the statement succeeded or not. A nonzero value indicates success and on many databases indicates how many rows were affected by the statement. If I had 4 entries in the Users table who had the last name of 'Smith', $rv in the above example would be 4.

If the statement had no error but affected no rows, you'll get back the value '0E0', which Perl evaluates to true, so you can do:

  if ( $dbh->do( $sql ) ) {
     ...continue on with next action...
  }
  else {
     ...error with $sql...
  }

Selecting (part 2)

Similar to binding parameters as discussed above is binding columns. You can bind columns to values selected and make your program much more readable. Here is an example:

 # Create the SQL statement
 my $sql. = qq/
   SELECT ProductCode, Product, Price
   FROM Products
   WHERE Price <= 3
 /;

 # Initialize our variables to be bound
 my ( $prod_code, $prod, $price );

 # Prepare and execute
 $sth = $dbh->prepare( $sql );
 $sth->execute;

 # Bind the columns. The first argument is a hashref of
 # attributes and each argument thereafter is a scalar
 # ref corresponding to the fields you select.
 $sth->bind_columns( undef, \$prod_code, \$prod, \$price );

 # Setup our report
 format PRODUCTS_TOP =
 Product Code      Product          Price
 ===========================================
 .
 format PRODUCTS =
 @<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<< @>>>>>>>>
 $prod_code,       $prod,           $price
 .
 $~ = 'PRODUCTS';

 # $sth->fetch is an alias for $sth->fetchrow_arrayref 
 while ( $sth->fetch ) {
    write;
 }

Selecting (part 3)

There are numerous ways to get your data back from the database:

fetchrow_arrayref
Returns an array reference with the values from the selected record. The values are in the order you used in your SELECT statement. This is the fastest method to select data from the database.
fetch
This is currently an alias for fetchrow_arrayref.
fetchrow_hashref
Returns a hash reference with the keys as fieldnames and the record's data as the values. The only warning here is to determine how your database returns fields. Some do so in ALL CAPS, others retain the original case of the field.
fetchrow_array
Returns an array with the values from the selected record.
fetchall_arrayref
Gets all the rows from the database for you, returning an arrayref with each member as an arrayref of values from the database.

Transactions

Most modern relational database systems are based around the idea of a transaction. A transaction is basically a set of related SQL statements that are commited to the database at one time. If any of the statements fail, the transaction is sent a rollback signal, which removes the changes any SQL statement in the transaction made.

The classic example for transactions is a bank: a customer wishes to transfer funds from one account to another. This requires two discrete statements: an addition to one account and a subtraction from the other.

What happens if the bank's power goes out after the first statement but before the second? In a database without transactions, you have one happy customer! But in a database with transactions, the database upon starting up will automatically rollback all uncommitted transactions.

In Perl, the above would look something like this (in a very abstracted sense):

 if ( add_funds( $dbh, $account, $amount ) and
      remove_funds( $dbh, $account, $amount ) {
        $dbh->commit;
 }
 else {
        $dbh->rollback;
 }

Error Handling

Error handling with the DBI is like error handling with everything else in Perl: always check your return codes. The DBI can make it easier with one of the attributes you can set when you first create your database handle:

 my $dbh = DBI->connect( ..., { RaiseError => 1 } );

the RaiseError attribute throws a die every time there is an error with your statement and sets $@ to the error returned by the database. You can then wrap your DBI calls in an eval{} and trap your errors:

 my ( $sth );
 my $sql = qq/
   SELECT FROM TableName
   WHERE Field = 'value'
 /;
 eval {
   $sth = $dbh->prepare( $sql );
   $sth->execute;
 };
 if ( $@ ) {
   print "Error in the database: $@";
   exit;
 }

After the eval{}, $@ probably has an error like: ``Parse error near 'FROM', no fieldnames for SELECT.''

The latest error string from the database can be found in: $dbh->errstr, and if you have a problem connecting to the database, you can retireve the error from $DBI::errstr. You can also retrieve errors from each statement handle: $sth->errstr.

Links

What would a document like this be without links?

Article from The Perl journal about the DBI
http://www.arcana.co.uk/technologia/perl/DBI/doc/tpj5/
DBI Mailing Lists
http://www.fugue.com/dbi/
SQL Tutorial
http://w3.one.net/~jhoffman/sqltut.htm
Advantages of a DBMS
http://www.compapp.dcu.ie/databases/f001.html
Ask the SQL Pro
http://www.inquiry.com/techtips/thesqlpro/

Author

Chris Winters (cwinters@intes.net) with help from the DBI documentation and various postings on the dbi_users mailing list. Comments appreciated!