Embedding the interpreter allows you to write Apache handlers or modules entirely in Perl. You can even configure the server using Perl code. And all existing modules on CPAN (or elsewhere) are available to you during this process.
Also, mod_perl can greatly increase the speed of pages dynamically created by CGI scripts or other means.
For instance, a current module (mod_dav -- Apache modules are traditionally known as mod_<module name>) implements an IETF specification of Distributed Authoring and Versioning. (See LINKS for the specification.) DAV in its current version is a sort of web-enabled CVS, allowing many people to work on documents at the same time and have a centralized server manage changes. Microsoft has chosen to use DAV to allow Office 2000 users to create a sort of groupware. While their intention was to tie Office 2000 closely to their Exchange server product, mod_dav will enable Apache to serve them as well.
Other modules exist to enable Java servlets, authenticate via any number of methods (Samba, Kerberos, LDAP, RADIUS, MySQL database...), log requests to external databases, implement a version of the Cold Fusion markup language, use server-side JavaScript and much more. See LINKS for a link to the complete listing in the module registry.
The Apache API defines a series of phases that every request goes through. You can write a handler for any of these phases and have your program take care of that phase.
Here is a list of phases. This list is paraphrased from the document Apache API Notes by Robert Thau, which you can get from the URL in LINKS.
module cgi_module = {
STANDARD_MODULE_STUFF,
NULL, /* initializer */
NULL, /* dir config creator */
NULL, /* dir merger --- default is to override */
make_cgi_server_config, /* server config */
merge_cgi_server_config, /* merge server config */
cgi_cmds, /* command table */
cgi_handlers, /* handlers */
translate_scriptalias, /* filename translation */
NULL, /* check_user_id */
NULL, /* check auth */
NULL, /* check access */
type_scriptalias, /* type_checker */
NULL, /* fixups */
NULL, /* logger */
NULL /* header parser */
};
The important stuff is the comments to the right, which name the phase each slot handles. The stuff I've skipped I haven't yet got :)
So if you wanted to log all requests for files in the /data directory to a special file, you'd write a handler that stepped in at the logging phase. You'd check the request directory and if it matched up to /data you'd write the request to your special file.
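A minimal sketch of such a logging handler follows. The package name Apache::DataLog, the log file path and the MockRequest class are all hypothetical, and the OK and DECLINED constants are stubbed so the sketch can run outside the server. Under mod_perl they would come from Apache::Constants, and the request object would be supplied by Apache.

```perl
#!/usr/bin/perl
# Sketch of a logging-phase handler that records requests under /data.
# Apache::DataLog, $LOG_FILE and MockRequest are hypothetical names.
package Apache::DataLog;
use strict;
use warnings;

# Under mod_perl these would come from Apache::Constants.
# They are stubbed here so the sketch runs outside the server.
use constant OK       => 0;
use constant DECLINED => -1;

our $LOG_FILE = '/tmp/data_requests.log';    # assumed location

sub handler {
    my $r   = shift;
    my $uri = $r->uri;

    # Not our directory: leave the request alone.
    return DECLINED unless $uri =~ m{^/data(?:/|$)};

    # Append one line per matching request.
    open( my $log, '>>', $LOG_FILE ) or return DECLINED;
    print {$log} scalar(localtime), " $uri\n";
    close($log);
    return OK;
}

# Stand-in for the request object mod_perl passes to a handler.
package MockRequest;
sub new { my ( $class, $uri ) = @_; return bless { uri => $uri }, $class }
sub uri { return $_[0]->{uri} }

package main;

for my $uri ( '/data/report.html', '/index.html' ) {
    my $rc = Apache::DataLog::handler( MockRequest->new($uri) );
    print "$uri => ", ( $rc == Apache::DataLog::OK() ? 'logged' : 'skipped' ), "\n";
}
```

In a real server the handler would be attached to the logging phase with a PerlLogHandler directive naming the package.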
Want to see what modules are compiled into your Apache? Type:
httpd -l (or ./httpd -l as necessary). Doing so on my development system gives me:
Compiled-in modules:
http_core.c
mod_env.c
mod_log_config.c
mod_mime.c
mod_negotiation.c
mod_include.c
mod_autoindex.c
mod_dir.c
mod_cgi.c
mod_asis.c
mod_imap.c
mod_actions.c
mod_userdir.c
mod_alias.c
mod_access.c
mod_auth.c
mod_setenvif.c
mod_auth_mysql.c
mod_perl.c
Of this listing, only mod_auth_mysql (a module to allow authentication from a MySQL database) and mod_perl were added by me. The rest of them were included with Apache itself.
Note that dynamic modules, a feature of Apache 1.3 on both Unix and Win32 systems, will not appear in this listing since they are not compiled into the program itself.
An important note is that each child is distinct from the other children -- sharing information directly among the children is generally a no-no. This becomes important when you think of a user hitting a website multiple times, say for browsing a database. The user may not get the same child process for each request. Therefore, we cannot store state information in a child process, so using a separate data store (such as some form of database) is necessary.
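The isolation between children can be seen with plain fork(), with no Apache involved. In this sketch the counter $hits and the helper hits_seen_by_child are hypothetical names. The child increments its own copy of the counter and the parent's copy does not change, which is why per-user state has to live in a separate data store.

```perl
#!/usr/bin/perl
# Sketch: a forked child gets its own copy of the parent's data, much as
# each Apache child process keeps its own copy of any Perl variables.
use strict;
use warnings;

$| = 1;          # flush output before forking so nothing is printed twice

my $hits = 0;    # hypothetical per-process counter

# Fork a child, let it handle one "request", and report what it saw.
sub hits_seen_by_child {
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {    # child
        close($reader);
        $hits++;          # changes only the child's copy
        print {$writer} $hits;
        close($writer);
        exit 0;
    }

    close($writer);       # parent
    my $seen = <$reader>;
    close($reader);
    waitpid( $pid, 0 );
    return $seen;
}

my $child_view = hits_seen_by_child();
print "child saw hits = $child_view\n";
print "parent still has hits = $hits\n";
```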
modules/by-module/Apache subdirectory. You'll also find lots of other modules to use with mod_perl there.
The latest version is 1.16_02, although 1.17 will be out in the next few weeks. Apache 1.3 is strongly recommended for use with mod_perl, although I believe it will still work with 1.2.
mod_perl must be compiled along with Apache. (Kind souls compile Win32 versions from time to time.) Be sure to read the INSTALL file when you unpack the module. It gives you very detailed instructions on how to install mod_perl.
If you're just playing around and experimenting, I recommend you install support for all the phases listed above. To do so, start the process with:
perl Makefile.PL EVERYTHING=1
Once you're done, you'll have a new httpd binary in the Apache source tree. Note the size: for a normal httpd you'd expect 400K or so; the size of a mod_perl httpd can run over 1MB.
Run your new httpd just as you would a normal Apache binary.
Note that the size of the new httpd while running can grow depending on the modules you're loading into memory. I routinely see httpd children in excess of 5 MB. Keeping 10-15 children around means that 50-75+ MB of memory is necessary just for the web server. If you run out of physical memory and start swapping to disk, you might as well kill the server and allocate fewer children because the performance will be awful. Installing more memory generally takes care of severe performance problems.
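The arithmetic behind those figures is simple enough to script. This sketch uses the 5 MB per-child figure from above. The 64 MB budget and the helper names are assumptions chosen only for illustration.

```perl
#!/usr/bin/perl
# Back-of-the-envelope sizing for mod_perl children.
use strict;
use warnings;

# Memory (MB) needed to keep a given number of children resident.
sub memory_needed_mb {
    my ( $children, $child_mb ) = @_;
    return $children * $child_mb;
}

# Largest number of children that fit in the memory set aside for httpd.
sub max_children {
    my ( $available_mb, $child_mb ) = @_;
    return int( $available_mb / $child_mb );
}

for my $children ( 10, 15 ) {
    printf "%2d children x 5 MB = %d MB\n", $children,
        memory_needed_mb( $children, 5 );
}

# Assumed budget of 64 MB for the web server.
printf "64 MB set aside / 5 MB per child = %d children\n",
    max_children( 64, 5 );
```

The result of max_children is the sort of number you would use to cap the children in the server configuration (the MaxClients directive in Apache 1.3).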
Note that the name of the mod_perl handler used when configuring the server is in (parentheses).
# Modules to load at startup.
PerlModule Apache::DBI
PerlModule Apache::AuthenDBI
PerlModule Apache::AuthzDBI
PerlModule CGI
You can load up to 10 modules this way. If you need more than 10, use the PerlRequire directive to include a file which itself uses the modules.
# Takes care of URL rewriting
PerlTransHandler Apache::MySite_Redirect
The directive PerlTransHandler tells mod_perl that we want the package Apache::MySite_Redirect to handle URL rewriting. You can also specify a subroutine name:
# Takes care of URL rewriting
PerlTransHandler Apache::MySite_Redirect::url_modify
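Here is a rough sketch of what a package like Apache::MySite_Redirect might contain. It is not the actual module. The rewrite rule (/docs to /library) and the MockRequest class are hypothetical, and the constants are stubbed so the code runs outside the server. When the directive names only the package, mod_perl calls the subroutine named handler. When it names a subroutine, that subroutine is called instead.

```perl
#!/usr/bin/perl
# Sketch of a URL-rewriting handler. The rewrite rule and MockRequest
# are hypothetical; this is not the real Apache::MySite_Redirect.
package Apache::MySite_Redirect;
use strict;
use warnings;

# Stubbed here; under mod_perl these come from Apache::Constants.
use constant OK       => 0;
use constant DECLINED => -1;

# Called when the directive names only the package.
sub handler { return url_modify(@_) }

# Called when the directive names Apache::MySite_Redirect::url_modify.
sub url_modify {
    my $r   = shift;
    my $uri = $r->uri;

    # Hypothetical rule: the old /docs area moved to /library.
    if ( $uri =~ s{^/docs/}{/library/} ) {
        $r->uri($uri);
    }

    # DECLINED lets Apache's own translation carry on with the new URI.
    return DECLINED;
}

# Stand-in for the request object, with a get/set uri method.
package MockRequest;
sub new { my ( $class, $uri ) = @_; return bless { uri => $uri }, $class }

sub uri {
    my $self = shift;
    $self->{uri} = shift if @_;
    return $self->{uri};
}

package main;

for my $uri ( '/docs/intro.html', '/images/logo.gif' ) {
    my $r = MockRequest->new($uri);
    Apache::MySite_Redirect::handler($r);
    print "$uri -> ", $r->uri, "\n";
}
```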
Here's another set of directives where we restrict the directive to a particular location.
# Where we keep all the scripts to make up each
# page. Apache::Registry should cache them,
# making them go lickety-split!
<Location /page>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options +ExecCGI
</Location>
(We'll discuss Apache::Registry below.)
Apache::Registry is a replacement for CGI that allows your CGI scripts to be cached in memory, making them run extremely fast, about as fast as a static page request.
To enable Apache::Registry, put the following lines of code in your .conf file:
Alias /cgi-bin /usr/local/httpd/cgi-bin/mysite
<Location /cgi-bin>
  SetHandler perl-script
  PerlHandler Apache::Registry
</Location>
mod_perl will then cache your CGI scripts in memory as it encounters them. This can yield a huge performance increase, but there are also a number of traps. CGI scripting can encourage messy programming -- since your program will only be around for one instance, why bother using strict and similar checks? However, with mod_perl your program can be around for some time, so you can run into problems with incorrectly initialized variables, data structures that hang around past their lifetime, and so forth. The mod_perl documentation has some help on this issue.
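The trap can be reproduced in plain Perl. In this sketch two subroutines stand in for a script that has been compiled once and is then run for several requests. The names sloppy_request, careful_request and @seen are hypothetical. The sloppy version relies on a global that is never reset, so data from earlier requests leaks into later ones.

```perl
#!/usr/bin/perl
# Sketch: why unscoped variables cause trouble when a script stays in
# memory between requests. All names here are hypothetical.
use strict;
use warnings;

# Stands in for a global the script never bothers to reset.
our @seen;

# Works as plain CGI (fresh process each time), misbehaves when cached.
sub sloppy_request {
    my $name = shift;
    push @seen, $name;    # still holds entries from earlier requests
    return scalar @seen;
}

# A lexical that is initialised on every call starts clean each time.
sub careful_request {
    my $name     = shift;
    my @seen_now = ($name);
    return scalar @seen_now;
}

for my $name (qw( alice bob carol )) {
    printf "%-6s sloppy sees %d item(s), careful sees %d\n",
        $name, sloppy_request($name), careful_request($name);
}
```

Running with use strict and lexical (my) variables, as the careful version does, avoids most of these surprises.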
Apache::DBI exists to cache database connections on a per-child basis. As mentioned earlier, sharing information (including a database connection) among the children can be difficult, to say the least.
When a child process starts up, this module registers itself with mod_perl. Any subsequent calls to DBI's connect method get re-routed to Apache::DBI, which maintains a series of database connections. Each connection is distinguished by its unique data source name (DSN -- generally the driver name combined with the database you're connecting to), so when a call comes in for that DSN Apache::DBI doesn't bother making the actual connection but instead hands off the already established connection.
Everything else should work exactly the same. You should ensure that on busy websites your database can handle the number of connections this can generate.
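The caching idea can be sketched in a few lines of plain Perl. This is not Apache::DBI's code. The function names are hypothetical, real_connect only pretends to open a connection (a real version would call DBI's connect method), and keying the cache on the DSN plus user name and password is an assumption made for the sketch.

```perl
#!/usr/bin/perl
# Sketch of a per-process connection cache keyed on the connect
# arguments. Not Apache::DBI itself; all names are hypothetical.
use strict;
use warnings;

my %cache;                # one entry per distinct set of arguments
my $real_connects = 0;    # how many "real" connections were opened

# Stand-in for the expensive step of opening a database connection.
sub real_connect {
    my ( $dsn, $user ) = @_;
    $real_connects++;
    return { dsn => $dsn, user => $user, id => $real_connects };
}

# Hand back the cached connection if there is one, else open it.
sub cached_connect {
    my ( $dsn, $user, $password ) = @_;
    my $key = join "\0", $dsn, $user, $password;
    $cache{$key} ||= real_connect( $dsn, $user );
    return $cache{$key};
}

my $first  = cached_connect( 'dbi:mysql:CTAA',     'myuser', 'mypass' );
my $second = cached_connect( 'dbi:mysql:CTAA',     'myuser', 'mypass' );
my $third  = cached_connect( 'dbi:mysql:WebStuff', 'myuser', 'mypass' );

print "same handle reused: ", ( $first == $second ? 'yes' : 'no' ), "\n";
print "real connections opened: $real_connects\n";
```

Because each child keeps its own cache, the database can end up holding one connection per child for every distinct DSN, which is why the connection limit matters on busy sites.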
It's not anything earth-shattering, but will hopefully give you an idea of what mod_perl can do.
You can view the site at:
http://www.ctaa.org/
Every page uses server-side includes. A server-side include is a snippet of HTML which the server parses and replaces with other information -- the user only sees the text the server puts in place of the SSI directive. Examples include a last-modified date, a common item of HTML included in many pages (e.g., navigation bar) or a hit counting program.
<!--#set var="CTAA_LOC" value="ct"-->
<!--#set var="CTAA_MENU" value="ct"-->
This sets environment variables for this page. When later scripts generate the menu and other items, they peek into these variables to see what should be generated.
sub show_area_graphic {
    my $current_area = lc shift;
    my $path_to_images = "$SERVER_ROOT/images/cycled/";

    # The absolute URL to the images.
    my $url_to_images = '/images/cycled/';

    # Get all relevant images. If the directory cannot be read,
    # treat it the same as having no images.
    opendir( IMAGES, $path_to_images ) or return '';
    my @graphics = grep /^\Q$current_area\E\d*\./, readdir(IMAGES);
    closedir(IMAGES);

    # If there are no images in this area, return nothing
    # so we don't see that ugly 'no image found' graphic.
    if ( ! @graphics ) {
        return '';
    }

    # Since all images within an area must have
    # the same width and height, put that along with
    # the text for the ALT tag here.
    ...clipped...

    # Pick a random index between 0 and the highest
    # image index. rand() returns a value below its
    # argument, so int() can yield every index.
    my $this_img = int ( rand ( $#graphics + 1 ) );
    my ( $width, $height ) = Image::Size::imgsize( $path_to_images . $graphics[ $this_img ] );

    # Create the URL
    my $this_img_url = $url_to_images . $graphics[ $this_img ];

    # Return the HTML for this image.
    return qq(<P ALIGN="CENTER">\n) .
           qq(<IMG SRC="$this_img_url" ) .
           qq(ALT="$defs{ $current_area }->{alt}") .
           qq( WIDTH="$width" HEIGHT="$height" BORDER="0">) .
           qq(</P>\n);
}
It's pretty simple.
In a central location I have the names of the images that appear along the left (the buttons). To modify the menu items, I just need to make a change in this central location. All successive pages will then be modified.
<!--#set var="SEE_ALSO" value="CODE"-->
<!--#perl sub="Apache::Include" arg="/page/page_see_also.pl"-->
First we set the environment variable SEE_ALSO and call the stub routine to include the information. If SEE_ALSO is blank or defined as 'CODE', the routine returns nothing. If it has a code, then the routine parses a couple of text files of information and places the proper links in a shaded box. Web-based tools exist for assigning links to a topic-based code, grouping links and reusing them in different areas.
Since we were outputting the 'Feedback' button dynamically, we figured we might as well do the entire footer navigation bar so we could change items easily if needed.
CTAA::PagePieces.pm. The SSI directives used to call the routines look like this:
<!--#perl sub="Apache::Include" arg="/page/page_side_menu.pl"-->
The .pl files in the /page location are just stubs to parse through the environment variables and call the routines in CTAA::PagePieces.pm. I included the option to get the menu, area and URI from elsewhere for testing purposes.
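#!/usr/bin/perl
use strict;
use CTAA::PagePieces;
{
    # Take each value from the environment, falling back to the
    # command line for testing. The parentheses make lc apply to
    # whichever value is used.
    my $current_menu = lc( $ENV{CTAA_MENU}    || shift(@ARGV) || '' );
    my $current_area = lc( $ENV{CTAA_LOC}     || shift(@ARGV) || '' );
    my $current_uri  = lc( $ENV{DOCUMENT_URI} || shift(@ARGV) || '' );
    print CTAA::PagePieces::show_side_menu( $current_menu,
                                            $current_area,
                                            $current_uri );
}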
Apache::AuthenDBI and Apache::AuthzDBI to authenticate and authorize users from a MySQL database.
The .conf code looks like this:
Simple Authentication
# Authorization for CTAA Services
<Location /cgi-bin/valid>
  AuthName "CTAA Services"
  AuthType Basic
  PerlAuthenHandler Apache::AuthenDBI
  PerlSetVar Auth_DBI_data_source 'dbi:mysql:CTAA'
  PerlSetVar Auth_DBI_username 'myuser'
  PerlSetVar Auth_DBI_password 'mypass'
  PerlSetVar Auth_DBI_pwd_table 'Users'
  PerlSetVar Auth_DBI_uid_field 'Username'
  PerlSetVar Auth_DBI_pwd_field 'Password'
  require valid-user

  # If they get an authorization required error,
  # direct users to the User Registration page.
  ErrorDocument 401 /cgi-bin/users.cgi?Action=BadLogin
</Location>
Authentication with Authorization
# Same authentication as above, but
# we add an AuthzHandler which ensures
# that the user is a member of one or
# more groups who are able to access
# the CTAA Admin stuff.
#
# Note that this should match both the /cgi-bin/admin
# and /admin URLs (as well as /ct/admin , /ntrc/admin ,
# etc.)
<LocationMatch "admin">
  Options +ExecCGI
  DirectoryIndex home.shtml home.html home.htm
  AuthName "CTAA Administration"
  AuthType Basic
  PerlAuthenHandler Apache::AuthenDBI
  PerlAuthzHandler Apache::AuthzDBI
  PerlSetVar Auth_DBI_data_source 'dbi:mysql:CTAA'
  PerlSetVar Auth_DBI_username 'myuser'
  PerlSetVar Auth_DBI_password 'mypass'
  PerlSetVar Auth_DBI_pwd_table 'Users'
  PerlSetVar Auth_DBI_uid_field 'Username'
  PerlSetVar Auth_DBI_pwd_field 'Password'
  PerlSetVar Auth_DBI_grp_table 'UsersGroups'
  PerlSetVar Auth_DBI_grp_field 'Groupname'
  require group webadmin
  ErrorDocument 401 /admin_only.shtml
</LocationMatch>
The documentation for the authentication/authorization modules tells you which variables you need to set via the PerlSetVar directive.
# Log our html files to the database (neat!)
PerlSetVar INTES_VHOST 'www.ctaa.org'
PerlLogHandler Apache::INTES_LogDBI
The logging routine reads the variable INTES_VHOST and modifies its entry to the database accordingly. Here's the actual module -- shamelessly swiped from Lincoln Stein:
package Apache::INTES_LogDBI;
use Apache::Constants ':common';
use strict;
use vars qw/
    $dbh $sth
/;
use DBI;
use POSIX 'strftime';

my $DSN = 'DBI:mysql:WebStuff';
my $db_user = 'myuser';
my $db_passwd = 'mypass';
my $log_table = 'WebLogs';

# Connect and prepare the INSERT once, when the module is loaded,
# so each request only has to execute the statement.
$dbh = DBI->connect( $DSN, $db_user, $db_passwd )
    or die "Cannot connect to $DSN: $DBI::errstr";
my $sql = qq/
    INSERT INTO $log_table
    VALUES ( ?,?,?,?,?,
             ?,?,?,?,? )
/;
$sth = $dbh->prepare( $sql );

sub handler {
    my $r = shift;
    my $url = $r->uri;

    # Only log pages (.htm, .html, .shtml); skip images and the rest.
    return DECLINED if ( $url !~ /\.s?html?$/ );

    my $date = strftime( '%Y-%m-%d %H:%M:%S', localtime );
    my $host = $r->get_remote_host;
    my $method = $r->method;
    my $user = $r->connection->user;
    my $referer = $r->header_in( 'Referer' );
    my $browser = $r->header_in( 'User-agent' );
    my $status = $r->status;
    my $bytes = $r->bytes_sent;
    my $vhost = $r->dir_config( 'INTES_VHOST' );
    $sth->execute( $date, $host, $method, $url, $user,
                   $browser, $referer, $status, $bytes, $vhost );
    return OK;
}

1;
Home: http://perl.apache.org/
Modules: http://perl.apache.org/src/apache_modlist.html
FAQ: http://perl.apache.org/faq/
Mailing List Archives: http://forum.swarthmore.edu/epigone/modperl
Find the mailing list address at the mod_perl home or in the archives. It shouldn't be too easy to fill up the mailbox of so many people :)
Issue 9: Stately Scripting with mod_perl
A good introduction to mod_perl and the source of the handler listing (along with lots of other good ideas).
Issue 11: A Web Proxy Module for mod_perl
Lincoln shows how to set up Apache as a proxy server and allow mod_perl to step in and alter content -- in this case, strip banner ads from every page going through the proxy.
API: http://www.apache.org/docs/misc/API.html
Modules: http://modules.apache.org/
http://www.lyra.org/greg/mod_dav/
RFCs: http://www.rfc-editor.org/rfc.html
IETF Working Group Pages: http://www.ics.uci.edu/pub/ietf/
Chris Winters cwinters@intes.net