sql

Using QueryPath to interact with a SQL database, Part 3

This is the third part in a series on how to use QueryPath to work with databases. In this part, we will see how the QPDB extension to QueryPath can be used to extract data from an XML or HTML document and store it in a database.

In the first part of this series, we looked at the basics of QueryPath and databases. There we connected to a database and created some new tables.

In the second part we took a detailed look at the process of retrieving data from the database and then using that data within QueryPath. Specifically, we selected data from a database and then merged it into existing documents and templates.

Now we will look at the opposite process. We will extract data from an XML or HTML document and insert it into a database.

Importing Data

The focus of these articles has thus far been on taking information from a database and putting it into an HTML document. But QueryPath can be used the other way. It can query a document and place that data into a database. In this article, we will see how.

Creating Tables

Our example document will be an RSS 2.0 feed. First, let's set up a very simple database for storing information. We will need two tables: channel to store information about a channel and item to store information about each item in a feed.

The two tables will be created with SQL like this:

CREATE TABLE IF NOT EXISTS channel
  (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link);
CREATE TABLE IF NOT EXISTS item 
  (channel_id INTEGER, guid, title, link, description);

Keep in mind that we are working with SQLite. Other databases, such as MySQL, will have different data definition syntax.

These are both very minimalist tables. Only a fraction of the information in the original RSS document is going to be stored here. But for our purposes, this will do. The channel table will contain three fields: channel_id, which is just an auto-incremented integer, title and link. These last two will come from the RSS channel information.

The second table, item, will be used to store an RSS item. Its channel_id field will be used to tie an item back to the original channel. The other fields will map to information from the RSS feed.

As we did before, we can execute these queries inside of QueryPath like this:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
 
 
qp()
  ->query('CREATE TABLE IF NOT EXISTS channel (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link)')
  ->query('CREATE TABLE IF NOT EXISTS item (channel_id INTEGER, guid, title, link, description)');
exit;
?>

Now we have our tables ready.

Inserting Channel Information

Once we have some tables defined we can set up a simple tool for reading the remote RSS feed and injecting the contents into our newly defined tables.

Let's begin with a simple script that adds a channel to the database:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
 
$url = 'http://localhost:8888/drupal/rss.xml';
$qp = qp($url);
 
$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();
$qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values);
?>

This short script does a lot of things. After requiring the two necessary libraries and setting the database connection, the script defines the URL ($url) that points to the RSS feed.

Next, a new QueryPath object is created and stored in $qp. As soon as we pass QueryPath the URL, it will retrieve and parse the remote document. In this case, the document is a standard RSS 2.0 feed.

Our plan is to query that RSS document and then store the retrieved information in our newly defined database. In the code above, we first retrieve the two values we want, storing them in $values. Note that these are keyed using the PDO placeholder syntax (:title and :link).

$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();

Next, we use $qp->query() to insert the results into the database, passing in $values as the substitution array.

By the end of the script above, the main channel data will be written to the database.

Getting an Insert ID

One thing we often need to do after an insert is get the auto-generated ID back from the database. Most databases support this operation. In our application we need to do this because channel_id, our primary key, is a parameter that we will need when inserting into the next table.

In QueryPath 1.2, retrieving the last insert ID required just a little bit of fancy footwork. It was done like this:

<?php
$channel_id = $qp->getBaseDB()->lastInsertId();
?>

This essentially gets the PDO database connection (with getBaseDB()) and then calls the PDO function lastInsertId() on that.

In QueryPath 1.3, this has been streamlined slightly, and you will be able to do something like this:

<?php
$channel_id = $qp->getLastInsertID();
?>

Now that we have the last insert ID, we have all we need to iterate through the RSS document and add items to our item table.

Iterating through Matches and Inserting Data

An RSS document has multiple item elements. What we want to do is iterate through those items and store some of the contained data inside of our newly defined database.

Let's continue building on the previous examples. This may make our code snippet longer, but it will help keep things in context.

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
 
$url = 'http://localhost:8888/drupal/rss.xml';
$qp = qp($url);
$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();
$qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values);
 
// Get the channel ID
 
// In QueryPath <= 1.2:
//$channel_id = $qp->getBaseDB()->lastInsertId();
 
// In QueryPath 1.3+
$channel_id = $qp->getLastInsertID();
 
$sql = 'INSERT INTO item (channel_id, guid, title, link, description) 
  VALUES (:channel_id, :guid, :title, :link, :description)';
 
foreach($qp->find('item') as $item) {
  $data = array(
    ':channel_id' => $channel_id,
    ':title' => $item->children('title')->text(),
    ':link' => $item->siblings('link')->text(),
    ':description' => $item->siblings('description')->text(),
    ':guid' => $item->siblings('guid')->text(),
  );
  // Notice that we execute from $qp, not $item.
  $qp->query($sql, $data)->doneWithQuery();
}
?>

The new code begins abut halfway down, where $sql is defined. That SQL template will be used for each of our inserts.

The foreach loop beneath is the most important part. Here, QueryPath iterates through each <item/> element in the RSS feed. Each item is wrapped in a QueryPath object assigned to $item.

Inside the loop, we first build up a list of values to insert. These are stored in $data. Since $data is going to used to substitute values into the SQL, we use the placeholder syntax, :channel_id, :title, and so on.

The $channel_id will be the same for every item, since they all belong to the same RSS channel. All other values come from the RSS item, and so we query the item to find the title, link, description, and GUID.

Finally, we perform the INSERT:

$qp->query($sql, $data)->doneWithQuery();

As a slight performance optimization, we run the query using the $qp object instead of the $item object.

That's all there is to our RSS importing script. Now our new database should contain information retrieved from the RSS feed.

Retrieving the Data

Now that we have the data in the database, we can retrieve it using the methods we studied in the previous part of this series. For example, let's add one more part to our script: Let's dump a list of item titles into an HTML document.

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
 
qp(QueryPath::HTML_STUB, 'body')
  ->query('SELECT title FROM item')
  ->withEachRow()
  ->appendColumn('title', '<p/>')
  ->doneWithQuery()
  ->writeHTML();
?>

This little snippet of code will select all titles from the item table. Then each title will be wrapped in a paragraph and added to the stub HTML document. Finally, the document will be written out. This will produce HTML that looks like this:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
       <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
     <title>Untitled</title>
    </head>
    <body>
      <p>Chicago Bears players warm up to Jay Cutler</p>
      <p>Broncos trade Cutler to Bears</p>
      <p>Thanks to benefactors, fountain will offer summer respite</p>
      <p>Learning Drupal 6 Module Development</p>
      <p>Story with lots of tags</p>
      <p>A Sticky Node</p>
      <p>De natura mentis humanae: quod ipsa sit notior qu&agrave;m corpus.</p>
      <p>De iis quae in dubium revocari possunt.</p>
      <p>This is a node</p>
    </body>
    </html>

At this point in the article, we have created some tables, read a remote document into these tables, and then retrieved data back out of the tables.

Conclusion

This is the final article in our three-part series on using QueryPath to work with databases. In the first article, we performed basic database tasks like connecting and running basic queries. The second article focused on using SELECT to retrieve data, and also using QueryPath's built in tools for working with the returned results. This third part has focused on using QueryPath's tools to retrieve information from an XML document, and then store that information inside of a database.

The major features of QueryPath's QPDB extension have been explained and used in this series, but there is still more. You can, for example, share a database connection with non-QueryPath tools, or work more directly with the PDO result set. But this series has, I hope, equipped you to make use of the most commonly used QPDB features.

Using QueryPath to interact with a SQL database, Part 1

QueryPath is designed to make working with XML and HTML simpler. It can also be used to interact with relational databases. Through the QPDB extension, which is bundled with QueryPath, QueryPath can execute queries and work with them in the context of an XML or HTML document. This series of articles provides an overview of this capability.

In this first part, we will look at connecting to a database and executing basic SQL statements. In the second part we will look at executing SELECT statements and managing the results from within QueryPath.

Background

Before we dive into some examples, a little bit of background is in order.

The QP database tool works with most of the databases that PHP 5 supports. To be more specific, it works with any PDO driver that ships with PHP 5. This includes MySQL, PostgreSQL, and SQLite. PDO (PHP Data Objects) is a driver abstraction layer that provides a standard set of tools for working with a wide variety of databases (there are at least ten drivers). It provides the perfect fit for QueryPath.

In this document, we are going to connect to a SQLite database. With just minimal changes to the connection string and DDL, the examples here can be made to run on MySQL or PostgreSQL.

Other Database Libraries

The QPDB extension uses PDO. However, it is possible to write other QueryPath extensions that use different underlying database libraries. For example, the Drupal QueryPath module includes a database layer for working with the Drupal database API.

A First Example: Connecting to a database and defining a table

The first example we are going to look at will connect to a SQLite database and create a new table. Here's the code:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
qp()->dbInit('sqlite:./qpExample.db')
  ->query('CREATE TABLE IF NOT EXISTS person (firstName, lastName)');
?>

The first thing to note about this example is that we require not only the QueryPath.php library, but also the QueryPath/Extension/QPDB.php extension. This is how extensions are loaded with QueryPath. As soon as the extension is loaded, it becomes available for use.

Next, we have a single QueryPath chain which initializes the database and then runs a SQL query. There are two database methods used:

  • dbInit()
  • query()

Let's start by looking at dbInit() and other ways of initializing a database connection.

Initializing the database

The dbInit() method provides one way of connecting to a database. Like any PDO connection, dbInit() takes a connection string called a DSN. The one above is for SQLite.

A DSN begins with the database name followed by a colon. From there, the rest of the string is determined by the PDO driver. Here are a few examples:

  • SQLite: sqlite:/path/to/database.db
  • MySQL: mysql:host=example.com;dbname=example
  • PostgreSQL: pgsql:host=Example.com dbname=example
  • IBM DB2: ibm:DSN=DB2_9

The other six PDO drivers each have their own DSN formats as well. Ours uses the SQLite database driver and opens the database file called qpExample.db.

There are actually two ways of connecting to a database. The first is to use dbInit(), as we have done above. This is a per object connection, meaning that each QueryPath object must open its own connection to the database.

A second method shares one database connection across multiple QueryPath instances. This is called a shared database connection, and it is the preferred method anytime your code is doing more than a couple of queries.

Some other database extensions for QueryPath (such as the Drupal database extension) do not require database connection management, since that is provided by some other library.

Per-object Connections

The dbInit() method will connect the current QueryPath object to the database. That means that each QueryPath object that needs to execute database queries will need to initialize, first.

Shared Database Connections (the preferred method)

There is a second way of initializing QueryPath, and this second way makes it possible to share one connection across multiple QueryPath objects. It is the preferred method in any environment where QueryPath will be making frequent use of a database.

In this method, the database connection is opened outside of an individual QueryPath object. We could rewrite our initial example using a shared connection:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
qp()->query('CREATE TABLE IF NOT EXISTS person (firstName, lastName)');
?>

In this case, we use QPDB::baseDB() to open a connection to the database. This connection is then referenced internally by the QPDB extension, and there is no need to use dbInit().

Note that while SQLite requires only one argument to open a database connection, other databases (including MySQL and PostgreSQL) may require other options, such as login credentials. These can be passed to the underlying database driver using a second argument, an associative array of information. The API documentation for QPDB::baseDB() lists all options.

That's how to connect. Next, we will look at functions to query the database.

<

h3>Querying the Database

In our examples above, we created a new table using code that looked like this:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
qp()->query('CREATE TABLE IF NOT EXISTS person (firstName, lastName)');
?>

In the last line of code, some SQL is executed using the query() method. In a nutshell, this method executes a SQL statement.

In the example above, we use the query() method to execute a CREATE TABLE statement. This will create for us a new table in the database -- a table named person with a firstName and a lastName column.

Let's say that we want to add an entry to the table. We could do so by adding another query to the code above:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';
 
QPDB::baseDB('sqlite:./qpExample.db');
qp()->query('CREATE TABLE IF NOT EXISTS person (firstName, lastName)')
  ->query(
    'INSERT INTO person (firstName, lastName)  VALUES (:first, :last)', 
    array(':first' => 'matt', ':last' => 'butcher')
  );
?>

The second query above, which is split over four lines because of our limited margins, is a simple INSERT statement. It inserts a first name
and a last name into the person table.

Take a close look at the VALUES portion of the Query:

INSERT INTO person (firstName, lastName)  VALUES (:first, :last)

The two values, :first and :last are placeholders. They will be replaced by the values in the passed-in array:

array(':first' => 'matt', ':last' => 'butcher')

When the query above is executed, :first and :last will be replaced by matt and butcher. Underneath the hood, what is actually happening here is that the basic query is being prepared, which means that the database engine is pre-compiling the SQL statement. Once the statement is prepared, the data can be inserted in a way that does not leave the database susceptible to SQL injection attacks.

IMPORTANT: The placeholders are not enclosed in quotation marks, even if they represent strings. The database driver automatically escapes and encodes data as necessary. This added layer of automation is another reason why using PDO-based database drivers can improve the security of your application.

Now we have opened a connection to a database, created a table, and added some data. In part two we will see what we can actually do with this data! We will be executing SELECT statements and then managing them with QueryPath.

In part three we will see how to read XML or HTML and insert the found data into a database.

Syndicate content