CSCI E-153, Web Development Using XML

December 7, 2010

Harvard University
Division of Continuing Education
Extension School

Course Web Site: http://cscie153.dce.harvard.edu/

Instructor email: david_heitmeyer@harvard.edu
Course staff email: cscie153@dce.harvard.edu

Wrapping Up

Grab Bag

Brown Bag

XSLT in the browser

Browsers do support XSLT. Include a "processing instruction" in the XML that tells the browser what XSLT to apply.

Link an XSLT stylesheet to an XML document with a processing instruction:

<?xml-stylesheet type="text/xsl" href="http://example.com/service-check.xsl"?>
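
As a rough illustration of what a browser reads from that instruction, here is a minimal Python sketch (Python is not used in the slides, and `find_stylesheet` is a hypothetical helper). It locates the `xml-stylesheet` processing instruction with the standard library's `xml.dom.minidom` and splits out its pseudo-attributes. The `<services>` document body is made up for the example; only the processing instruction comes from the slide.

```python
import re
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document; only the processing instruction is from the slide.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="http://example.com/service-check.xsl"?>'
    '<services><service name="web" status="ok"/></services>'
)

def find_stylesheet(xml_text):
    """Return the pseudo-attributes of the first xml-stylesheet PI, or None."""
    doc = parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The PI data is plain text that only looks like attributes,
            # so it has to be split by hand.
            return dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
    return None

info = find_stylesheet(SAMPLE)
print(info)
```

The regular expression only handles double-quoted values, which is enough for the instruction shown above.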

Issues

XSLT in the Browser - Weather

XSLT in the Browser - XML Output

Differences of Meaning or Differences of Syntax?

Why?

Canonical XML
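
One way to see the point of canonicalization is a small Python sketch, assuming Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` is available. The two sample documents are made up: they differ in attribute order, quote style, and empty-element syntax, but carry the same information.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same information (made-up sample data).
doc_a = """<?xml version="1.0"?><item b='2' a='1'><note/></item>"""
doc_b = """<item a="1"   b="2"><note></note></item>"""

canon_a = canonicalize(xml_data=doc_a)
canon_b = canonicalize(xml_data=doc_b)

print(doc_a == doc_b)      # the raw text differs
print(canon_a == canon_b)  # the canonical forms can be compared directly
print(canon_a)
```

Comparing canonical forms answers "is this a difference of meaning or only of syntax?" for the syntactic variations that canonicalization normalizes.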


"diff" for XML

diff

file1.txt
Lorem
ipsum
dolor 
sit 
amet, 
consectetur 
adipiscing 
elit. 
Quisque 
magna.

file2.txt
Lorem
lorem
ipsum
dolor 
amet, 
consectetur 
Adipiscing 
elit. 
Quisque 
magna.

diff:

$ diff file1.txt file2.txt
1a2
> lorem
4d4
< sit 
7c7
< adipiscing 
---
> Adipiscing 
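
The same line-based comparison can be sketched in Python with the standard `difflib` module. Note the assumption: `difflib` produces unified-diff output, not the "normal" format (`1a2`, `4d4`, `7c7`) printed by the `diff` command above, so the output looks different even though the detected changes are the same.

```python
import difflib

file1 = ["Lorem", "ipsum", "dolor", "sit", "amet,", "consectetur",
         "adipiscing", "elit.", "Quisque", "magna."]
file2 = ["Lorem", "lorem", "ipsum", "dolor", "amet,", "consectetur",
         "Adipiscing", "elit.", "Quisque", "magna."]

# unified_diff yields header lines, hunk markers, and then lines prefixed
# with ' ' (unchanged), '-' (only in file1) or '+' (only in file2).
diff_lines = list(difflib.unified_diff(
    file1, file2, fromfile="file1.txt", tofile="file2.txt", lineterm=""))

# Keep only the changed lines, skipping the '---' / '+++' file headers.
changes = [line for line in diff_lines
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]

for line in diff_lines:
    print(line)
```

A line-based tool like this treats XML as plain text, which is why reordered attributes or changed whitespace show up as differences.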

Differences in XML

Two tools:


xmldiff

list1.xml

list2.xml

XML Security

Encryption Basics

Plaintext, Ciphertext, Key, Algorithm

Symmetric Key Cryptography

Public Key Cryptography

Digital Signatures

Public Key + Signature

XML Encryption

Examples are from W3C: XML Encryption Syntax and Processing.

Encrypting character data:

<PaymentInfo xmlns='http://example.org/paymentv2'>
    <Name>John Smith</Name>
    <CreditCard Limit='5,000' Currency='USD'>
      <Number>
        <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
         Type='http://www.w3.org/2001/04/xmlenc#Content'>
          <CipherData>
            <CipherValue>A23B45C56</CipherValue>
          </CipherData>
        </EncryptedData>
      </Number>
      <Issuer>Example Bank</Issuer>
      <Expiration>04/02</Expiration>
    </CreditCard>
  </PaymentInfo>

Encrypting the entire element:

<PaymentInfo xmlns='http://example.org/paymentv2'>
    <Name>John Smith</Name>
    <EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
     xmlns='http://www.w3.org/2001/04/xmlenc#'>
      <CipherData>
        <CipherValue>M12N34P56Q</CipherValue>
      </CipherData>
    </EncryptedData>
  </PaymentInfo>
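
A receiving application first has to find the encrypted parts and see what each one replaces. The following Python sketch does only that inspection step with `xml.etree.ElementTree`; it performs no decryption, and `list_encrypted_parts` is a hypothetical helper name. The input is the "entire element" example above.

```python
import xml.etree.ElementTree as ET

XENC = "http://www.w3.org/2001/04/xmlenc#"

# The "entire element" example from above, as a string.
DOC = """<PaymentInfo xmlns='http://example.org/paymentv2'>
  <Name>John Smith</Name>
  <EncryptedData Type='http://www.w3.org/2001/04/xmlenc#Element'
   xmlns='http://www.w3.org/2001/04/xmlenc#'>
    <CipherData>
      <CipherValue>M12N34P56Q</CipherValue>
    </CipherData>
  </EncryptedData>
</PaymentInfo>"""

def list_encrypted_parts(xml_text):
    """Return (kind, cipher_text) for each EncryptedData element found."""
    root = ET.fromstring(xml_text)
    parts = []
    for enc in root.iter("{%s}EncryptedData" % XENC):
        # Type is a URI; its fragment says whether a whole element
        # ("Element") or only element content ("Content") was replaced.
        kind = enc.get("Type", "").rpartition("#")[2]
        value = enc.findtext("{%s}CipherData/{%s}CipherValue" % (XENC, XENC))
        parts.append((kind, value))
    return parts

print(list_encrypted_parts(DOC))
```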

XML Signature

MathML

Allows for both:


XHTML and MathML

OpenSearch

What...

Leads to...

OpenSearch: Defining Search Services

Search definitions:
[Screenshots: firefox, creative commons, search xml]

OpenSearch: Defining Search Results

Search results:

RSS with "opensearch" elements: (from http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_response_elements)

<rss version="2.0" 
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example.com Search: New York history</title>
    <link>http://example.com/New+York+history</link>
    <description>Search results for "New York history" at Example.com</description>
    <opensearch:totalResults>4230000</opensearch:totalResults>
    <opensearch:startIndex>21</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <atom:link rel="search" type="application/opensearchdescription+xml" href="http://example.com/opensearchdescription.xml"/>
    <opensearch:Query role="request" searchTerms="New York History" startPage="1" />
    <item>
      <title>New York History</title>
      <link>http://www.columbia.edu/cu/lweb/eguids/amerihist/nyc.html</link>
      <description>
        ... Harlem.NYC - A virtual tour and information on 
        businesses ...  with historic photos of Columbia's own New York 
        neighborhood ... Internet Resources for the City's History. ...
      </description>
    </item>
  </channel>
</rss>
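
A client can read the paging elements out of such a response with a few lines of code. The Python sketch below uses a trimmed copy of the RSS sample; `paging_info` is a hypothetical helper, and the page calculation assumes the OpenSearch default of a 1-based `startIndex`.

```python
import xml.etree.ElementTree as ET

NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

# Trimmed copy of the RSS response above.
RSS = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example.com Search: New York history</title>
    <opensearch:totalResults>4230000</opensearch:totalResults>
    <opensearch:startIndex>21</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>New York History</title></item>
  </channel>
</rss>"""

def paging_info(rss_text):
    """Extract the result count, current page, and item titles."""
    channel = ET.fromstring(rss_text).find("channel")
    total = int(channel.findtext("opensearch:totalResults", namespaces=NS))
    start = int(channel.findtext("opensearch:startIndex", namespaces=NS))
    per_page = int(channel.findtext("opensearch:itemsPerPage", namespaces=NS))
    return {
        "total": total,
        # Assumes startIndex counts from 1 (the OpenSearch default).
        "page": (start - 1) // per_page + 1,
        "titles": [item.findtext("title") for item in channel.findall("item")],
    }

print(paging_info(RSS))
```

With `startIndex` 21 and 10 items per page, this yields page 3.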

Atom with "opensearch" elements: (from http://www.opensearch.org/Specifications/OpenSearch/1.1#OpenSearch_response_elements)

<feed xmlns="http://www.w3.org/2005/Atom" 
       xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
   <title>Example.com Search: New York history</title> 
   <link href="http://example.com/New+York+history"/>
   <updated>2003-12-13T18:30:02Z</updated>
   <author> 
     <name>Example.com, Inc.</name>
   </author> 
   <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
   <opensearch:totalResults>4230000</opensearch:totalResults>
   <opensearch:startIndex>21</opensearch:startIndex>
   <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
   <opensearch:Query role="request" searchTerms="New York History" startPage="1" />
   <link rel="alternate" href="http://example.com/New+York+History?pw=3" type="text/html"/>
   <link rel="self" href="http://example.com/New+York+History?pw=3&amp;format=atom" type="application/atom+xml"/>
   <link rel="first" href="http://example.com/New+York+History?pw=1&amp;format=atom" type="application/atom+xml"/>
   <link rel="previous" href="http://example.com/New+York+History?pw=2&amp;format=atom" type="application/atom+xml"/>
   <link rel="next" href="http://example.com/New+York+History?pw=4&amp;format=atom" type="application/atom+xml"/>
   <link rel="last" href="http://example.com/New+York+History?pw=42299&amp;format=atom" type="application/atom+xml"/>
   <link rel="search" type="application/opensearchdescription+xml" href="http://example.com/opensearchdescription.xml"/>
   <entry>
     <title>New York History</title>
     <link href="http://www.columbia.edu/cu/lweb/eguids/amerihist/nyc.html"/>
     <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2003-12-13T18:30:02Z</updated>
     <content type="text">
       ... Harlem.NYC - A virtual tour and information on 
       businesses ...  with historic photos of Columbia's own New York 
       neighborhood ... Internet Resources for the City's History. ...
     </content>
   </entry>
   </feed>

OpenSearch Client (Federated Searching)

Harvard Worldwide

Search pulls from a variety of data sources.

XSLT Testing

Schemas for XML

Don't forget about: W3C XML Schema, RELAX NG, Schematron.

Schematron

The Schematron: An XML Structure Validation Language using Patterns in Trees

It is rules-based, using XSLT and XPath expressions.

Examples of Schematron in Action

Web Accessibility Initiative (WAI)

The Web Accessibility Initiative (WAI) develops strategies, guidelines, and resources to help make the Web accessible to people with disabilities.

Web Content Accessibility Guidelines 1.0 has 14 guidelines that are general principles of accessible design. Each guideline has one or more checkpoints that explain how the guideline applies in a specific area.

Testing WAI Checkpoints with Schematron

Using Schematron, we can test for many of the checkpoints within each guideline.

Guideline 1 deals with providing alternatives to auditory or visual content. Some Schematron rules for these checkpoints could be:

<pattern name="Web Content Accessibility Guidelines 1.0, Guideline 1" 
	 see="http://www.w3.org/TR/WAI-WEBCONTENT/#gl-provide-equivalents" 
	 fpi="+//IDN sinica.edu.tw/SGML Schema for WAI::Guideline 1//EN" 
	 id="g1">
  <rule context="img | IMG">
    <assert test="@alt or @ALT or @longdesc or @LONGDESC">
      (1.1) An image element should have some descriptive text:  an alt or longdesc attribute.
    </assert>
    <key name="imgkey" path="@alt" />
  </rule>
  <rule context="input | INPUT">
    <assert test="@alt or @ALT">
      (1.1) An input element should have some descriptive text: an alt attribute.
    </assert>
  </rule>
  <rule context="applet | APPLET">
    <assert test="@alt or @ALT">
      (1.1) An applet element should have some descriptive text: an alt attribute.
    </assert>
  </rule>
  <rule context="map | MAP">
    <assert test="area/@alt or a or A or AREA/@ALT">
      (1.1) A map element should have some descriptive text: an alt attribute or a link.
    </assert>
  </rule>
  <rule context="object | OBJECT">
    <assert test="string-length(text()) &gt; 0">
      (1.1) An object element should contain some descriptive text.
    </assert>
  </rule>
  <rule context="frame | FRAME">
    <assert test="@longdesc or @LONGDESC">
      (1.1) A frame element should have some descriptive text: a longdesc attribute.
    </assert>
  </rule>
</pattern>
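
To make the rule/assert model concrete, here is a rough Python approximation of the first rule in the pattern above. This is not how a Schematron processor works internally (real implementations evaluate XPath, typically via XSLT); it only mirrors the idea that a rule selects context nodes and an assertion reports when its test is false. The sample page and the `check` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML fragment used only for this sketch.
PAGE = """<html><body>
  <img src="logo.png" alt="Logo"/>
  <img src="chart.png"/>
  <img src="map.png" longdesc="map.html"/>
</body></html>"""

# Each rule: (context element name, test function, message), mirroring
# Schematron's rule/@context, assert/@test, and the assertion text.
RULES = [
    ("img",
     lambda el: el.get("alt") is not None or el.get("longdesc") is not None,
     "(1.1) An image element should have some descriptive text: "
     "an alt or longdesc attribute."),
]

def check(xml_text, rules):
    """Return a list of (message, offending src) for failed assertions."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in rules:
        for el in root.iter(context):
            if not test(el):  # an assert fires when its test is false
                failures.append((message, el.get("src")))
    return failures

failures = check(PAGE, RULES)
for message, src in failures:
    print(src, "->", message)
```

Only `chart.png` is reported, since it has neither an `alt` nor a `longdesc` attribute.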

Guideline 2

  <pattern name="Web Content Accessibility Guidelines 1.0, Guideline 2" 
	   see="http://www.w3.org/TR/WAI-WEBCONTENT/#gl-color">
    <rule context="body | BODY">
      <report role="samecolor" 
	      test="string(@bgcolor) = string(@color)">
	(2.2) The background color and the foreground color are the same
      </report>
      <!-- put specific color comparisons here -->
    </rule>
  </pattern>

Guideline 3

  <pattern name="Web Content Accessibility Guidelines 1.0, Guideline 3" 
	   see="http://www.w3.org/TR/WAI-WEBCONTENT/#gl-structure-presentation">
    <rule context="b | I | i | B">
      <report test="self::*">
	(3.3) Concerning element 
	<name />: B and I are not recommended. Use strong and em, or stylesheets.
      </report>
    </rule>
    <rule context="ul | ol | UL | OL">
      <assert test="li or LI">
      (3.3) A list should not be used for formatting effects</assert>
    </rule>
  </pattern>

Guideline 4

  <pattern name="Web Content Accessibility Guidelines 1.0, Guideline 4" 
	   see="http://www.w3.org/TR/WAI-WEBCONTENT/#gl-abbreviated-and-foreign">
    <rule role="topdoc" context="html | HTML | body | BODY">
      <assert test="@lang or @xml:lang or @LANG">
	(4.3) The primary language of a document should be identified.
      </assert>
    </rule>
  </pattern>

Browse, Advanced Search, Faceted Search

Browse

e.g. DMOZ

Advanced Search

e.g. HOLLIS Advanced Search

Faceted Search

HOLLIS Catalog - Faceted Search/Browse

Faceted Search - Facetmap

Facetmap - a faceted classification software tool

FacetMap Wine Demo

Facetmap

Courses

My Stuff in iSites

Congress

Apache Solr - Search and Facets

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Apache Solr - Courses

SIMILE Project and SIMILE Widgets

SIMILE Project (Semantic Interoperability of Metadata and Information in unLike Environments) and SIMILE Widgets:

The SIMILE project from MIT has some excellent examples of JavaScript applications.

Exhibit: Sites Using Web Standards

US Congress and Exhibit

Provide the data and the template, and Exhibit provides the functionality: different views, timeline according to birth date, filtering, sorting, and grouping.

Tiles View

Table View (columns are sortable)

Thumbnails View (Filters applied: California, Democrat)

Timeline View
Visible are those with birthdays in 1969-1972. Timeline is scrollable.

Congress data used with permission from GovTrack.US.

Exhibit (Part of SIMILE Widgets)

Exhibit is a lightweight structured data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and JavaScript code.

Examples

Validation of College and University Web Sites

Collected website information from over 250 colleges (list from US News & World Report).

For each school, collect the following information:

URL: http://www.washington.edu/
Title: University of Washington
Valid: valid
Errors: 0
Doctype: -//W3C//DTD XHTML 1.0 Strict//EN
encoding: iso-8859-1
error string: 0 error
URL_end: http://www.washington.edu/
thumbshot_url: http://open.thumbshots.org/image.pxf?url=http://www.washington.edu/

This was accomplished with Perl and the XML output of the W3C Markup Validator.

Data into JSON

Convert the data to JSON format (JSON is JavaScript Object Notation).

One entry looks like:

{
	"items" :      [
 		{
			"encoding" :     "iso-8859-1",
			"uri" :          "http://127.0.0.1/http%3A%2F%2Fwww.washington.edu%2F",
			"URL_end" :      "http://www.washington.edu/",
			"Title" :        "University of Washington",
			"URL" :          "http://www.washington.edu/",
			"type" :         "Item",
			"label" :        "http://www.washington.edu/",
			"thumbshot_url" : "http://open.thumbshots.org/image.pxf?url=http://www.washington.edu/",
			"error+string" : "0 error",
			"Valid" :        "Valid",
			"Errors" :       "0",
			"Doctype" :      "XHTML 1.0 Strict"
		},
    { ... more data .. }
  ] 
}    

The entire data file: colleges.json
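
The conversion step can be sketched as follows. The original data was produced with Perl; this Python version is a stand-in to show the shape of the transformation, and `to_exhibit` is a hypothetical helper. It wraps each collected record in the `{"items": [...]}` structure shown above, adding the `type` and `label` properties that appear in the sample entry.

```python
import json

# One record as collected per school (values from the sample entry above).
records = [
    {
        "URL": "http://www.washington.edu/",
        "Title": "University of Washington",
        "Valid": "Valid",
        "Errors": "0",
        "Doctype": "XHTML 1.0 Strict",
        "encoding": "iso-8859-1",
    },
]

def to_exhibit(records):
    """Wrap records in the {"items": [...]} structure used by the data file."""
    items = []
    for rec in records:
        item = dict(rec)
        item["type"] = "Item"       # every entry in the sample has type "Item"
        item["label"] = rec["URL"]  # the sample uses the URL as the label
        items.append(item)
    return {"items": items}

text = json.dumps(to_exhibit(records), indent=2)
print(text)
```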

Exhibit

College in Exhibit

Steps required:

  1. Import the Exhibit JavaScript
  2. Reference the data
  3. Create templates

Exhibit Templates

Template that defines a view:

<div>
  <div ex:role="exhibit-view" ex:viewClass="Exhibit.TileView"
       ex:possibleOrders=".Valid, .Title, .Errors, .Doctype">
  </div>
  <table ex:role="exhibit-lens" class="college">
    <tr>
      <td><img ex:src-content=".thumbshot_url" alt="thumbshot"
               height="90" width="120"/>
      </td>
      <td> <a ex:href-content=".URL" ex:content=".Title"></a>
        <ul>
          <li ex:content=".Doctype"></li>
          <li ex:content=".error+string"></li>
          <li ex:content=".encoding"></li>
        </ul>
      </td>
    </tr>
  </table>
</div>

Other Exhibit Components

<html>
  <head>
    <title>University and College Web Sites</title>
    <link href="college.json" type="application/json" rel="exhibit/data"/>
    <script src="http://static.simile.mit.edu/exhibit/api/exhibit-api.js"
      type="text/javascript"> </script>
    <style type="text/css">
      body { margin: 0.25in; }
    </style>
  </head>
  <body>
    <h1>University and College Web Sites</h1>
    <table width="100%">
      <tr valign="top">
        <td>
          <div id="exhibit-control-panel"></div>
          <div id="exhibit-view-panel">
            <div ex:role="exhibit-view" ex:viewClass="Exhibit.TileView"
              ex:possibleOrders=".Valid, .Title, .Errors, .Doctype">
            </div>
            <table ex:role="exhibit-lens" class="college">
              <tr>
                <td><img ex:src-content=".thumbshot_url" alt="thumbshot"
                    height="90" width="120"/>
                </td>
                <td> <a ex:href-content=".URL" ex:content=".Title"></a>
                  <ul>
                    <li ex:content=".Doctype"></li>
                    <li ex:content=".error+string"></li>
                    <li ex:content=".encoding"></li>
                  </ul>
                </td>
              </tr>
            </table>
          </div>
        </td>
        <td width="25%">
          <div id="exhibit-browse-panel"
            ex:facets=".Valid, .Doctype">
          </div>
        </td>
      </tr>
    </table>
  </body>
</html>
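
The browse panel above declares two facets, `.Valid` and `.Doctype`. Conceptually, a facet is a count of items per distinct value, and selecting facet values filters the item list. The Python sketch below illustrates that idea only; it is not Exhibit's implementation, and the second and third items are made-up data.

```python
from collections import Counter

# Hypothetical items; only the first mirrors the sample entry shown earlier.
items = [
    {"Title": "University of Washington", "Valid": "Valid",
     "Doctype": "XHTML 1.0 Strict"},
    {"Title": "Example College A", "Valid": "Invalid",
     "Doctype": "XHTML 1.0 Transitional"},
    {"Title": "Example College B", "Valid": "Invalid",
     "Doctype": "XHTML 1.0 Strict"},
]

def facet_counts(items, facet):
    """Count how many items carry each distinct value of one property."""
    return Counter(item[facet] for item in items if facet in item)

def apply_filters(items, selected):
    """Keep the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(f) == v for f, v in selected.items())]

print(facet_counts(items, "Valid"))
print(facet_counts(items, "Doctype"))
print(apply_filters(items, {"Valid": "Invalid", "Doctype": "XHTML 1.0 Strict"}))
```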

Add Another View

<div ex:role="exhibit-view" ex:viewClass="Exhibit.TabularView"
  ex:columns="     .thumbshot_url, .Title, .Valid, .Errors, .Doctype,      .URL"
  ex:columnLabels=" Thumbshot,     Title,  Valid,  Errors,  Document Type, URL"
  ex:columnFormats="image,         list,   list,   list,    list,          uri"
  ex:sortColumn="2" 
  ex:sortAscending="false">
</div>

Cocoon

XSLT within Java

XSLT outside Java

What to remember...

Why XML?

Extensible Markup Language (XML)

XML isn't always the best solution, but it is always worth considering.
XML in 10 points:

Tim Bray (Sun) on XML:

"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes," said Tim Bray of Sun Microsystems. "This is a good thing, because it shows that information can be packaged and transmitted and used in a way that's independent of the kinds of computer and software that are involved. XML won't be the last neutral information-wrapping system; but as the first, it's done very well."
from press release for "W3C XML is Ten!"

Jonathan C. Roberts (ToDoFinder.com) on XML:

XML is essentially universal and is worth knowing for that very reason. It is analogous to the value of learning:

Thank you!

Copyright © 2002-2010 David P. Heitmeyer
