HtmlTokenizer Class Reference
[HTML parser]

Public Member Functions

 __construct (InputStream $stream)
 suppressWhitespaces ($isSuppressWhitespaces)
 
Returns:
HtmlTokenizer

 lowercaseAttributes ($isLowercaseAttributes)
 
Returns:
HtmlTokenizer

 lowercaseTags ($isLowercaseTags)
 
Returns:
HtmlTokenizer

 nextToken ()
 
Returns:
SgmlToken

 getErrors ()
 isInlineTag ($id)

Static Public Member Functions

static create (InputStream $stream)
 
Returns:
HtmlTokenizer

static isIdFirstChar ($char)
static isIdChar ($char)
static isValidId ($id)
static isSpacerChar ($char)
static removeWhitespaces (Cdata $cdata)

Public Attributes

const INITIAL_STATE = 1
const START_TAG_STATE = 2
const END_TAG_STATE = 3
const INSIDE_TAG_STATE = 4
const ATTR_NAME_STATE = 5
const WAITING_EQUAL_SIGN_STATE = 6
const ATTR_VALUE_STATE = 7
const CDATA_STATE = 8
const COMMENT_STATE = 9
const INLINE_TAG_STATE = 10
const EXTERNAL_TAG_STATE = 11
const DOCTYPE_TAG_STATE = 12
const FINAL_STATE = 42
const SPACER_MASK = '[ \r\n\t]'
const ID_FIRST_CHAR_MASK = '[A-Za-z]'
const ID_CHAR_MASK = '[-_:.A-Za-z0-9]'

Private Member Functions

 getNextChar ()
 getChars ($count)
 mark ()
 
Returns:
HtmlTokenizer

 reset ()
 
Returns:
HtmlTokenizer

 skip ($count)
 
Returns:
HtmlTokenizer

 lookAhead ($count)
 skipString ($string, $skipSpaces=false)
 makeTag ()
 
Returns:
HtmlTokenizer

 setupTag (SgmlTag $tag)
 
Returns:
SgmlTag

 handleState ()
 dumpBuffer ()
 
Returns:
HtmlTokenizer

 checkSpecialTagState ()
 outsideTagState ()
 createOpenTag ()
 
Returns:
SgmlOpenTag

 startTagState ()
 dumpEndTag ()
 
Returns:
HtmlTokenizer

 endTagState ()
 insideTagState ()
 dumpAttribute ()
 
Returns:
SgmlOpenTag

 attrNameState ()
 waitingEqualSignState ()
 attrValueState ()
 inlineTagState ()
 cdataState ()
 getComment ()
 commentState ()
 externalTagState ()
 doctypeTagState ()
 getContentToSubstring ($substring, $ignoreCase=false)
 Returns the content up to $substring, using the Knuth-Morris-Pratt algorithm.
 getTextualPosition ()
 warning ($message)
 
Returns:
HtmlTokenizer

 error ($message)
 
Returns:
HtmlTokenizer


Static Private Member Functions

static optionalLowercase ($string, $ignoreCase)

Private Attributes

 $inlineTags = array('style', 'script', 'textarea')
 $stream = null
 $char = null
 $line = 1
 $linePosition = 1
 $previousChar = null
 $mark = null
 $state = self::INITIAL_STATE
 $tags = array()
 $errors = array()
 $buffer = null
 $tagId = null
 $tag = null
 $completeTag = null
 $previousTag = null
 $attrName = null
 $attrValue = null
 $insideQuote = null
 $substringFound = false
 $suppressWhitespaces = false
 $lowercaseAttributes = false
 $lowercaseTags = false


Detailed Description

Definition at line 16 of file HtmlTokenizer.class.php.


Constructor & Destructor Documentation

HtmlTokenizer::__construct ( InputStream stream  ) 

Definition at line 74 of file HtmlTokenizer.class.php.

References getNextChar().


Member Function Documentation

static HtmlTokenizer::create ( InputStream stream  )  [static]

Returns:
HtmlTokenizer

Definition at line 84 of file HtmlTokenizer.class.php.

Referenced by OpenIdCredentials::__construct().

HtmlTokenizer::suppressWhitespaces ( isSuppressWhitespaces  ) 

Returns:
HtmlTokenizer

Definition at line 92 of file HtmlTokenizer.class.php.

Referenced by makeTag().

HtmlTokenizer::lowercaseAttributes ( isLowercaseAttributes  ) 

Returns:
HtmlTokenizer

Definition at line 104 of file HtmlTokenizer.class.php.

HtmlTokenizer::lowercaseTags ( isLowercaseTags  ) 

Returns:
HtmlTokenizer

Definition at line 116 of file HtmlTokenizer.class.php.

Referenced by createOpenTag(), and dumpEndTag().
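
The option setters above, like create(), are documented as returning HtmlTokenizer, which suggests they can be chained. The following is a minimal sketch of that pattern only. TokenizerOptionsSketch and toArray() are hypothetical stand-ins, not part of onPHP, and the assumption that each setter returns the same instance is not confirmed by this reference.

```php
<?php
// Stand-in class, NOT the real HtmlTokenizer: it only mimics the documented
// return types to illustrate how chained option setters could look.
class TokenizerOptionsSketch
{
    private $suppressWhitespaces = false;
    private $lowercaseAttributes = false;
    private $lowercaseTags = false;

    // Static factory, mirroring the documented create() that returns an instance.
    public static function create()
    {
        return new self();
    }

    // Each setter stores the flag and returns $this (assumed behaviour).
    public function suppressWhitespaces($isSuppressWhitespaces)
    {
        $this->suppressWhitespaces = ($isSuppressWhitespaces === true);
        return $this;
    }

    public function lowercaseAttributes($isLowercaseAttributes)
    {
        $this->lowercaseAttributes = ($isLowercaseAttributes === true);
        return $this;
    }

    public function lowercaseTags($isLowercaseTags)
    {
        $this->lowercaseTags = ($isLowercaseTags === true);
        return $this;
    }

    // Hypothetical helper used only to inspect the sketch's state.
    public function toArray()
    {
        return array(
            'suppressWhitespaces' => $this->suppressWhitespaces,
            'lowercaseAttributes' => $this->lowercaseAttributes,
            'lowercaseTags' => $this->lowercaseTags,
        );
    }
}

// Chained configuration in a single expression.
$options = TokenizerOptionsSketch::create()
    ->suppressWhitespaces(true)
    ->lowercaseTags(true);
```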

HtmlTokenizer::nextToken (  ) 

Returns:
SgmlToken

Definition at line 128 of file HtmlTokenizer.class.php.

References handleState().

HtmlTokenizer::getErrors (  ) 

Definition at line 146 of file HtmlTokenizer.class.php.

static HtmlTokenizer::isIdFirstChar ( char  )  [static]

Definition at line 151 of file HtmlTokenizer.class.php.

References $char.

static HtmlTokenizer::isIdChar ( char  )  [static]

Definition at line 156 of file HtmlTokenizer.class.php.

References $char.

static HtmlTokenizer::isValidId ( id  )  [static]

Definition at line 161 of file HtmlTokenizer.class.php.

static HtmlTokenizer::isSpacerChar ( char  )  [static]

Definition at line 171 of file HtmlTokenizer.class.php.

References $char.

static HtmlTokenizer::removeWhitespaces ( Cdata cdata  )  [static]

Definition at line 176 of file HtmlTokenizer.class.php.

References Cdata::getData(), and Cdata::setData().

HtmlTokenizer::isInlineTag ( id  ) 

Definition at line 200 of file HtmlTokenizer.class.php.

Referenced by handleState().

static HtmlTokenizer::optionalLowercase ( string, ignoreCase  )  [static, private]

Definition at line 205 of file HtmlTokenizer.class.php.

HtmlTokenizer::getNextChar (  )  [private]

Definition at line 213 of file HtmlTokenizer.class.php.

Referenced by __construct(), attrNameState(), attrValueState(), endTagState(), getChars(), getContentToSubstring(), inlineTagState(), insideTagState(), outsideTagState(), skip(), skipString(), startTagState(), and waitingEqualSignState().

HtmlTokenizer::getChars ( count  )  [private]

Definition at line 235 of file HtmlTokenizer.class.php.

References getNextChar().

Referenced by skipString().

HtmlTokenizer::mark (  )  [private]

Returns:
HtmlTokenizer

Definition at line 253 of file HtmlTokenizer.class.php.

Referenced by attrValueState(), externalTagState(), getComment(), reset(), and skipString().

HtmlTokenizer::reset (  )  [private]

Returns:
HtmlTokenizer

Definition at line 268 of file HtmlTokenizer.class.php.

References Assert::isNotNull(), and mark().

Referenced by attrValueState(), externalTagState(), getComment(), and skipString().

HtmlTokenizer::skip ( count  )  [private]

Returns:
HtmlTokenizer

Definition at line 285 of file HtmlTokenizer.class.php.

References getNextChar().

HtmlTokenizer::lookAhead ( count  )  [private]

Definition at line 293 of file HtmlTokenizer.class.php.

HtmlTokenizer::skipString ( string, skipSpaces = false  )  [private]

Definition at line 304 of file HtmlTokenizer.class.php.

References getChars(), getNextChar(), mark(), and reset().

Referenced by checkSpecialTagState(), and inlineTagState().

HtmlTokenizer::makeTag (  )  [private]

Returns:
HtmlTokenizer

Definition at line 329 of file HtmlTokenizer.class.php.

References Assert::isNotNull(), Assert::isNull(), and suppressWhitespaces().

Referenced by attrNameState(), attrValueState(), cdataState(), commentState(), doctypeTagState(), dumpBuffer(), dumpEndTag(), externalTagState(), insideTagState(), startTagState(), and waitingEqualSignState().

HtmlTokenizer::setupTag ( SgmlTag tag  )  [private]

Returns:
SgmlTag

Definition at line 353 of file HtmlTokenizer.class.php.

References Assert::isNotNull(), Assert::isNull(), and SgmlTag::setId().

Referenced by createOpenTag(), and startTagState().

HtmlTokenizer::handleState (  )  [private]

Definition at line 365 of file HtmlTokenizer.class.php.

References attrNameState(), attrValueState(), cdataState(), commentState(), doctypeTagState(), endTagState(), externalTagState(), inlineTagState(), insideTagState(), isInlineTag(), outsideTagState(), startTagState(), and waitingEqualSignState().

Referenced by nextToken().

HtmlTokenizer::dumpBuffer (  )  [private]

Returns:
HtmlTokenizer

Definition at line 415 of file HtmlTokenizer.class.php.

References Cdata::create(), and makeTag().

Referenced by inlineTagState(), and outsideTagState().

HtmlTokenizer::checkSpecialTagState (  )  [private]

Definition at line 428 of file HtmlTokenizer.class.php.

References $state, $tag, and skipString().

Referenced by outsideTagState().

HtmlTokenizer::outsideTagState (  )  [private]

Definition at line 448 of file HtmlTokenizer.class.php.

References checkSpecialTagState(), dumpBuffer(), getNextChar(), Assert::isNull(), and warning().

Referenced by handleState().

HtmlTokenizer::createOpenTag (  )  [private]

Returns:
SgmlOpenTag

Definition at line 524 of file HtmlTokenizer.class.php.

References SgmlOpenTag::create(), error(), lowercaseTags(), and setupTag().

Referenced by startTagState().

HtmlTokenizer::startTagState (  )  [private]

Definition at line 535 of file HtmlTokenizer.class.php.

References $char, SgmlIgnoredTag::create(), createOpenTag(), error(), getNextChar(), Assert::isNotNull(), Assert::isNull(), makeTag(), and setupTag().

Referenced by handleState().

HtmlTokenizer::dumpEndTag (  )  [private]

Returns:
HtmlTokenizer

Definition at line 623 of file HtmlTokenizer.class.php.

References SgmlEndTag::create(), error(), lowercaseTags(), makeTag(), and warning().

Referenced by endTagState().

HtmlTokenizer::endTagState (  )  [private]

Definition at line 645 of file HtmlTokenizer.class.php.

References dumpEndTag(), error(), getNextChar(), Assert::isNull(), and Assert::isTrue().

Referenced by handleState().

HtmlTokenizer::insideTagState (  )  [private]

Definition at line 704 of file HtmlTokenizer.class.php.

References $char, error(), getNextChar(), Assert::isNotNull(), Assert::isNull(), Assert::isTrue(), and makeTag().

Referenced by handleState().

HtmlTokenizer::dumpAttribute (  )  [private]

Returns:
SgmlOpenTag

Definition at line 779 of file HtmlTokenizer.class.php.

References error(), and warning().

Referenced by attrNameState(), attrValueState(), and waitingEqualSignState().

HtmlTokenizer::attrNameState (  )  [private]

Definition at line 801 of file HtmlTokenizer.class.php.

References $char, dumpAttribute(), error(), getNextChar(), Assert::isNotNull(), Assert::isNull(), Assert::isTrue(), and makeTag().

Referenced by handleState().

HtmlTokenizer::waitingEqualSignState (  )  [private]

Definition at line 880 of file HtmlTokenizer.class.php.

References dumpAttribute(), error(), getNextChar(), Assert::isNotNull(), Assert::isNull(), Assert::isTrue(), and makeTag().

Referenced by handleState().

HtmlTokenizer::attrValueState (  )  [private]

Definition at line 928 of file HtmlTokenizer.class.php.

References dumpAttribute(), error(), getNextChar(), Assert::isNotNull(), Assert::isNull(), Assert::isTrue(), makeTag(), mark(), reset(), and warning().

Referenced by handleState().

HtmlTokenizer::inlineTagState (  )  [private]

TODO: some browsers expect CDATA and parse it as well.
TODO: browsers handle comments in a more complex way; figure it out.

Definition at line 1045 of file HtmlTokenizer.class.php.

References dumpBuffer(), error(), getContentToSubstring(), getNextChar(), Assert::isNull(), and skipString().

Referenced by handleState().

HtmlTokenizer::cdataState (  )  [private]

Definition at line 1114 of file HtmlTokenizer.class.php.

References Cdata::create(), error(), getContentToSubstring(), Assert::isNull(), and makeTag().

Referenced by handleState().

HtmlTokenizer::getComment (  )  [private]

Definition at line 1138 of file HtmlTokenizer.class.php.

References error(), getContentToSubstring(), mark(), and reset().

Referenced by commentState().

HtmlTokenizer::commentState (  )  [private]

Definition at line 1165 of file HtmlTokenizer.class.php.

References SgmlIgnoredTag::comment(), Cdata::create(), getComment(), Assert::isNull(), and makeTag().

Referenced by handleState().

HtmlTokenizer::externalTagState (  )  [private]

Definition at line 1184 of file HtmlTokenizer.class.php.

References Cdata::create(), error(), getContentToSubstring(), Assert::isTrue(), makeTag(), mark(), and reset().

Referenced by handleState().

HtmlTokenizer::doctypeTagState (  )  [private]

Definition at line 1217 of file HtmlTokenizer.class.php.

References Cdata::create(), error(), getContentToSubstring(), Assert::isTrue(), and makeTag().

Referenced by handleState().

HtmlTokenizer::getContentToSubstring ( substring, ignoreCase = false  )  [private]

Returns the content up to $substring, using the Knuth-Morris-Pratt algorithm.

If $substring is not found, returns the whole remaining content.

Definition at line 1240 of file HtmlTokenizer.class.php.

References $buffer, $char, and getNextChar().

Referenced by cdataState(), doctypeTagState(), externalTagState(), getComment(), and inlineTagState().
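
The behaviour described for getContentToSubstring() can be illustrated with a small standalone Knuth-Morris-Pratt search. This is a sketch under assumptions, not the onPHP implementation: contentToSubstring() is a hypothetical free function that works on an in-memory string rather than on the tokenizer's stream, treats $ignoreCase as ASCII lowercasing, and returns an empty string for an empty $substring.

```php
<?php
// Hypothetical helper, NOT the onPHP code: returns the part of $content that
// precedes the first occurrence of $substring, or all of $content when
// $substring does not occur.
function contentToSubstring($content, $substring, $ignoreCase = false)
{
    // Compare on lowercased copies when case is ignored (assumed meaning).
    $haystack = $ignoreCase ? strtolower($content) : $content;
    $needle = $ignoreCase ? strtolower($substring) : $substring;

    $needleLength = strlen($needle);
    if ($needleLength === 0) {
        return ''; // sketch's choice: an empty needle matches at offset 0
    }

    // Failure table: $fail[$i] is the length of the longest proper prefix of
    // the first $i + 1 needle characters that is also a suffix of them.
    $fail = array(0);
    for ($i = 1, $k = 0; $i < $needleLength; ++$i) {
        while ($k > 0 && $needle[$i] !== $needle[$k]) {
            $k = $fail[$k - 1];
        }
        if ($needle[$i] === $needle[$k]) {
            ++$k;
        }
        $fail[$i] = $k;
    }

    // Scan the haystack once; $k counts needle characters matched so far.
    $haystackLength = strlen($haystack);
    for ($i = 0, $k = 0; $i < $haystackLength; ++$i) {
        while ($k > 0 && $haystack[$i] !== $needle[$k]) {
            $k = $fail[$k - 1];
        }
        if ($haystack[$i] === $needle[$k]) {
            ++$k;
        }
        if ($k === $needleLength) {
            // Match ends at $i, so it starts at $i - $needleLength + 1.
            return substr($content, 0, $i - $needleLength + 1);
        }
    }

    // No match: the whole remaining content is returned.
    return $content;
}
```

Because the scan never moves backwards in the input, the same approach suits a tokenizer that reads one character at a time, which may be why the original uses it.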

HtmlTokenizer::getTextualPosition (  )  [private]

Definition at line 1300 of file HtmlTokenizer.class.php.

HtmlTokenizer::warning ( message  )  [private]

Returns:
HtmlTokenizer

Definition at line 1314 of file HtmlTokenizer.class.php.

Referenced by attrValueState(), dumpAttribute(), dumpEndTag(), and outsideTagState().

HtmlTokenizer::error ( message  )  [private]

Returns:
HtmlTokenizer

Definition at line 1325 of file HtmlTokenizer.class.php.

Referenced by attrNameState(), attrValueState(), cdataState(), createOpenTag(), doctypeTagState(), dumpAttribute(), dumpEndTag(), endTagState(), externalTagState(), getComment(), inlineTagState(), insideTagState(), startTagState(), and waitingEqualSignState().


Member Data Documentation

const HtmlTokenizer::INITIAL_STATE = 1

Definition at line 18 of file HtmlTokenizer.class.php.

const HtmlTokenizer::START_TAG_STATE = 2

Definition at line 19 of file HtmlTokenizer.class.php.

const HtmlTokenizer::END_TAG_STATE = 3

Definition at line 20 of file HtmlTokenizer.class.php.

const HtmlTokenizer::INSIDE_TAG_STATE = 4

Definition at line 21 of file HtmlTokenizer.class.php.

const HtmlTokenizer::ATTR_NAME_STATE = 5

Definition at line 22 of file HtmlTokenizer.class.php.

const HtmlTokenizer::WAITING_EQUAL_SIGN_STATE = 6

Definition at line 23 of file HtmlTokenizer.class.php.

const HtmlTokenizer::ATTR_VALUE_STATE = 7

Definition at line 24 of file HtmlTokenizer.class.php.

const HtmlTokenizer::CDATA_STATE = 8

Definition at line 26 of file HtmlTokenizer.class.php.

const HtmlTokenizer::COMMENT_STATE = 9

Definition at line 27 of file HtmlTokenizer.class.php.

const HtmlTokenizer::INLINE_TAG_STATE = 10

Definition at line 28 of file HtmlTokenizer.class.php.

const HtmlTokenizer::EXTERNAL_TAG_STATE = 11

Definition at line 29 of file HtmlTokenizer.class.php.

const HtmlTokenizer::DOCTYPE_TAG_STATE = 12

Definition at line 30 of file HtmlTokenizer.class.php.

const HtmlTokenizer::FINAL_STATE = 42

Definition at line 32 of file HtmlTokenizer.class.php.

const HtmlTokenizer::SPACER_MASK = '[ \r\n\t]'

Definition at line 34 of file HtmlTokenizer.class.php.

const HtmlTokenizer::ID_FIRST_CHAR_MASK = '[A-Za-z]'

Definition at line 35 of file HtmlTokenizer.class.php.

const HtmlTokenizer::ID_CHAR_MASK = '[-_:.A-Za-z0-9]'

Definition at line 36 of file HtmlTokenizer.class.php.
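
The three mask constants above are regular-expression character classes. As a rough illustration of how such masks can be combined, the sketch below copies their values and builds PCRE patterns from them. looksLikeValidId() and looksLikeSpacer() are hypothetical names, and this reference does not confirm that isValidId() or isSpacerChar() work this way.

```php
<?php
// Character classes copied from the HtmlTokenizer constants documented above.
const SPACER_MASK = '[ \r\n\t]';
const ID_FIRST_CHAR_MASK = '[A-Za-z]';
const ID_CHAR_MASK = '[-_:.A-Za-z0-9]';

// Hypothetical check: one first-character match followed by any number of
// identifier characters. The D modifier stops "$" from matching before a
// trailing newline.
function looksLikeValidId($id)
{
    $pattern = '/^' . ID_FIRST_CHAR_MASK . ID_CHAR_MASK . '*$/D';
    return preg_match($pattern, $id) === 1;
}

// Hypothetical check: exactly one character from the spacer class.
function looksLikeSpacer($char)
{
    return preg_match('/^' . SPACER_MASK . '$/D', $char) === 1;
}
```

Under these patterns, names such as xml:lang and data-id are accepted, while a name starting with a digit or an empty string is rejected.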

HtmlTokenizer::$inlineTags = array('style', 'script', 'textarea') [private]

Definition at line 38 of file HtmlTokenizer.class.php.

HtmlTokenizer::$stream = null [private]

Definition at line 40 of file HtmlTokenizer.class.php.

HtmlTokenizer::$char = null [private]

Definition at line 42 of file HtmlTokenizer.class.php.

Referenced by attrNameState(), getContentToSubstring(), insideTagState(), isIdChar(), isIdFirstChar(), isSpacerChar(), and startTagState().

HtmlTokenizer::$line = 1 [private]

Definition at line 45 of file HtmlTokenizer.class.php.

HtmlTokenizer::$linePosition = 1 [private]

Definition at line 46 of file HtmlTokenizer.class.php.

HtmlTokenizer::$previousChar = null [private]

Definition at line 47 of file HtmlTokenizer.class.php.

HtmlTokenizer::$mark = null [private]

Definition at line 49 of file HtmlTokenizer.class.php.

HtmlTokenizer::$state = self::INITIAL_STATE [private]

Definition at line 51 of file HtmlTokenizer.class.php.

Referenced by checkSpecialTagState().

HtmlTokenizer::$tags = array() [private]

Definition at line 53 of file HtmlTokenizer.class.php.

HtmlTokenizer::$errors = array() [private]

Definition at line 54 of file HtmlTokenizer.class.php.

HtmlTokenizer::$buffer = null [private]

Definition at line 56 of file HtmlTokenizer.class.php.

Referenced by getContentToSubstring().

HtmlTokenizer::$tagId = null [private]

Definition at line 58 of file HtmlTokenizer.class.php.

HtmlTokenizer::$tag = null [private]

Definition at line 60 of file HtmlTokenizer.class.php.

Referenced by checkSpecialTagState().

HtmlTokenizer::$completeTag = null [private]

Definition at line 61 of file HtmlTokenizer.class.php.

HtmlTokenizer::$previousTag = null [private]

Definition at line 62 of file HtmlTokenizer.class.php.

HtmlTokenizer::$attrName = null [private]

Definition at line 64 of file HtmlTokenizer.class.php.

HtmlTokenizer::$attrValue = null [private]

Definition at line 65 of file HtmlTokenizer.class.php.

HtmlTokenizer::$insideQuote = null [private]

Definition at line 66 of file HtmlTokenizer.class.php.

HtmlTokenizer::$substringFound = false [private]

Definition at line 68 of file HtmlTokenizer.class.php.

HtmlTokenizer::$suppressWhitespaces = false [private]

Definition at line 70 of file HtmlTokenizer.class.php.

HtmlTokenizer::$lowercaseAttributes = false [private]

Definition at line 71 of file HtmlTokenizer.class.php.

HtmlTokenizer::$lowercaseTags = false [private]

Definition at line 72 of file HtmlTokenizer.class.php.


The documentation for this class was generated from the following file:
HtmlTokenizer.class.php

Generated on Sun Dec 9 21:57:44 2007 for onPHP by doxygen 1.5.4