public final class XMLSAXParser extends ContentParser implements Parser
_docType, _documentHandler, _dtdHandler, _entityResolver, _nodeText
_curChar, _tokenText, CR, EOF, LF, SPACE, TOKEN_CDATA, TOKEN_CLOSE_TAG, TOKEN_COMMENT, TOKEN_DTD, TOKEN_ENTITY_REF, TOKEN_EOF, TOKEN_OPEN_TAG, TOKEN_PE_REF, TOKEN_PI, TOKEN_SECTION, TOKEN_SECTION_END, TOKEN_TEXT, VALIDITY, WELL_FORMED
Constructor and Description |
---|
XMLSAXParser() |
Modifier and Type | Method and Description |
---|---|
protected void | init(InputSource input): Initializes the parser to parse a new document. |
protected boolean | isHtml() |
void | parse(InputSource input) |
protected boolean | parseDocumentDecl(boolean XMLDecl): Parses the document declaration for XML documents and external entities, returning the standalone status and changing the character encoding (if necessary). |
protected void | parseDTDSubset(boolean standalone): Parses the external DTD subset. |
protected boolean | parseNextNode(int token): Parses the next node based on the supplied token. |
protected int | readTokenEntity(): Reads a general entity reference token or character reference. |
protected int | readTokenEntity(boolean ignoreError) |
protected int | readTokenMarkup(): Reads a markup token. |
protected int | readTokenPERef(): Reads a parameter entity reference token. |
protected String | slicePITokenText(): Slices the processing instruction text into target and instruction code. |
enterElementState, getElementState, getPreviousState, isTokenSpace, leaveElementState, parse, parseAttrEntity, parseAttributes, parseContentEntity, parseGeneralEntity, readTokenContent, resolveEntity, setDocumentHandler, setDTDHandler, setEntityResolver
canReadName, close, error, fatalError, format, format, format, getColumnNumber, getErrorHandler, getLastException, getLineNumber, getPublicId, getReader, getSystemId, isClosed, isNamePart, isNamePartFirst, isSpace, isTokenAllSpace, isWarning, message, pushBack, pushBack, pushBack, pushBackToken, readChar, readTokenName, readTokenNameCur, readTokenQuoted, setEncoding, setErrorHandler, setLocale, skipSpace, warning
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
parse, setDocumentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setLocale
protected void init(InputSource input) throws SAXException, IOException
StreamParser.init(org.xml.sax.InputSource)
.init
in class ContentParser
input
- The input source to parse
SAXException
- The input source cannot be used
IOException
- An error occurs accessing the input stream
public void parse(InputSource input) throws SAXException, IOException
parse
in interface Parser
SAXException
IOException
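As a rough usage sketch, a SAX 1.0 parser such as this one can be driven through the org.xml.sax.Parser interface it implements: register a DocumentHandler, then call parse(InputSource). The package of XMLSAXParser is not shown on this page, so the sketch accepts any org.xml.sax.Parser and, for demonstration only, obtains one from the JDK; with the library on the class path, an XMLSAXParser instance could presumably be passed instead. The names SaxParserUsageSketch, ElementCounter and countElements are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // SAX 1.0 types are deprecated but still part of the JDK
final class SaxParserUsageSketch {

    /** Counts start tags; HandlerBase supplies defaults for the other SAX 1.0 callbacks. */
    static final class ElementCounter extends HandlerBase {
        int count;

        @Override
        public void startElement(String name, AttributeList attributes) {
            count++;
        }
    }

    /** Parses the XML text with the given SAX 1.0 parser and returns the number of elements. */
    static int countElements(Parser parser, String xml) throws SAXException, IOException {
        ElementCounter counter = new ElementCounter();
        parser.setDocumentHandler(counter);
        parser.setErrorHandler(counter);
        parser.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in parser from the JDK (assumption: XMLSAXParser is not on the class path here).
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        System.out.println(countElements(parser, "<a><b/><c>text</c></a>"));
    }
}
```

Passing the parser in as an org.xml.sax.Parser keeps the calling code independent of the concrete parser class.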
protected boolean isHtml()
isHtml
in class ContentParser
protected final void parseDTDSubset(boolean standalone) throws SAXException
SAXException
protected boolean parseNextNode(int token) throws SAXException
Returns false if the root element has been closed with a closing tag or the end of file has been reached, and true if parsing should continue.
token
- The last token read with ContentParser.readTokenContent()
SAXException
- A parsing error has been encountered, and based on
its severity, an exception is thrown to terminate parsing
ContentParser.parseAttributes(java.lang.String)
,
ContentParser.readTokenContent()
protected final int readTokenMarkup() throws SAXException
StreamParser._tokenText
. The preceding '<' has been consumed prior to calling this
method. No valid character is held in StreamParser._curChar
on entry or exit.
The following rules govern how tokens are parsed and which code is returned:
StreamParser.TOKEN_OPEN_TAG
returned for opening tag. Opening tag is '<'
immediately followed by valid tag name (returned as token text) and
optional whitespace. Attributes and terminating '>' are not read by
this method. A whitespace between the '<' and tag name is reported as
an error; an empty tag name will never be returned.
StreamParser.TOKEN_CLOSE_TAG
returned for closing tag. Closing tag is '</'
followed by valid tag name (returned as token text) and '>'. All text
following the tag name until the terminating '>' is ignored; a whitespace
between the '<' and tag name is reported as an error; an empty tag name
will never be returned.
StreamParser.TOKEN_COMMENT
returned for comment. Comment is delimited by
'<!--' and '-->'. All text in between is consumed and returned as
token text.
StreamParser.TOKEN_CDATA
returned for CDATA section. Section starts with
'<![CDATA[' and ends with ']]>'. All text in between is consumed and
returned as token text.
StreamParser.TOKEN_PI
returned for processing instruction. Processing
instruction is delimited by '<?' and '?>'. All text in between is
consumed and returned as token text.
StreamParser.TOKEN_DTD
returned for DTD declaration. DTD declaration starts
with '<!' immediately followed by a token name (returned as token text).
The rest of the declaration is not read by this method. Whitespace
between the '<!' and the token name is not allowed; an empty
declaration name is never returned.
StreamParser.TOKEN_SECTION
returned for DTD conditional section. Conditional
section begins with '<![' and is not a CDATA section. Only the '<!['
sequence is read and consumed by this method.
In any other case, StreamParser.TOKEN_TEXT
is returned, with
'<' contained in StreamParser._tokenText
and the input stream is not affected.
An error is reported.
StreamParser.TOKEN_TEXT
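The rules above can be pictured with a small, self-contained classifier. This is only a sketch of the decision order, not the parser's actual code: it works on a String holding the characters after the consumed '<', uses a hypothetical Code enum in place of the int TOKEN_* constants, simplifies the XML name rules, and omits error reporting.

```java
final class MarkupTokenSketch {

    /** Hypothetical stand-ins for the int TOKEN_* constants of StreamParser. */
    enum Code { OPEN_TAG, CLOSE_TAG, COMMENT, CDATA, PI, DTD, SECTION, TEXT }

    static final class Token {
        final Code code;
        final String text;

        Token(Code code, String text) {
            this.code = code;
            this.text = text;
        }
    }

    /** Classifies the markup that follows an already consumed '<'. */
    static Token classify(String rest) {
        if (rest.startsWith("!--")) {
            return delimited(rest, 3, "-->", Code.COMMENT);
        }
        if (rest.startsWith("![CDATA[")) {
            return delimited(rest, 8, "]]>", Code.CDATA);
        }
        if (rest.startsWith("![")) {
            // Conditional section: only the "<![" sequence itself is consumed.
            return new Token(Code.SECTION, "");
        }
        if (rest.startsWith("!")) {
            return named(rest, 1, Code.DTD);
        }
        if (rest.startsWith("?")) {
            return delimited(rest, 1, "?>", Code.PI);
        }
        if (rest.startsWith("/")) {
            Token close = named(rest, 1, Code.CLOSE_TAG);
            // A closing tag needs its terminating '>'; text before it is ignored.
            return rest.indexOf('>') >= 0 ? close : fallback();
        }
        return named(rest, 0, Code.OPEN_TAG);
    }

    /** Reads a name starting at 'start'; an empty name falls back to text. */
    private static Token named(String rest, int start, Code code) {
        int end = start;
        while (end < rest.length() && isNameChar(rest.charAt(end), end == start)) {
            end++;
        }
        return end == start ? fallback() : new Token(code, rest.substring(start, end));
    }

    /** Returns the text between 'start' and the closing delimiter. */
    private static Token delimited(String rest, int start, String close, Code code) {
        int end = rest.indexOf(close, start);
        return end < 0 ? fallback() : new Token(code, rest.substring(start, end));
    }

    /** No valid markup: the '<' itself is handed back as text. */
    private static Token fallback() {
        return new Token(Code.TEXT, "<");
    }

    /** Simplified name test (assumption: the real parser follows the full XML name rules). */
    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_' || c == ':') {
            return true;
        }
        return !first && (Character.isDigit(c) || c == '.' || c == '-');
    }
}
```

For instance, `MarkupTokenSketch.classify("item id='1'>")` yields an OPEN_TAG token with text "item", while a leading space as in `classify(" item>")` falls through to TEXT.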
SAXException
- A parsing error has been encountered
protected final int readTokenEntity() throws SAXException
StreamParser.TOKEN_ENTITY_REF
and the entity name in StreamParser._tokenText
. The preceding '&' has been consumed prior to calling
this method, and the trailing ';' is consumed by this method. No valid
character is held in StreamParser._curChar
on entry or exit.
If no valid entity name is found, the token code StreamParser.TOKEN_TEXT
is
returned, with '&' contained in StreamParser._tokenText
and the input
stream is not affected.
A '#' sign indicates a character reference (either decimal or hexadecimal)
which is read and stored in StreamParser._tokenText
, and the token code
StreamParser.TOKEN_TEXT
is returned. If the character reference value is
invalid, the token code StreamParser.TOKEN_TEXT
is returned, with the
invalid part contained in StreamParser._tokenText
and the input stream is
not affected.
If the entity reference or character reference is not terminated with a ';', a well-formedness error is issued, and the name is returned as a text string instead of an entity reference.
StreamParser.TOKEN_ENTITY_REF
or StreamParser.TOKEN_TEXT
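The behaviour described above might be sketched as follows. This is an illustration under stated assumptions rather than the parser's implementation: the input is a String holding the characters after the consumed '&', the Code enum is a hypothetical stand-in for the int token constants, the name test is simplified, and the exact text returned for invalid or unterminated references is a guess based on the description.

```java
final class EntityTokenSketch {

    /** Hypothetical stand-ins for TOKEN_ENTITY_REF and TOKEN_TEXT. */
    enum Code { ENTITY_REF, TEXT }

    static final class Token {
        final Code code;
        final String text;

        Token(Code code, String text) {
            this.code = code;
            this.text = text;
        }
    }

    /** Reads the reference that follows an already consumed '&'. */
    static Token read(String rest) {
        if (rest.startsWith("#")) {
            return charRef(rest);
        }
        int end = 0;
        while (end < rest.length() && isNameChar(rest.charAt(end), end == 0)) {
            end++;
        }
        if (end == 0) {
            // No valid entity name: the '&' is handed back as text.
            return new Token(Code.TEXT, "&");
        }
        String name = rest.substring(0, end);
        boolean terminated = end < rest.length() && rest.charAt(end) == ';';
        // Without ';' the name is returned as plain text (the error report is omitted here).
        return new Token(terminated ? Code.ENTITY_REF : Code.TEXT, name);
    }

    /** Decodes a decimal ("#65;") or hexadecimal ("#x41;") character reference. */
    private static Token charRef(String rest) {
        boolean hex = rest.startsWith("#x");
        int radix = hex ? 16 : 10;
        int start = hex ? 2 : 1;
        int end = start;
        while (end < rest.length() && Character.digit(rest.charAt(end), radix) >= 0) {
            end++;
        }
        boolean terminated = end < rest.length() && rest.charAt(end) == ';';
        if (end > start && terminated) {
            try {
                int codePoint = Integer.parseInt(rest.substring(start, end), radix);
                if (Character.isValidCodePoint(codePoint)) {
                    return new Token(Code.TEXT, new String(Character.toChars(codePoint)));
                }
            } catch (NumberFormatException tooLarge) {
                // Falls through to the invalid case below.
            }
        }
        // Assumption: the "invalid part" is the '&' plus what was consumed so far.
        return new Token(Code.TEXT, "&" + rest.substring(0, end));
    }

    /** Simplified name test (assumption: the real parser follows the full XML name rules). */
    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_' || c == ':') {
            return true;
        }
        return !first && (Character.isDigit(c) || c == '.' || c == '-');
    }
}
```

In this sketch a character reference comes back already decoded as text, whereas a general entity reference only yields its name and is left for the caller to resolve.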
SAXException
- A parsing error has been encountered
protected final int readTokenEntity(boolean ignoreError) throws SAXException
SAXException
protected final boolean parseDocumentDecl(boolean XMLDecl) throws SAXException
The document declaration is contained in a processing instruction that
appears at the very beginning of the document or entity and begins with
'xml' (case sensitive). The processing instruction's full text is expected
in the variable StreamParser._tokenText
on entry.
The declaration for XML documents contains a version number, optional character encoding and optional standalone status. The default standalone status is false. The declaration for external entities and external subsets contains an optional version number, and mandatory character encoding.
Currently only XML version "1.0" is supported. The current character
encoding is changed by calling StreamParser.setEncoding(java.lang.String)
.
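The declaration rules described here can be illustrated with a small stand-alone sketch. It is not the parser's implementation: the class name DocumentDeclSketch, the regular expression and the use of IllegalArgumentException are assumptions made for the example (the real method reports errors through SAXException and changes the stream encoding itself).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DocumentDeclSketch {

    /** Matches pseudo-attributes such as version="1.0" or encoding='UTF-8'. */
    private static final Pattern PSEUDO_ATTR =
            Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    final String encoding;    // null when the declaration names no encoding
    final boolean standalone; // defaults to false

    private DocumentDeclSketch(String encoding, boolean standalone) {
        this.encoding = encoding;
        this.standalone = standalone;
    }

    /**
     * Parses the full text of the processing instruction, for example
     * xml version="1.0" encoding="UTF-8" standalone="yes".
     *
     * @param xmlDecl true for an XML document declaration, false for an
     *                external entity or subset declaration
     */
    static DocumentDeclSketch parse(String piText, boolean xmlDecl) {
        if (!piText.startsWith("xml")) { // case sensitive
            throw new IllegalArgumentException("Not a document declaration");
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PSEUDO_ATTR.matcher(piText.substring(3));
        while (m.find()) {
            attrs.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        }
        String version = attrs.get("version");
        String encoding = attrs.get("encoding");
        if (xmlDecl && version == null) {
            throw new IllegalArgumentException("XML declaration requires a version");
        }
        if (!xmlDecl && encoding == null) {
            throw new IllegalArgumentException("Entity declaration requires an encoding");
        }
        if (version != null && !"1.0".equals(version)) {
            throw new IllegalArgumentException("Unsupported XML version: " + version);
        }
        boolean standalone = xmlDecl && "yes".equals(attrs.get("standalone"));
        return new DocumentDeclSketch(encoding, standalone);
    }
}
```

A caller would then pass a non-null `encoding` to something like StreamParser.setEncoding(java.lang.String) and keep the `standalone` flag for DTD processing.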
XMLDecl
- True if expecting XML document declaration, false if expecting
external entity/subset declaration
SAXException
- A parsing error has been encountered
protected final String slicePITokenText() throws SAXException
StreamParser._tokenText
,
returning the valid target name, and StreamParser._tokenText
truncated to
contain just the instruction code. If no valid target name is found,
an empty name and empty instruction are returned.
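The slicing can be sketched on a plain String as below. This is a simplified illustration, not the actual method: it returns both parts in an array instead of updating StreamParser._tokenText, and PiSliceSketch and its simplified name test are hypothetical.

```java
final class PiSliceSketch {

    /**
     * Splits processing instruction text into { target, instruction }.
     * Both parts are empty when the text does not start with a valid name.
     */
    static String[] slice(String piText) {
        int end = 0;
        while (end < piText.length() && isNameChar(piText.charAt(end), end == 0)) {
            end++;
        }
        if (end == 0) {
            return new String[] { "", "" };
        }
        // Skip the whitespace that separates the target from the instruction.
        int start = end;
        while (start < piText.length() && Character.isWhitespace(piText.charAt(start))) {
            start++;
        }
        return new String[] { piText.substring(0, end), piText.substring(start) };
    }

    /** Simplified name test (assumption: the real parser follows the full XML name rules). */
    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_' || c == ':') {
            return true;
        }
        return !first && (Character.isDigit(c) || c == '.' || c == '-');
    }
}
```

For example, slicing `xml-stylesheet href="a.xsl"` gives the target `xml-stylesheet` and the instruction `href="a.xsl"`.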
StreamParser._curChar
is undetermined on entry and exit and is used
inside the function.
SAXException
- A parsing error has been encountered
protected final int readTokenPERef() throws SAXException, IOException
StreamParser.TOKEN_PE_REF
and the entity name in StreamParser._tokenText
. The preceding
'%' has been consumed prior to calling this method, and the trailing ';'
is consumed by this method. No valid character is held in StreamParser._curChar
on entry or exit.
If no valid entity name is found, the token code StreamParser.TOKEN_TEXT
is
returned, with '%' contained in StreamParser._tokenText
and the input stream
is not affected.
If the entity reference is not terminated with a ';', a well-formedness error is issued, but the entity reference is still regarded as valid.
StreamParser.TOKEN_PE_REF
or StreamParser.TOKEN_TEXT
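A minimal sketch of this behaviour, assuming a String that holds the characters after the consumed '%': unlike a general entity reference, a parameter entity reference with a missing ';' is still returned as a reference, and the sketch records the problem in a hypothetical wellFormed flag instead of reporting an error. The Code enum and the simplified name test are likewise assumptions.

```java
final class PeRefTokenSketch {

    /** Hypothetical stand-ins for TOKEN_PE_REF and TOKEN_TEXT. */
    enum Code { PE_REF, TEXT }

    static final class Token {
        final Code code;
        final String text;
        final boolean wellFormed; // false when the terminating ';' is missing

        Token(Code code, String text, boolean wellFormed) {
            this.code = code;
            this.text = text;
            this.wellFormed = wellFormed;
        }
    }

    /** Reads the parameter entity reference that follows an already consumed '%'. */
    static Token read(String rest) {
        int end = 0;
        while (end < rest.length() && isNameChar(rest.charAt(end), end == 0)) {
            end++;
        }
        if (end == 0) {
            // No valid entity name: the '%' is handed back as text.
            return new Token(Code.TEXT, "%", true);
        }
        boolean terminated = end < rest.length() && rest.charAt(end) == ';';
        // A missing ';' is an error, but the reference is still treated as valid.
        return new Token(Code.PE_REF, rest.substring(0, end), terminated);
    }

    /** Simplified name test (assumption: the real parser follows the full XML name rules). */
    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_' || c == ':') {
            return true;
        }
        return !first && (Character.isDigit(c) || c == '.' || c == '-');
    }
}
```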
SAXException
- A parsing error has been encountered, and based on
its severity, an exception is thrown to terminate parsing
IOException
- An I/O exception has been encountered when reading
from the input stream
Phantom® and NetPhantom® are registered trademarks of Mindus SARL.
© Mindus SARL, 2024. All rights reserved.