public abstract class ContentParser extends StreamParser implements Parser
This parser is created by a derived class. The init(org.xml.sax.InputSource) method must be called before the parser is used or re-used (the parser is re-entrant).
See also: Parser, StreamParser, TokenParser, FastString
Modifier and Type | Field and Description |
---|---|
protected DocumentTypeImpl | _docType: Holds the DTD document for this document or null. |
protected DocumentHandler | _documentHandler: Reference to document handler. |
protected DTDHandler | _dtdHandler: Reference to DTD handler. |
protected EntityResolver | _entityResolver: Reference to external entity resolver. |
protected FastString | _nodeText: Holds the contents of text node collected between elements and other non-textual nodes. |
_curChar, _tokenText, CR, EOF, LF, SPACE, TOKEN_CDATA, TOKEN_CLOSE_TAG, TOKEN_COMMENT, TOKEN_DTD, TOKEN_ENTITY_REF, TOKEN_EOF, TOKEN_OPEN_TAG, TOKEN_PE_REF, TOKEN_PI, TOKEN_SECTION, TOKEN_SECTION_END, TOKEN_TEXT, VALIDITY, WELL_FORMED
Modifier | Constructor and Description |
---|---|
protected | ContentParser(): Protected constructor only accessible from derived class. |
Modifier and Type | Method and Description |
---|---|
protected void | enterElementState(String tagName, int lineNumber, boolean elementContent) |
protected org.openxml.parser.ElementState | getElementState() |
protected org.openxml.parser.ElementState | getPreviousState(org.openxml.parser.ElementState state) |
protected void | init(InputSource input): Initializes the parser to parse a new document. |
protected abstract boolean | isHtml() |
protected boolean | isTokenSpace() |
protected org.openxml.parser.ElementState | leaveElementState() |
void | parse(String systemId) |
protected void | parseAttrEntity(String name): Parses an internal general entity into the attribute value. |
protected boolean | parseAttributes(String tagName): Parses the attribute list of an XML/HTML open tag and calls the proper events for this element using the attribute list. |
protected void | parseContentEntity(): Parses an internal or external general entity into XML/HTML contents. |
protected boolean | parseDocumentDecl(boolean XMLDecl): Parses the document declaration for XML documents and external entities, returning the standalone status and changing the character encoding (if necessary). |
protected EntityImpl | parseGeneralEntity(EntityImpl entity): Parses the general entity, returning the entity as parsed. |
protected int | readTokenContent(): Reads and returns a single content token. |
protected int | readTokenEntity(): Reads general entity reference token or character reference. |
protected int | readTokenEntity(boolean ignoreError) |
protected int | readTokenMarkup(): Reads markup token. |
protected int | readTokenPERef(): Reads parameter entity reference token. |
InputSource | resolveEntity(String publicId, String systemId) |
void | setDocumentHandler(DocumentHandler handler) |
void | setDTDHandler(DTDHandler handler) |
void | setEntityResolver(EntityResolver resolver) |
protected String | slicePITokenText(): Slices processing instruction text into target and instruction code. |
canReadName, close, error, fatalError, format, format, format, getColumnNumber, getErrorHandler, getLastException, getLineNumber, getPublicId, getReader, getSystemId, isClosed, isNamePart, isNamePartFirst, isSpace, isTokenAllSpace, isWarning, message, pushBack, pushBack, pushBack, pushBackToken, readChar, readTokenName, readTokenNameCur, readTokenQuoted, setEncoding, setErrorHandler, setLocale, skipSpace, warning
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
parse, setErrorHandler, setLocale
protected DocumentTypeImpl _docType
protected FastString _nodeText
This is a state variable. Methods that use it directly or indirectly must be alert to its present state and usage and must reset it as needed. General value on entry/exit is specified in the method comment.
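The accumulate-then-reset discipline described for _nodeText can be illustrated with a small sketch. This is not the library's code: the class TextAccumulator, its method names, and the use of StringBuilder in place of FastString are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the _nodeText pattern: text is collected
// until a non-textual node arrives, then emitted once and reset.
class TextAccumulator {
    private final StringBuilder nodeText = new StringBuilder(); // plays the role of _nodeText
    private final List<String> events = new ArrayList<>();

    // Append character data seen between markup.
    void text(String chunk) {
        nodeText.append(chunk);
    }

    // Called when an element or other non-textual node starts:
    // pending text is flushed first so the buffer is empty on exit.
    void nonTextNode(String name) {
        flush();
        events.add("node:" + name);
    }

    // Emit pending text as a single event and reset the state variable.
    void flush() {
        if (nodeText.length() > 0) {
            events.add("text:" + nodeText);
            nodeText.setLength(0);
        }
    }

    List<String> events() {
        return events;
    }
}
```

Every method in the sketch leaves the buffer in a known state on exit, which is the kind of entry/exit contract the note above asks each method comment to spell out.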
protected DocumentHandler _documentHandler
protected DTDHandler _dtdHandler
protected EntityResolver _entityResolver
protected ContentParser()
protected void init(InputSource input) throws SAXException, IOException
Initializes the parser to parse a new document. See StreamParser.init(org.xml.sax.InputSource).
input - The input source to parse
SAXException - The input source cannot be used
IOException
public final void setDTDHandler(DTDHandler handler)
setDTDHandler in interface Parser
public final void setDocumentHandler(DocumentHandler handler)
setDocumentHandler in interface Parser
public final void setEntityResolver(EntityResolver resolver)
setEntityResolver in interface Parser
public final void parse(String systemId) throws SAXException, IOException
parse in interface Parser
SAXException
IOException
public final InputSource resolveEntity(String publicId, String systemId) throws IOException, SAXException
resolveEntity in interface EntityResolver
resolveEntity in class StreamParser
IOException
SAXException
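For readers unfamiliar with the org.xml.sax.EntityResolver contract that resolveEntity implements, here is a minimal sketch of a resolver that could be passed to setEntityResolver. The class InMemoryEntityResolver and its register method are hypothetical; how ContentParser consults the resolver internally is not shown in this page and is not assumed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver that serves entities from memory by system identifier.
class InMemoryEntityResolver implements EntityResolver {
    private final Map<String, String> entities = new HashMap<>();

    // Associate a system identifier with literal entity content.
    void register(String systemId, String content) {
        entities.put(systemId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String content = entities.get(systemId);
        if (content == null) {
            // Per the SAX contract, null asks the parser to open the
            // system identifier itself.
            return null;
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```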
protected final int readTokenContent() throws SAXException
Reads and returns a single content token, with the token text held in StreamParser._tokenText. No valid character is held in StreamParser._curChar on entry or on exit. The returned token is suitable for document contents in either an XML or HTML document or an external entity. StreamParser.TOKEN_EOF indicates the end of the input stream.
Plain text is returned as a StreamParser.TOKEN_TEXT token. Text that is part of the element contents may be returned as multiple tokens, or as text tokens intertwined with entity reference tokens (of type StreamParser.TOKEN_ENTITY_REF).
For specific information about token processing, see readTokenMarkup() and readTokenEntity().
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
See also: StreamParser._tokenText, readTokenMarkup(), readTokenEntity()
protected final void parseContentEntity() throws SAXException
Parses an internal or external general entity into XML/HTML contents. StreamParser._tokenText contains the entity name on entry. Events are fired to store the entity's value in the document contents.
If the entity is an unparsed general entity, or is not found, an error is issued.
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
protected final EntityImpl parseGeneralEntity(EntityImpl entity) throws SAXException
Parses the general entity, returning the entity as parsed. On entry, an EntityImpl is passed to the method. On exit, the same entity (parsed) is returned, or null to indicate that the entity could not be parsed.
The following rules govern how the entity is parsed:
- If the entity state is EntityImpl.STATE_PARSED, then the entity has been parsed before, and is returned.
- If the entity state is EntityImpl.STATE_NOT_FOUND, then the entity could not be found, and null is returned. There is no need to issue an error again.
- If the entity state is EntityImpl.STATE_PARSING, then the entity is being parsed: this is a circular reference, so an error is issued and null is returned.
- If the entity state is EntityImpl.STATE_DECLARED, then the entity is parsed. For an external entity, the entity source is located using HolderFinder. If the entity source could not be found or could not be opened, the entity state is set to EntityImpl.STATE_NOT_FOUND, an error is issued and null is returned. For an internal entity, the entity source is created from its value.
- If the entity state is EntityImpl.STATE_DECLARED and the entity source could be located, an XMLParser is created and used to parse the entity. If no fatal errors are encountered when parsing, the entity is returned. Well-formed errors are treated as if generated by the current parser.
- If the entity state is EntityImpl.STATE_DECLARED and a fatal error was issued while parsing the entity with an XMLParser, then a fatal error is issued and an exception is raised.
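The state rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the EntityImpl or XMLParser code: the classes EntitySketch and EntityExpander, the State enum, and the string-based error log are hypothetical, only internal entities with "&name;" references are modelled, and a null raw value stands in for an entity source that cannot be located.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the four entity states; not the EntityImpl API.
class EntitySketch {
    enum State { DECLARED, PARSING, PARSED, NOT_FOUND }

    final String name;
    final String rawValue;   // null models a source that cannot be located
    State state = State.DECLARED;
    String parsedValue;

    EntitySketch(String name, String rawValue) {
        this.name = name;
        this.rawValue = rawValue;
    }
}

class EntityExpander {
    private final Map<String, EntitySketch> table = new HashMap<>();
    final StringBuilder errors = new StringBuilder(); // stand-in for error reporting

    void declare(String name, String rawValue) {
        table.put(name, new EntitySketch(name, rawValue));
    }

    EntitySketch get(String name) {
        return table.get(name);
    }

    // Returns the parsed entity, or null if it could not be parsed.
    EntitySketch parse(EntitySketch entity) {
        switch (entity.state) {
            case PARSED:
                return entity;                 // parsed before: reuse
            case NOT_FOUND:
                return null;                   // already reported: stay silent
            case PARSING:
                errors.append("circular:").append(entity.name).append(';');
                return null;                   // circular reference
            default:
                break;                         // DECLARED: parse now
        }
        if (entity.rawValue == null) {
            entity.state = EntitySketch.State.NOT_FOUND;
            errors.append("not-found:").append(entity.name).append(';');
            return null;
        }
        entity.state = EntitySketch.State.PARSING;
        StringBuilder out = new StringBuilder();
        String raw = entity.rawValue;
        int i = 0;
        while (i < raw.length()) {
            int amp = raw.indexOf('&', i);
            int semi = amp < 0 ? -1 : raw.indexOf(';', amp);
            if (amp < 0 || semi < 0) {         // no further references
                out.append(raw.substring(i));
                break;
            }
            out.append(raw, i, amp);
            // Undeclared or unparsable nested entities contribute nothing.
            EntitySketch inner = table.get(raw.substring(amp + 1, semi));
            EntitySketch parsed = inner == null ? null : parse(inner);
            if (parsed != null) {
                out.append(parsed.parsedValue);
            }
            i = semi + 1;
        }
        entity.parsedValue = out.toString();
        entity.state = EntitySketch.State.PARSED;
        return entity;
    }
}
```

Marking the entity as PARSING before descending into its value is what makes the circular-reference check possible: a nested reference back to the same entity finds it in that state and stops.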
entity - The entity to parse
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
protected boolean parseAttributes(String tagName) throws SAXException
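Parses the attribute list of an XML/HTML open tag and calls the proper events for this element using the attribute list. On entry, the element's tag name and optional whitespace have been read. On exit, all attributes have been read, including the closing mark. No valid character is held in StreamParser._curChar on entry or on exit.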
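The entry and exit conditions above (start just past the tag name, stop once the closing mark is consumed) can be sketched with a simple scanner. This is a hypothetical illustration, not the ContentParser implementation: the class AttributeListSketch is invented, errors are raised as IllegalArgumentException rather than reported through an error handler, and treating a valueless attribute as having its own name for a value is an HTML-style assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical attribute-list scanner; not the ContentParser implementation.
class AttributeListSketch {
    // Scans attributes from `input` starting at `pos` (just past the tag name
    // and any whitespace) up to and including the closing '>'.
    static Map<String, String> parse(String input, int pos) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int i = pos;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c) || c == '/') { i++; continue; }
            if (c == '>') return attrs;                 // closing mark reached
            int nameStart = i;
            while (i < input.length() && "=/> \t\r\n".indexOf(input.charAt(i)) < 0) i++;
            String name = input.substring(nameStart, i);
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
            String value = name;                        // assumed HTML-style minimized attribute
            if (i < input.length() && input.charAt(i) == '=') {
                i++;
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                if (i >= input.length()) {
                    throw new IllegalArgumentException("missing attribute value");
                }
                char quote = input.charAt(i);
                if (quote == '"' || quote == '\'') {
                    int end = input.indexOf(quote, i + 1);
                    if (end < 0) {
                        throw new IllegalArgumentException("unterminated attribute value");
                    }
                    value = input.substring(i + 1, end);
                    i = end + 1;
                } else {                                // unquoted value, HTML only
                    int start = i;
                    while (i < input.length() && "> \t\r\n".indexOf(input.charAt(i)) < 0) i++;
                    value = input.substring(start, i);
                }
            }
            attrs.put(name, value);
        }
        throw new IllegalArgumentException("missing closing '>'");
    }
}
```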
tagName - Name of element being parsed
xml - True if parsing an XML document, false for HTML
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
protected final void parseAttrEntity(String name) throws SAXException
Parses an internal general entity into the attribute value. Called from parseAttributes(java.lang.String) when an entity reference is encountered in the attribute value. StreamParser._tokenText contains the entity name on entry and the parsed value on exit.
If the entity is external, unparsed general or not found, an error is issued and nothing is placed in the attribute value.
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
protected final boolean isTokenSpace()
protected final void enterElementState(String tagName, int lineNumber, boolean elementContent)
protected final org.openxml.parser.ElementState getElementState()
protected final org.openxml.parser.ElementState leaveElementState()
protected final org.openxml.parser.ElementState getPreviousState(org.openxml.parser.ElementState state)
protected abstract boolean isHtml()
protected final int readTokenMarkup() throws SAXException
Reads a markup token, with the token text held in StreamParser._tokenText. The preceding '<' has been consumed prior to calling this method. No valid character is held in StreamParser._curChar on entry or exit.
The following rules govern how tokens are parsed and which code is returned:
- StreamParser.TOKEN_OPEN_TAG returned for opening tag. Opening tag is '<' immediately followed by a valid tag name (returned as token text) and optional whitespace. Attributes and the terminating '>' are not read by this method. Whitespace between the '<' and the tag name is reported as an error; an empty tag name will never be returned.
- StreamParser.TOKEN_CLOSE_TAG returned for closing tag. Closing tag is '</' followed by a valid tag name (returned as token text) and '>'. All text following the tag name until the terminating '>' is ignored; whitespace between the '<' and the tag name is reported as an error; an empty tag name will never be returned.
- StreamParser.TOKEN_COMMENT returned for comment. Comment is delimited by '<!--' and '-->'. All text in between is consumed and returned as token text.
- StreamParser.TOKEN_CDATA returned for CDATA section. Section starts with '<![CDATA[' and ends with ']]>'. All text in between is consumed and returned as token text.
- StreamParser.TOKEN_PI returned for processing instruction. Processing instruction is delimited by '<?' and '?>'. All text in between is consumed and returned as token text.
- StreamParser.TOKEN_DTD returned for DTD declaration. DTD declaration starts with '<!' immediately followed by a token name (returned as token text). The rest of the declaration is not read by this method. Whitespace between the '<!' and the token name is not allowed; an empty declaration name is never returned.
- StreamParser.TOKEN_SECTION returned for DTD conditional section. Conditional section begins with '<![' and is not a CDATA section. Only the '<![' sequence is read and consumed by this method.
- Otherwise, StreamParser.TOKEN_TEXT is returned, with '<' contained in StreamParser._tokenText, and the input stream is not affected. An error is reported.
StreamParser.TOKEN_TEXT
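The classification rules above can be sketched as a lookahead on the text that follows the consumed '<'. The class MarkupKindSketch and its Kind enum are hypothetical and only mirror the StreamParser token codes by name. The sketch reads no token text and folds every malformed case into TEXT, whereas the rules above report some of those cases as errors.

```java
// Hypothetical classifier for the text that follows a consumed '<'.
class MarkupKindSketch {
    enum Kind { OPEN_TAG, CLOSE_TAG, COMMENT, CDATA, PI, DTD, SECTION, TEXT }

    // `rest` is the input immediately after the '<'.
    static Kind classify(String rest) {
        if (rest.startsWith("!--")) return Kind.COMMENT;
        if (rest.startsWith("![CDATA[")) return Kind.CDATA;   // must precede the "![" test
        if (rest.startsWith("![")) return Kind.SECTION;
        if (rest.startsWith("?")) return Kind.PI;
        if (rest.startsWith("!")) return startsName(rest, 1) ? Kind.DTD : Kind.TEXT;
        if (rest.startsWith("/")) return startsName(rest, 1) ? Kind.CLOSE_TAG : Kind.TEXT;
        return startsName(rest, 0) ? Kind.OPEN_TAG : Kind.TEXT;
    }

    // Simplified name-start test: a letter, '_' or ':'.
    private static boolean startsName(String s, int at) {
        if (at >= s.length()) return false;
        char c = s.charAt(at);
        return Character.isLetter(c) || c == '_' || c == ':';
    }
}
```

The order of the tests matters: the more specific prefixes ('<!--', '<![CDATA[') are checked before the shorter ones they contain ('<![', '<!').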
SAXException - A parsing error has been encountered
protected final int readTokenEntity() throws SAXException
Reads a general entity reference token or character reference. For an entity reference, returns the token code StreamParser.TOKEN_ENTITY_REF and the entity name in StreamParser._tokenText. The preceding '&' has been consumed prior to calling this method, and the trailing ';' is consumed by this method. No valid character is held in StreamParser._curChar on entry or exit.
If no valid entity name is found, the token code StreamParser.TOKEN_TEXT is returned, with '&' contained in StreamParser._tokenText, and the input stream is not affected.
A '#' sign indicates a character reference (either decimal or hexadecimal), which is read and stored in StreamParser._tokenText, and the token code StreamParser.TOKEN_TEXT is returned. If the character reference value is invalid, the token code StreamParser.TOKEN_TEXT is returned, with the invalid part contained in StreamParser._tokenText, and the input stream is not affected.
If the entity reference or character reference is not terminated with a ';', a well-formed error is issued, and the name is returned as a textual string instead of an entity reference.
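The decimal and hexadecimal character reference forms mentioned above can be decoded as in the following sketch. The class CharRefSketch is hypothetical; it works on the text between "&#" and ";" and returns null for an invalid value, which is a simplification of the behaviour described above (where the invalid part is returned as text).

```java
// Hypothetical decoder for the text between "&#" and ";".
class CharRefSketch {
    // `body` is the reference without "&#" and ";", e.g. "65" or "x41".
    // Returns the referenced character(s), or null if the value is invalid.
    static String decode(String body) {
        boolean hex = body.startsWith("x");
        String digits = hex ? body.substring(1) : body;
        final int radix = hex ? 16 : 10;
        // Reject empty input and anything that is not a digit in this radix
        // (this also excludes the sign characters Integer.parseInt accepts).
        if (digits.isEmpty() || !digits.chars().allMatch(c -> Character.digit(c, radix) >= 0)) {
            return null;
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(digits, radix);
        } catch (NumberFormatException e) {
            return null;                       // value too large for an int
        }
        if (!Character.isValidCodePoint(codePoint)) {
            return null;
        }
        // Code points above U+FFFF become a surrogate pair.
        return new String(Character.toChars(codePoint));
    }
}
```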
StreamParser.TOKEN_ENTITY_REF or StreamParser.TOKEN_TEXT
SAXException - A parsing error has been encountered
protected final int readTokenEntity(boolean ignoreError) throws SAXException
SAXException
protected final boolean parseDocumentDecl(boolean XMLDecl) throws SAXException
Parses the document declaration for XML documents and external entities, returning the standalone status and changing the character encoding (if necessary). The document declaration is contained in a processing instruction that appears at the very beginning of the document or entity and begins with 'xml' (case sensitive). The processing instruction's full text is expected in the variable StreamParser._tokenText on entry.
The declaration for XML documents contains a version number, optional character encoding and optional standalone status. The default standalone status is false. The declaration for external entities and external subsets contains an optional version number and a mandatory character encoding.
Currently only XML version "1.0" is supported. The current character encoding is changed by calling StreamParser.setEncoding(java.lang.String).
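The declaration rules above (version required for documents, encoding required for external entities, standalone defaulting to false, only version "1.0" accepted) can be sketched as follows. The class XmlDeclSketch, its fields, and the use of a regular expression are hypothetical; the sketch raises IllegalArgumentException where the parser would report an error, and it does not enforce the order of the pseudo-attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the pseudo-attributes of an XML declaration.
class XmlDeclSketch {
    private static final Pattern PSEUDO_ATTR =
        Pattern.compile("(version|encoding|standalone)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    String version;
    String encoding;
    boolean standalone;   // defaults to false, as stated above

    // `piText` is the full processing-instruction text, starting with "xml".
    // `xmlDecl` is true for a document declaration, false for an external
    // entity or subset declaration.
    static XmlDeclSketch parse(String piText, boolean xmlDecl) {
        if (!piText.startsWith("xml")) {       // case-sensitive check
            throw new IllegalArgumentException("declaration must begin with 'xml'");
        }
        XmlDeclSketch decl = new XmlDeclSketch();
        Matcher m = PSEUDO_ATTR.matcher(piText);
        while (m.find()) {
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            switch (m.group(1)) {
                case "version": decl.version = value; break;
                case "encoding": decl.encoding = value; break;
                default: decl.standalone = "yes".equals(value); break;
            }
        }
        if (xmlDecl && decl.version == null) {
            throw new IllegalArgumentException("document declaration requires a version");
        }
        if (!xmlDecl && decl.encoding == null) {
            throw new IllegalArgumentException("entity declaration requires an encoding");
        }
        if (decl.version != null && !"1.0".equals(decl.version)) {
            throw new IllegalArgumentException("unsupported version " + decl.version);
        }
        return decl;
    }
}
```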
XMLDecl - True if expecting XML document declaration, false if expecting external entity/subset declaration
SAXException - A parsing error has been encountered
protected final String slicePITokenText() throws SAXException
Slices processing instruction text into target and instruction code. The text is read from StreamParser._tokenText; the valid target name is returned, and StreamParser._tokenText is truncated to contain just the instruction code. If no valid target name is found, an empty name and empty instruction are returned.
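The slicing described above can be sketched on a plain string. The class PiSliceSketch is hypothetical: it returns both parts in an array instead of truncating a shared buffer, and it uses a simplified notion of a valid name.

```java
// Hypothetical slicer: splits processing-instruction text into target and code.
class PiSliceSketch {
    // Returns {target, instruction}; both are empty if no valid target is found.
    static String[] slice(String piText) {
        int i = 0;
        while (i < piText.length() && isNameChar(piText.charAt(i), i == 0)) i++;
        if (i == 0) {
            return new String[] { "", "" };
        }
        String target = piText.substring(0, i);
        // Skip the whitespace separating the target from the instruction code.
        while (i < piText.length() && Character.isWhitespace(piText.charAt(i))) i++;
        return new String[] { target, piText.substring(i) };
    }

    // Simplified name test; real XML names admit more characters.
    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_' || c == ':') return true;
        return !first && (Character.isDigit(c) || c == '-' || c == '.');
    }
}
```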
StreamParser._curChar is undetermined on entry and exit and is used inside the function.
SAXException - A parsing error has been encountered
protected final int readTokenPERef() throws SAXException, IOException
Reads a parameter entity reference token, returning the token code StreamParser.TOKEN_PE_REF and the entity name in StreamParser._tokenText. The preceding '%' has been consumed prior to calling this method, and the trailing ';' is consumed by this method. No valid character is held in StreamParser._curChar on entry or exit.
If no valid entity name is found, the token code StreamParser.TOKEN_TEXT is returned, with '%' contained in StreamParser._tokenText, and the input stream is not affected.
If the entity reference is not terminated with a ';', a well-formed error is issued, but the entity reference is still regarded as valid.
StreamParser.TOKEN_PE_REF or StreamParser.TOKEN_TEXT
SAXException - A parsing error has been encountered, and based on its severity, an exception is thrown to terminate parsing
IOException - An I/O exception has been encountered when reading from the input stream
Phantom® and NetPhantom® are registered trademarks of Mindus SARL.
© Mindus SARL, 2024. All rights reserved.