public abstract class StreamParser extends Object implements Locator, EntityResolver
This parser is created by a derived class. The init(org.xml.sax.InputSource)
method
must be called before the parser is used or re-used (the parser is
re-entrant).
The input stream is consumed through the readChar()
and pushBack()
methods that work on the value of the state variable _curChar
. This single variable is constantly reused by all the higher
level methods to denote the last character read, so be careful in how
you use it.
Tokens are read by a variety of low-level methods and stored in the
state variable _tokenText
. Once again, this single variable
is constantly reused by all the higher level methods to contain the
last read token, so be careful in how you use it. This variable is of
type FastString
, an optimized version of StringBuffer that can
be reused perpetually by calling FastString.setLength(int)
with zero.
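The reuse pattern described above can be sketched with the JDK's StringBuilder standing in for FastString. This is only an illustration: the class ReusableTokenBuffer and its readWords method are hypothetical, and nothing beyond setLength(int) is assumed about FastString itself. The sketch shows one buffer being reset to length zero between tokens, with the contents copied out rather than the buffer being shared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: StringBuilder plays the role of FastString.
class ReusableTokenBuffer {
    // One buffer, allocated once and reused for every token.
    private final StringBuilder tokenText = new StringBuilder();

    // Splits the input on spaces, reusing the same buffer for each word.
    List<String> readWords(String input) {
        List<String> words = new ArrayList<>();
        tokenText.setLength(0); // reset before first use
        for (int i = 0; i <= input.length(); i++) {
            boolean atEnd = (i == input.length());
            if (atEnd || input.charAt(i) == ' ') {
                if (tokenText.length() > 0) {
                    // Copy the contents out; never hand out the buffer itself,
                    // because the next token will overwrite it.
                    words.add(tokenText.toString());
                }
                tokenText.setLength(0); // reuse the buffer for the next token
            } else {
                tokenText.append(input.charAt(i));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [open, tag, name]
        System.out.println(new ReusableTokenBuffer().readWords("open tag name"));
    }
}
```

Copying with toString() before the next read is what keeps a caller's value stable while the shared buffer keeps changing.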
Generally methods should document the expected states of _curChar
and _tokenText
on entry and on exit and whether they change them.
To make the parser efficient, some methods will leave either variable with
an undetermined value; others will leave these variables with the next
value to be processed. It is important to pay attention to details.
The Locator
interface is implemented to return indication of
the parser's location in the input stream (identifier, line number, etc).
All the error reporting methods are implemented at this layer. Errors
are generated from a resource file (see Resources) where the
actual error message is held. The error message is selected by a
generic name, either as is by calling message(java.lang.String)
or with argument values formatted into it by calling
format(java.lang.String, java.lang.Object).
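As a rough illustration of selecting a message by name and formatting arguments into it, the following sketch uses the JDK's ResourceBundle and MessageFormat. It is not the Resources class itself, whose implementation is not shown here. The bundle DemoMessages, the message names Error001 and Error002, and their texts are all invented for the example.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical in-memory resource bundle; names and texts are invented.
class DemoMessages extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            { "Error001", "Unexpected end of input." },
            { "Error002", "Element {0} is not closed (line {1})." },
        };
    }
}

class MessageLookupSketch {
    private final ResourceBundle bundle = new DemoMessages();

    // Returns the message text registered under the given name.
    String message(String name) {
        return bundle.getString(name);
    }

    // Looks up the message and substitutes the arguments for {0}, {1}, ...
    String format(String name, Object... args) {
        return new MessageFormat(bundle.getString(name), Locale.ROOT).format(args);
    }

    public static void main(String[] args) {
        MessageLookupSketch m = new MessageLookupSketch();
        System.out.println(m.message("Error001"));
        // prints: Element item is not closed (line 12).
        System.out.println(m.format("Error002", "item", 12));
    }
}
```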
Errors are generally reported at one of four levels:
warning(java.lang.String)
error(int, java.lang.String) with the level VALIDITY
error(int, java.lang.String) with the level WELL_FORMED
fatalError(java.lang.Exception)
If the error handler is an ErrorReport, the extended error features will be used. If no
error handler is used, warnings are ignored, errors and fatal errors
will throw an exception and stop the parser.
Parser
,
ErrorReport
,
Resources
,
FastString
Modifier and Type | Field and Description |
---|---|
protected int |
_curChar
Holds the last character read by
readChar() , or the next character
to be pushed back (see pushBack() ). |
protected FastString |
_tokenText
Holds the contents of the last token read by one of the token reading
methods.
|
protected static char |
CR
Carriage return.
|
protected static int |
EOF
Indicates that end of file (or the input stream) has been reached and no
more characters are available.
|
protected static char |
LF
Line feed.
|
protected static char |
SPACE
Space.
|
protected static short |
TOKEN_CDATA
CDATA section token.
|
protected static short |
TOKEN_CLOSE_TAG
Close tag token.
|
protected static short |
TOKEN_COMMENT
Comment token.
|
protected static short |
TOKEN_DTD
DTD token.
|
protected static short |
TOKEN_ENTITY_REF
Entity reference token.
|
protected static short |
TOKEN_EOF
End of input.
|
protected static short |
TOKEN_OPEN_TAG
Open tag token.
|
protected static short |
TOKEN_PE_REF
Parameter entity reference.
|
protected static short |
TOKEN_PI
Processing instruction token.
|
protected static short |
TOKEN_SECTION
DTD section token.
|
protected static short |
TOKEN_SECTION_END
DTD section token end.
|
protected static short |
TOKEN_TEXT
Textual token.
|
protected static int |
VALIDITY
Error level used for reporting validity errors.
|
protected static int |
WELL_FORMED
Error level used for reporting well-formedness errors.
|
Modifier | Constructor and Description |
---|---|
protected |
StreamParser()
Protected constructor only accessible from derived class.
|
Modifier and Type | Method and Description |
---|---|
protected boolean |
canReadName(String name)
Returns true if the specified name can be read and consumes it all.
|
protected void |
close()
Closes the input stream.
|
protected void |
error(int errorLevel,
String message)
Reports a parser error.
|
protected void |
fatalError(Exception except)
Reports a fatal error.
|
protected String |
format(String message,
Object arg1)
Format a message from the resource file based on the selected
locale and given argument.
|
protected String |
format(String message,
Object arg1,
Object arg2)
Format a message from the resource file based on the selected
locale and given arguments.
|
protected String |
format(String message,
Object arg1,
Object arg2,
Object arg3)
Format a message from the resource file based on the selected
locale and given arguments.
|
int |
getColumnNumber() |
ErrorHandler |
getErrorHandler()
Returns the error handler associated with this parser (if any).
|
SAXException |
getLastException()
Returns the last error issued by the parser.
|
int |
getLineNumber() |
String |
getPublicId() |
protected Reader |
getReader()
Returns the reader used for accessing the underlying input stream.
|
String |
getSystemId() |
protected void |
init(InputSource input)
Initializes the parser to parse a new document.
|
protected boolean |
isClosed()
Returns true if the document has been fully parsed and the parser has
been closed.
|
protected boolean |
isNamePart(int ch)
Returns true if character is part of a valid name.
|
protected boolean |
isNamePartFirst(int ch) |
protected boolean |
isSpace(int ch)
Returns true if character is a whitespace.
|
protected boolean |
isTokenAllSpace()
Returns true if the token is all whitespace.
|
protected boolean |
isWarning()
Returns true if warning should be issued.
|
protected String |
message(String message)
Return a message from the resource file based on the selected
locale.
|
protected void |
pushBack()
Push back the last character read into
_curChar . |
protected void |
pushBack(int ch)
Push back a single character.
|
void |
pushBack(String pushStr) |
protected void |
pushBackToken()
Pushes back the token contained in
_tokenText . |
protected int |
readChar()
Reads and returns a single character from the input stream.
|
protected boolean |
readTokenName()
Reads a valid token name and places it in
_tokenText . |
protected boolean |
readTokenNameCur()
Reads a valid token name and places it in
_tokenText . |
protected boolean |
readTokenQuoted()
Reads the quoted identifier token.
|
InputSource |
resolveEntity(String publicId,
String systemId) |
protected void |
setEncoding(String encoding)
Changes the encoding of the input stream.
|
void |
setErrorHandler(ErrorHandler handler)
Associates this parser with an error handler.
|
void |
setLocale(Locale locale)
Set the locale for error messages.
|
protected void |
skipSpace() |
protected void |
warning(String message)
Reports a warning.
|
protected static final int EOF
protected static final char LF
protected static final char CR
protected static final char SPACE
protected static final short TOKEN_EOF
protected static final short TOKEN_TEXT
_tokenText
contains the plain text. This token
is generally used to construct a Text
node when
appearing in the content.
protected static final short TOKEN_ENTITY_REF
_tokenText
contains the entity name.
This token is generally used to construct an EntityReference
node.
protected static final short TOKEN_OPEN_TAG
_tokenText
contains the tag name. This token
is generally used to construct an Element
node.
Only the tag name is read; the attributes and terminating '>' should
be read separately (see ContentParser.parseAttributes(java.lang.String)
).
protected static final short TOKEN_CLOSE_TAG
_tokenText
contains the tag name. This token
is generally used to construct an Element
node.
The entire closing tag has been consumed.
protected static final short TOKEN_COMMENT
_tokenText
contains the comment text (if in mode
MODE_STORE_COMMENT
). This token is generally used to construct
a Comment
node.
protected static final short TOKEN_PI
_tokenText
contains the processing
instruction (if in mode MODE_STORE_PI
). This token is generally
used to construct a ProcessingInstruction
node.
protected static final short TOKEN_CDATA
_tokenText
contains the CDATA contents.
This token is generally used to construct a CDATASection
node.
protected static final short TOKEN_DTD
_tokenText
contains the DTD entity type (whatever comes
after '
protected static final short TOKEN_SECTION
TOKEN_CDATA
would have been
returned. This token is valid only in the external DTD subset.
protected static final short TOKEN_SECTION_END
TOKEN_CDATA
token. This token is valid only in the external DTD.
protected static final short TOKEN_PE_REF
_tokenText
contains the entity name.
This token is valid only in the DTD.
protected static final int WELL_FORMED
protected static final int VALIDITY
protected int _curChar
readChar()
, or the next character
to be pushed back (see pushBack()
). Set to EOF
if end
of input stream has been reached.
This is a state variable. Methods that use it directly or indirectly must be alert to its present value and usage. General value on entry/exit is specified in the method comment.
protected FastString _tokenText
TOKEN_EOF
if end of input stream has been reached.
Many methods read, modify and possibly return values in _tokenText
,
so its value should not be assumed to remain constant between method calls.
A FastString
is allocated and constantly reused by resetting
its length to zero. In some instances, it is replaced with an alternative
FastString
object. IT IS IMPORTANT that no other variable will
reference this string.
This is a state variable. Methods that use it directly or indirectly must be alert to its present state and usage and must reset it as needed. General value on entry/exit is specified in the method comment.
protected StreamParser()
protected void init(InputSource input) throws SAXException, IOException
The input source is expected to provide the public and system identifiers of the document for the purpose of error reporting and relative external entity resolving. No checking is done on their validity.
The input source must specify either an input stream or a reader. If neither is specified, an exception is thrown. If a reader is specified, it is used for consuming the source document. If an input stream is specified, it is preferred to the reader and the specified encoding is used to convert it. If the specified encoding is not supported, an exception is thrown. If no encoding is specified, the default UTF-8 is used.
Note, the input source is expected to be available to the parser for the entire duration of parsing the document.
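The selection rules stated above can be sketched against the standard org.xml.sax.InputSource API. This is a minimal sketch, not the parser's actual code: the class InputSourceReaderSketch and the method openReader are hypothetical, and the choice of SAXException for the "neither specified" case is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InputSourceReaderSketch {
    // Follows the documented rules: a byte stream is preferred over a reader,
    // UTF-8 is the default encoding, and a source with neither is rejected.
    static Reader openReader(InputSource input) throws SAXException, IOException {
        if (input.getByteStream() != null) {
            String encoding = input.getEncoding() != null ? input.getEncoding() : "UTF-8";
            // InputStreamReader throws UnsupportedEncodingException (an IOException)
            // for an unknown encoding; the real parser's exception type may differ.
            return new InputStreamReader(input.getByteStream(), encoding);
        }
        if (input.getCharacterStream() != null) {
            return input.getCharacterStream();
        }
        // Assumed exception type for the "neither specified" case.
        throw new SAXException("Input source has neither an input stream nor a reader");
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<a/>".getBytes(StandardCharsets.UTF_8);
        InputSource source = new InputSource(new ByteArrayInputStream(doc));
        source.setSystemId("memory:doc.xml"); // used only for error reporting
        Reader reader = openReader(source);
        System.out.println((char) reader.read()); // prints <
    }
}
```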
input
- The input source to parse
SAXException
- The input source cannot be used
IOException
protected final boolean readTokenQuoted() throws SAXException
_tokenText
.
Returns true if a quoted value was found (i.e. opening quote
followed on the input stream).
SAXException
- A low-level error has been encountered
protected final boolean readTokenName() throws SAXException
_tokenText
.
If a valid name can be read, it is placed in _tokenText
and
true is returned, otherwise false is returned and the input stream
is not affected.
A valid token name is defined as consisting of any letter, underscore or colon, followed by zero or more letters and digits, underscores, hyphens, colons and periods. Unlike other languages, letters and digits can be specified in all Unicode supported languages.
_curChar
does not contain a valid value on either entry or
exit from this method.
SAXException
- A low-level error has been encountered
protected final boolean readTokenNameCur() throws SAXException
_tokenText
.
If a valid name can be read, it is placed in _tokenText
and
true is returned, otherwise false is returned and the input stream
is not affected.
A valid token name is defined as consisting of any letter, underscore or colon, followed by zero or more letters and digits, underscores, hyphens, colons and periods. Unlike other languages, letters and digits can be specified in all Unicode supported languages.
_curChar contains the last character read (first token character) on entry, but does not contain a valid value on exit.
SAXException
- A low-level error has been encountered
protected final boolean canReadName(String name) throws SAXException
SAXException
- A low-level error has been encountered
protected final boolean isTokenAllSpace()
_tokenText
and all its characters must be whitespace as defined
by isSpace(int)
.
_tokenText
is all whitespace
protected final boolean isNamePart(int ch)
Valid names are defined as consisting of any letter, underscore or colon, followed by zero or more letters and digits, underscores, hyphens, colons and periods. Unlike other languages, letters and digits can be specified in all Unicode supported languages.
Test performed on _curChar
.
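The naming rule above can be approximated with the JDK's Unicode-aware character tests. This is a sketch only: the class XmlNameSketch and the helper isValidName are hypothetical, the two predicate names merely mirror the documented methods, and Character.isLetter / Character.isLetterOrDigit are an approximation of whatever character tables the parser really uses.

```java
class XmlNameSketch {
    // First character: any letter, underscore or colon.
    static boolean isNamePartFirst(int ch) {
        return Character.isLetter(ch) || ch == '_' || ch == ':';
    }

    // Subsequent characters: letters, digits, underscore, hyphen, colon, period.
    static boolean isNamePart(int ch) {
        return Character.isLetterOrDigit(ch)
            || ch == '_' || ch == '-' || ch == ':' || ch == '.';
    }

    // Hypothetical helper that applies both tests to a whole string.
    static boolean isValidName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!isNamePartFirst(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are tested as one unit.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!isNamePart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("xsl:template")); // prints true
        System.out.println(isValidName("1abc"));         // prints false
    }
}
```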
first
- True if first letter in the name
protected final boolean isNamePartFirst(int ch)
protected final boolean isSpace(int ch)
_curChar
.
ch
- The character to check
protected final void skipSpace() throws SAXException
SAXException
protected final void pushBackToken()
_tokenText
. The value
of _tokenText
is identical on entry and exit. The value
of _curChar
is unaffected.
protected final int readChar() throws SAXException
EOF
is returned. The returned character is also available
in the _curChar
variable.
Line breaks (LF, CR and CR+LF) are returned as a single line feed (0x0A) character.
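The line break handling can be sketched as follows, using java.io.PushbackReader for the one-character lookahead after a carriage return. This is an illustration under assumptions, not the parser's implementation: the class LineBreakSketch and its normalize helper are hypothetical, and the value -1 for EOF is assumed because the real constant's value is not shown.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class LineBreakSketch {
    static final int EOF = -1;   // assumed value; matches Reader.read() at end of input
    static final char LF = 0x0A;
    static final char CR = 0x0D;

    // Reads one character, folding CR and CR+LF into a single LF.
    static int readChar(PushbackReader in) throws IOException {
        int ch = in.read();
        if (ch == CR) {
            int next = in.read();
            if (next != LF && next != EOF) {
                in.unread(next); // lone CR: keep the following character
            }
            return LF;
        }
        return ch;
    }

    // Hypothetical helper: runs readChar over a whole string.
    static String normalize(String text) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader(text));
        StringBuilder out = new StringBuilder();
        for (int ch = readChar(in); ch != EOF; ch = readChar(in)) {
            out.append((char) ch);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints a\nb\nc (line feeds shown escaped for readability)
        System.out.println(normalize("a\r\nb\rc").replace("\n", "\\n"));
    }
}
```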
_curChar
SAXException
- An I/O exception has been encountered when reading
from the input stream
protected final void pushBack()
_curChar
. The pushed
back character will be returned when readChar()
is called next.
Any number of characters can be pushed back. The push back buffer is
a LIFO stack, so text should be pushed back in reverse order. It is
not an error to push back the value EOF
.
protected final void pushBack(int ch)
readChar()
is called next. Any number of characters can be
pushed back. The push back buffer is a LIFO stack, so text should be
pushed back in reverse order. It is not an error to push back the value
EOF
.
ch
- The character to push back
public void pushBack(String pushStr)
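The LIFO behaviour of the push-back buffer described for pushBack() and pushBack(int) can be sketched with a small stack-backed reader. Everything here is hypothetical (the class PushBackSketch, the EOF value of -1, and in particular the string overload, which pushes characters last-to-first so they are read back in their original order). Whether the real pushBack(String) reverses the text for the caller is not documented here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PushBackSketch {
    static final int EOF = -1; // assumed value of the EOF constant

    private final String input;
    private int pos = 0;
    private final Deque<Integer> pushed = new ArrayDeque<>(); // LIFO stack
    int curChar = EOF; // last character read, like _curChar

    PushBackSketch(String input) {
        this.input = input;
    }

    // Pushed-back characters are returned first, most recently pushed first.
    int readChar() {
        if (!pushed.isEmpty()) {
            curChar = pushed.pop();
        } else if (pos < input.length()) {
            curChar = input.charAt(pos++);
        } else {
            curChar = EOF;
        }
        return curChar;
    }

    // Pushes back the last character read.
    void pushBack() {
        pushed.push(curChar);
    }

    // Pushes back an arbitrary character; EOF is accepted too.
    void pushBack(int ch) {
        pushed.push(ch);
    }

    // Sketch of a string overload: push in reverse so reads see original order.
    void pushBack(String text) {
        for (int i = text.length() - 1; i >= 0; i--) {
            pushBack(text.charAt(i));
        }
    }

    public static void main(String[] args) {
        PushBackSketch p = new PushBackSketch("cd");
        p.pushBack("ab");
        StringBuilder out = new StringBuilder();
        for (int ch = p.readChar(); ch != EOF; ch = p.readChar()) {
            out.append((char) ch);
        }
        System.out.println(out); // prints abcd
    }
}
```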
protected final void setEncoding(String encoding) throws SAXException
SAXException
protected final Reader getReader()
protected final void close()
protected final boolean isClosed()
close()
method.
public final int getLineNumber()
getLineNumber
in interface Locator
public final int getColumnNumber()
getColumnNumber
in interface Locator
public final String getSystemId()
getSystemId
in interface Locator
public final String getPublicId()
getPublicId
in interface Locator
public InputSource resolveEntity(String publicId, String systemId) throws IOException, SAXException
resolveEntity
in interface EntityResolver
IOException
SAXException
public final SAXException getLastException()
public final void setErrorHandler(ErrorHandler handler)
ErrorReport
for reporting errors.
Some applications may wish to provide their own error handler,
by calling this method. If the handler is set to null, all errors
encountered in the code will throw an exception and stop the parser.
handler
- The new error handler to use, or null
ErrorHandler
,
ErrorReport
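An application-supplied handler might look like the sketch below. It assumes the ErrorHandler accepted by setErrorHandler is the standard org.xml.sax.ErrorHandler interface, which this page does not state explicitly. The class CollectingErrorHandler is hypothetical. It records warnings and errors with their location and stops parsing only on fatal errors.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: collects warnings and errors, rethrows fatal errors.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        warnings.add(describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e; // throwing from the handler stops the parser
    }

    // Formats the location (system identifier, line, column) and the message.
    private static String describe(SAXParseException e) {
        return e.getSystemId() + ":" + e.getLineNumber() + ":"
            + e.getColumnNumber() + " " + e.getMessage();
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        handler.warning(new SAXParseException("Undeclared entity", null, "doc.xml", 3, 7));
        System.out.println(handler.warnings.get(0)); // prints doc.xml:3:7 Undeclared entity
    }
}
```

Under that assumption, such a handler would be registered by passing an instance to setErrorHandler(ErrorHandler) before parsing starts.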
public final ErrorHandler getErrorHandler()
ErrorReport
.
ErrorHandler
public final void setLocale(Locale locale)
locale
- The locale to use, null for the system default
protected final boolean isWarning()
protected final void warning(String message) throws SAXException
message
- The warning message
SAXException
- The error handler might respond by throwing
an exception that will stop the parser
protected final void error(int errorLevel, String message) throws SAXException
errorLevel
- The error level
message
- The error message
SAXException
- The error handler might respond by throwing
an exception that will stop the parser
protected final void fatalError(Exception except) throws SAXException
except
- The fatal exception
SAXException
- The error handler will respond by throwing
an exception that will stop the parser
protected final String message(String message)
Resources.message(java.lang.String)
with a locale known to the parser.
message
- The message identifier
protected final String format(String message, Object arg1)
Resources.format(java.lang.String, java.lang.Object)
with a locale known to the
parser.
message
- The message identifier
arg1
- The first argument
protected final String format(String message, Object arg1, Object arg2)
Resources.format(java.lang.String, java.lang.Object)
with a locale known to the
parser.
message
- The message identifier
arg1
- The first argument
arg2
- The second argument
protected final String format(String message, Object arg1, Object arg2, Object arg3)
Resources.format(java.lang.String, java.lang.Object)
with a locale known to the
parser.
message
- The message identifier
arg1
- The first argument
arg2
- The second argument
arg3
- The third argument
Phantom® and NetPhantom® are registered trademarks of Mindus SARL.
© Mindus SARL, 2024. All rights reserved.