presage  0.9.2~beta
Tokenizer Class Reference (abstract)

#include <tokenizer.h>


Classes

class  StreamGuard
 

Public Member Functions

 Tokenizer (std::istream &stream, const std::string blankspaces, const std::string separators)
 
virtual ~Tokenizer ()
 
virtual int countTokens ()=0
 
virtual bool hasMoreTokens () const =0
 
virtual std::string nextToken ()=0
 
virtual double progress () const =0
 
void blankspaceChars (const std::string)
 
std::string blankspaceChars () const
 
void separatorChars (const std::string)
 
std::string separatorChars () const
 
void lowercaseMode (const bool)
 
bool lowercaseMode () const
 
std::string streamToString () const
 

Protected Member Functions

bool isBlankspace (const int character) const
 
bool isSeparator (const int character) const
 

Protected Attributes

std::istream & stream
 
std::ios::iostate sstate
 
std::streamoff offbeg
 
std::streamoff offend
 
std::streamoff offset
 

Private Attributes

std::string blankspaces
 
std::string separators
 
bool lowercase
 

Detailed Description

The Tokenizer class takes an input stream and parses it into "tokens", allowing the tokens to be read one at a time.

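The parsing process is controlled by the character classification sets:

- blankspace characters (see blankspaceChars()), which delimit tokens;
- separator characters (see separatorChars()), which also delimit tokens;
- all remaining characters, which make up the tokens themselves.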

Each byte read from the input stream is regarded as a character in the range '\u0000' through '\u00FF'.

In addition, an instance has a flag, set with lowercaseMode(), that controls whether the characters of tokens are converted to lowercase.

A typical application first constructs an instance of a concrete subclass (ForwardTokenizer or ReverseTokenizer), supplying the input stream to be tokenized, the set of blankspaces and the set of separators, and then repeatedly calls nextToken() for as long as hasMoreTokens() returns true.
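The loop described above can be sketched as follows. Because presage's headers are not reproduced here, the sketch declares a hypothetical stand-in interface, `TokenSource`, containing only the two members the loop needs, plus a hypothetical `SimpleForwardTokenizer` that treats both blankspace and separator characters as delimiters and discards them. The real ForwardTokenizer may treat separators differently; this is an illustration of the calling pattern, not presage's implementation.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two Tokenizer members the loop relies on.
class TokenSource {
public:
    virtual ~TokenSource() = default;
    virtual bool hasMoreTokens() const = 0;
    virtual std::string nextToken() = 0;
};

// Hypothetical forward tokenizer (assumption: blankspace and separator
// characters both end a token and are dropped; every other byte is kept).
class SimpleForwardTokenizer : public TokenSource {
public:
    SimpleForwardTokenizer(std::istream& in, std::string blankspaces, std::string separators)
        : in_(in), blankspaces_(std::move(blankspaces)), separators_(std::move(separators)) {
        skipDelimiters();  // position the stream on the first token, if any
    }

    // peek() is callable from a const member because in_ is a reference.
    bool hasMoreTokens() const override {
        return in_.peek() != std::char_traits<char>::eof();
    }

    std::string nextToken() override {
        std::string token;
        while (hasMoreTokens() && !isDelimiter(in_.peek())) {
            token.push_back(static_cast<char>(in_.get()));
        }
        skipDelimiters();  // keeps hasMoreTokens() accurate for the next call
        return token;
    }

private:
    bool isDelimiter(int c) const {
        const char ch = static_cast<char>(c);
        return blankspaces_.find(ch) != std::string::npos ||
               separators_.find(ch) != std::string::npos;
    }

    void skipDelimiters() {
        while (hasMoreTokens() && isDelimiter(in_.peek())) {
            in_.get();
        }
    }

    std::istream& in_;
    std::string blankspaces_;
    std::string separators_;
};

// The typical application loop: call nextToken() while hasMoreTokens() is true.
std::vector<std::string> collectTokens(TokenSource& tokenizer) {
    std::vector<std::string> tokens;
    while (tokenizer.hasMoreTokens()) {
        tokens.push_back(tokenizer.nextToken());
    }
    return tokens;
}
```

With blankspaces " " and separators ",.", the input "the quick, brown fox." yields the tokens the, quick, brown and fox under this sketch's rules.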

Definition at line 64 of file tokenizer.h.

Constructor & Destructor Documentation

◆ Tokenizer()

Tokenizer::Tokenizer (std::istream & stream, const std::string blankspaces, const std::string separators)

Definition at line 27 of file tokenizer.cpp.

References blankspaceChars(), blankspaces, offbeg, offend, offset, separatorChars(), separators, sstate, and stream.
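The constructor references sstate, offbeg, offend and offset. One plausible way to initialize fields like these with standard iostream calls is sketched below: save the state flags, record the current read position, seek to the end to measure the stream, then seek back. The struct name `StreamWindow` is hypothetical, and whether presage computes its offsets exactly this way is an assumption.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical holder for fields analogous to Tokenizer's protected
// attributes. A sketch under stated assumptions, not presage code.
struct StreamWindow {
    std::ios::iostate sstate;   // state flags saved at construction
    std::streamoff offbeg;      // offset where tokenizing starts
    std::streamoff offend;      // offset of the end of the stream
    std::streamoff offset;      // current offset, starts at offbeg

    explicit StreamWindow(std::istream& stream) {
        sstate = stream.rdstate();          // to be restored later (e.g. by a destructor)
        stream.clear();                     // tellg()/seekg() fail while failbit/eofbit are set
        const std::streampos start = stream.tellg();
        stream.seekg(0, std::ios::end);     // measure the end of the stream
        offend = static_cast<std::streamoff>(stream.tellg());
        stream.seekg(start);                // return to the original position
        offbeg = static_cast<std::streamoff>(start);
        offset = offbeg;
    }
};
```

This only works on seekable streams such as file or string streams; on a non-seekable stream tellg() returns -1.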


◆ ~Tokenizer()

Tokenizer::~Tokenizer ( )
virtual

Definition at line 53 of file tokenizer.cpp.

References sstate and stream.
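The destructor references only sstate and stream, which suggests it puts back the stream state saved at construction. A minimal RAII sketch of that idea follows; the class name `StateRestorer` is hypothetical, and the exact behaviour of presage's destructor is an assumption.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical guard: remembers the stream's state flags when constructed
// and restores exactly those flags when destroyed.
class StateRestorer {
public:
    explicit StateRestorer(std::istream& stream)
        : stream_(stream), saved_(stream.rdstate()) {}

    // clear(state) sets the flags to exactly `state`, undoing eof/fail bits
    // that were raised while the stream was being read.
    ~StateRestorer() { stream_.clear(saved_); }

    StateRestorer(const StateRestorer&) = delete;
    StateRestorer& operator=(const StateRestorer&) = delete;

private:
    std::istream& stream_;
    std::ios::iostate saved_;
};
```

Reading past the end inside the guard's scope sets eofbit and failbit; once the guard is destroyed the stream is good again.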

Member Function Documentation

◆ blankspaceChars() [1/2]

void Tokenizer::blankspaceChars ( const std::string  chars)

Sets blankspace characters.

Definition at line 61 of file tokenizer.cpp.

References blankspaces.

◆ blankspaceChars() [2/2]

std::string Tokenizer::blankspaceChars ( ) const

Gets blankspace characters.

Definition at line 66 of file tokenizer.cpp.

References blankspaces.

Referenced by Tokenizer().


◆ countTokens()

virtual int Tokenizer::countTokens ( )
pure virtual

Returns the number of tokens left.

Implemented in ForwardTokenizer and ReverseTokenizer.
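Counting the tokens left without consuming them requires scanning ahead and then rewinding. The sketch below shows one way to do that with a saved read position; the free function `countRemainingTokens` is hypothetical, it uses a single combined delimiter set for simplicity, and ForwardTokenizer and ReverseTokenizer may well implement countTokens() differently.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical: count tokens from the current position to the end of the
// stream, then restore the read position and the original state flags.
int countRemainingTokens(std::istream& in, const std::string& delimiters) {
    const std::ios::iostate state = in.rdstate();
    in.clear();                              // tellg() fails while failbit/eofbit are set
    const std::streampos start = in.tellg();

    int count = 0;
    bool inToken = false;
    for (int c = in.get(); c != std::char_traits<char>::eof(); c = in.get()) {
        const bool delim = delimiters.find(static_cast<char>(c)) != std::string::npos;
        if (!delim && !inToken) {
            ++count;                         // first character of a new token
        }
        inToken = !delim;
    }

    in.clear();                              // hitting EOF set eofbit and failbit
    in.seekg(start);                         // rewind to where counting began
    in.clear(state);                         // put back the caller's flags
    return count;
}
```

Because the position is restored, the count can be taken repeatedly between reads, and it shrinks as tokens are consumed.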

◆ hasMoreTokens()

virtual bool Tokenizer::hasMoreTokens ( ) const
pure virtual

Tests if there are more tokens.

Implemented in ForwardTokenizer and ReverseTokenizer.

◆ isBlankspace()

bool Tokenizer::isBlankspace ( const int  character) const
protected

Definition at line 91 of file tokenizer.cpp.

References blankspaces.

Referenced by ForwardTokenizer::nextToken() and ReverseTokenizer::nextToken().


◆ isSeparator()

bool Tokenizer::isSeparator ( const int  character) const
protected

Definition at line 101 of file tokenizer.cpp.

References separators.

Referenced by ForwardTokenizer::nextToken() and ReverseTokenizer::nextToken().
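isBlankspace() and isSeparator() take the character as an int, matching what std::istream::get() and peek() return, with bytes in the range 0x00 through 0xFF as described above. A minimal sketch of such a membership test follows; the helper `belongsTo` is hypothetical, and the assumption is that isBlankspace(c) behaves like belongsTo(blankspaces, c) and isSeparator(c) like belongsTo(separators, c).

```cpp
#include <cassert>
#include <string>

// Hypothetical membership test for a character classification set.
// `character` is expected to be a byte value 0..255, or EOF (-1) as
// returned by std::istream::get(); EOF belongs to no set.
bool belongsTo(const std::string& set, int character) {
    if (character == std::char_traits<char>::eof()) {
        return false;
    }
    // Narrowing to char maps 0x80..0xFF onto the same bytes stored in `set`,
    // whether plain char is signed or unsigned on this platform.
    return set.find(static_cast<char>(character)) != std::string::npos;
}
```

Passing a plain (possibly signed) char with value 0xFF would arrive as -1 and be mistaken for EOF, which is why the int-from-get() convention matters.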


◆ lowercaseMode() [1/2]

void Tokenizer::lowercaseMode ( const bool  value)

Sets lowercase mode.

Definition at line 81 of file tokenizer.cpp.

References lowercase.

Referenced by ContextChangeDetector::change(), ContextTracker::learn(), and main().


◆ lowercaseMode() [2/2]

bool Tokenizer::lowercaseMode ( ) const

Gets lowercase mode.

Definition at line 86 of file tokenizer.cpp.

References lowercase.

Referenced by ForwardTokenizer::nextToken() and ReverseTokenizer::nextToken().
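Since nextToken() consults lowercaseMode(), the flag presumably decides whether a token is lowercased before it is returned. The sketch below shows that step; the function `applyLowercaseMode` is hypothetical, and it lowercases byte by byte with std::tolower in the current C locale, which is an assumption about presage's behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical post-processing step for a token when lowercase mode is on.
// Each byte goes through unsigned char before std::tolower, because passing
// a negative char value (bytes 0x80..0xFF on signed-char platforms) is
// undefined behaviour.
std::string applyLowercaseMode(std::string token, bool lowercase) {
    if (lowercase) {
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }
    return token;
}
```

Byte-wise std::tolower does not handle multi-byte encodings such as UTF-8; only single-byte characters are converted.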


◆ nextToken()

virtual std::string Tokenizer::nextToken ( )
pure virtual

Returns the next token.

Implemented in ForwardTokenizer and ReverseTokenizer.

◆ progress()

virtual double Tokenizer::progress ( ) const
pure virtual

Returns progress percentage.

Implemented in ForwardTokenizer and ReverseTokenizer.
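Given the protected offsets offbeg, offend and offset, forward progress can be expressed as the share of the window already read. The sketch below computes that ratio; the function `progressFraction` is hypothetical, it measures progress in the forward direction only, and whether presage returns a fraction (0.0 to 1.0) or a percentage (0 to 100) is an assumption.

```cpp
#include <cassert>
#include <ios>

// Hypothetical: fraction of the window [offbeg, offend) consumed so far.
// Multiply by 100 if a percentage is wanted.
double progressFraction(std::streamoff offbeg, std::streamoff offend, std::streamoff offset) {
    const std::streamoff total = offend - offbeg;
    if (total <= 0) {
        return 1.0;  // empty window: nothing left to read
    }
    return static_cast<double>(offset - offbeg) / static_cast<double>(total);
}
```

A tokenizer that reads backwards, as the name ReverseTokenizer suggests, would measure the distance from offend instead.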

◆ separatorChars() [1/2]

void Tokenizer::separatorChars ( const std::string  chars)

Sets separator characters.

Definition at line 71 of file tokenizer.cpp.

References separators.

◆ separatorChars() [2/2]

std::string Tokenizer::separatorChars ( ) const

Gets separator characters.

Definition at line 76 of file tokenizer.cpp.

References separators.

Referenced by Tokenizer().


◆ streamToString()

std::string Tokenizer::streamToString ( ) const
inline

Definition at line 109 of file tokenizer.h.

References offbeg, offend, and stream.
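streamToString() references offbeg, offend and stream, so it presumably returns the text between the two offsets. A sketch of reading such a window without disturbing the caller's position follows; the function `windowToString` is hypothetical and not presage's inline implementation.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical: copy the bytes in [offbeg, offend) into a string, then
// restore the stream's read position and state flags.
std::string windowToString(std::istream& stream, std::streamoff offbeg, std::streamoff offend) {
    const std::ios::iostate state = stream.rdstate();
    stream.clear();                          // seeking fails while failbit/eofbit are set
    const std::streampos saved = stream.tellg();

    std::string result;
    stream.seekg(offbeg, std::ios::beg);
    for (std::streamoff i = offbeg; i < offend; ++i) {
        const int c = stream.get();
        if (c == std::char_traits<char>::eof()) {
            break;                           // window extends past the data
        }
        result.push_back(static_cast<char>(c));
    }

    stream.clear();
    stream.seekg(saved);                     // back to where the caller was
    stream.clear(state);
    return result;
}
```

After the call the next get() returns the same character it would have returned before.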

Member Data Documentation

◆ blankspaces

std::string Tokenizer::blankspaces
private

Definition at line 154 of file tokenizer.h.

Referenced by blankspaceChars(), isBlankspace(), and Tokenizer().

◆ lowercase

bool Tokenizer::lowercase
private

Definition at line 157 of file tokenizer.h.

Referenced by lowercaseMode().

◆ offbeg

std::streamoff Tokenizer::offbeg
protected

◆ offend

std::streamoff Tokenizer::offend
protected

◆ offset

std::streamoff Tokenizer::offset
protected

◆ separators

std::string Tokenizer::separators
private

Definition at line 155 of file tokenizer.h.

Referenced by isSeparator(), separatorChars(), and Tokenizer().

◆ sstate

std::ios::iostate Tokenizer::sstate
protected

Definition at line 145 of file tokenizer.h.

Referenced by Tokenizer() and ~Tokenizer().

◆ stream

std::istream& Tokenizer::stream
protected

The documentation for this class was generated from the following files:

- tokenizer.h
- tokenizer.cpp