org.apache.lucene.wikipedia.analysis
Class WikipediaTokenizer
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.wikipedia.analysis.WikipediaTokenizer
public class WikipediaTokenizer
extends Tokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax. It is based on the
Wikipedia tutorial available at http://en.wikipedia.org/wiki/Wikipedia:Tutorial, but its coverage of the syntax may not be complete.
EXPERIMENTAL
NOTE: This Tokenizer is considered experimental, and its grammar is subject to change on trunk and in follow-up releases.
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INTERNAL_LINK
public static final String INTERNAL_LINK
- See Also:
- Constant Field Values
EXTERNAL_LINK
public static final String EXTERNAL_LINK
- See Also:
- Constant Field Values
EXTERNAL_LINK_URL
public static final String EXTERNAL_LINK_URL
- See Also:
- Constant Field Values
CITATION
public static final String CITATION
- See Also:
- Constant Field Values
CATEGORY
public static final String CATEGORY
- See Also:
- Constant Field Values
BOLD
public static final String BOLD
- See Also:
- Constant Field Values
ITALICS
public static final String ITALICS
- See Also:
- Constant Field Values
BOLD_ITALICS
public static final String BOLD_ITALICS
- See Also:
- Constant Field Values
HEADING
public static final String HEADING
- See Also:
- Constant Field Values
SUB_HEADING
public static final String SUB_HEADING
- See Also:
- Constant Field Values
WikipediaTokenizer
public WikipediaTokenizer(Reader input)
- Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
- Parameters:
input - The input Reader
next
public Token next(Token result)
throws IOException
- Overrides:
next in class TokenStream
- Throws:
IOException
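A minimal sketch of consuming the stream with the constructor and next(Token) documented above. It assumes the Lucene core and wikipedia contrib jars matching this API are on the classpath, and that Token exposes termBuffer(), termLength() and type() as in the core Token class of the same release. The class name WikipediaTokenizerExample and the helper tokenize are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

// Hypothetical example class; not part of Lucene.
public class WikipediaTokenizerExample {

    /** Returns every token of wikiText as "term/type", in stream order. */
    static List<String> tokenize(String wikiText) throws IOException {
        WikipediaTokenizer tokenizer =
                new WikipediaTokenizer(new StringReader(wikiText));
        List<String> out = new ArrayList<String>();
        try {
            Token reusable = new Token();
            // next(Token) returns null once the input is exhausted. Always read
            // from the returned Token, which is not guaranteed to be the same
            // instance that was passed in.
            for (Token t = tokenizer.next(reusable); t != null;
                    t = tokenizer.next(reusable)) {
                String term = new String(t.termBuffer(), 0, t.termLength());
                out.add(term + "/" + t.type());
            }
        } finally {
            tokenizer.close(); // also closes the underlying Reader
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String text = "'''Lucene''' is a [[search engine]] library. [[Category:Software]]";
        for (String entry : tokenize(text)) {
            System.out.println(entry);
        }
    }
}
```

To keep only one kind of markup, the value of t.type() can be compared against the constants on this class, for example WikipediaTokenizer.CATEGORY.equals(t.type()). Because the grammar is experimental, it is safer to inspect the printed types for your own input than to rely on a particular classification.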
reset
public void reset()
throws IOException
- Overrides:
reset in class TokenStream
- Throws:
IOException
reset
public void reset(Reader reader)
throws IOException
- Overrides:
reset in class Tokenizer
- Throws:
IOException
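reset(Reader) makes it possible to reuse one tokenizer for several documents rather than constructing a new instance per document. The following is a sketch under the same assumptions as any use of this class (matching Lucene core and wikipedia contrib jars on the classpath). The class name WikipediaTokenizerReuseExample and the helpers countTokens and countAll are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.wikipedia.analysis.WikipediaTokenizer;

// Hypothetical example class; not part of Lucene.
public class WikipediaTokenizerReuseExample {

    /** Drains the tokenizer and returns the number of tokens it produced. */
    static int countTokens(WikipediaTokenizer tokenizer) throws IOException {
        int count = 0;
        Token reusable = new Token();
        while (tokenizer.next(reusable) != null) {
            count++;
        }
        return count;
    }

    /** Counts the tokens of each page while reusing a single tokenizer. */
    static int[] countAll(String[] pages) throws IOException {
        int[] counts = new int[pages.length];
        if (pages.length == 0) {
            return counts;
        }
        WikipediaTokenizer tokenizer =
                new WikipediaTokenizer(new StringReader(pages[0]));
        try {
            counts[0] = countTokens(tokenizer);
            for (int i = 1; i < pages.length; i++) {
                // Point the existing tokenizer at the next document.
                tokenizer.reset(new StringReader(pages[i]));
                counts[i] = countTokens(tokenizer);
            }
        } finally {
            tokenizer.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        int[] counts = countAll(new String[] {
            "== Heading ==\nSome ''italic'' text.",
            "Another page with a [[link]]."
        });
        for (int i = 0; i < counts.length; i++) {
            System.out.println("page " + i + ": " + counts[i] + " tokens");
        }
    }
}
```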
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.