org.apache.commons.codec.language
Class Soundex
All Implemented Interfaces: Encoder, StringEncoder
Encodes a string into a Soundex value. Soundex is an encoding used to relate similar names, but can also be used as a
general purpose scheme to find words with similar phonemes.
Version: $Id: Soundex.java,v 1.26 2004/07/07 23:15:24 ggregory Exp $
Author: Apache Software Foundation
static Soundex | US_ENGLISH - An instance of Soundex using the US_ENGLISH_MAPPING mapping.
|
static char[] | US_ENGLISH_MAPPING - This is a default mapping of the 26 letters used in US English.
|
static String | US_ENGLISH_MAPPING_STRING - This is a default mapping of the 26 letters used in US English.
|
private int | maxLength - This feature is not needed since the encoding size must be constant.
|
private char[] | soundexMapping - Every letter of the alphabet is "mapped" to a numerical value.
|
Soundex() - Creates an instance using US_ENGLISH_MAPPING
|
Soundex(char[] mapping) - Creates a soundex instance using the given mapping.
|
int | difference(String s1, String s2) - Encodes the Strings and returns the number of characters in the two encoded Strings that are the same.
|
Object | encode(Object pObject) - Encodes an Object using the soundex algorithm.
|
String | encode(String pString) - Encodes a String using the soundex algorithm.
|
private char | getMappingCode(String str, int index) - Used internally by the Soundex algorithm.
|
int | getMaxLength() - This feature is not needed since the encoding size must be constant.
|
private char[] | getSoundexMapping() - Returns the soundex mapping.
|
private char | map(char ch) - Maps the given upper-case character to its Soundex code.
|
void | setMaxLength(int maxLength) - This feature is not needed since the encoding size must be constant.
|
private void | setSoundexMapping(char[] soundexMapping) - Sets the soundexMapping.
|
String | soundex(String str) - Retrieves the Soundex code for a given String object.
|
US_ENGLISH
public static final Soundex US_ENGLISH
An instance of Soundex using the US_ENGLISH_MAPPING mapping.
US_ENGLISH_MAPPING
public static final char[] US_ENGLISH_MAPPING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.
US_ENGLISH_MAPPING_STRING
public static final String US_ENGLISH_MAPPING_STRING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.
(This constant is provided as both an implementation convenience and to allow Javadoc to pick
up the value for the constant values page.)
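The lookup described above can be sketched as follows. This is a minimal illustration, not the library source: the class name MappingSketch and the method codeFor are hypothetical, and the 26-character string is assumed to match the value of US_ENGLISH_MAPPING_STRING (check the constant values page before relying on it). Position 0 corresponds to 'A' and position 25 to 'Z'.

```java
final class MappingSketch {
    // Assumed value of US_ENGLISH_MAPPING_STRING: one code per letter, A through Z.
    static final String US_ENGLISH_MAPPING_STRING = "01230120022455012623010202";

    // The char[] form is simply the string split into characters.
    static final char[] US_ENGLISH_MAPPING = US_ENGLISH_MAPPING_STRING.toCharArray();

    /** Returns the code for an upper-case letter A-Z; '0' means "do not encode". */
    static char codeFor(char upperCaseLetter) {
        int index = upperCaseLetter - 'A';
        if (index < 0 || index >= US_ENGLISH_MAPPING.length) {
            throw new IllegalArgumentException("The character is not mapped: " + upperCaseLetter);
        }
        return US_ENGLISH_MAPPING[index];
    }

    public static void main(String[] args) {
        // Print the whole table, one letter per line.
        for (char c = 'A'; c <= 'Z'; c++) {
            System.out.println(c + " -> " + codeFor(c));
        }
    }
}
```

In this table the vowels and H, W, Y carry the value 0, so they contribute no digit to the code.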
maxLength
private int maxLength
Deprecated. This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
The maximum length of a Soundex code - Soundex codes are only four characters by definition.
soundexMapping
private char[] soundexMapping
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each
letter is mapped. This implementation contains a default map for US_ENGLISH.
Soundex
public Soundex()
Creates an instance using US_ENGLISH_MAPPING.
Soundex
public Soundex(char[] mapping)
Creates a soundex instance using the given mapping. This constructor can be used to provide an internationalized
mapping for a non-Western character set.
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each
letter is mapped. This implementation contains a default map for US_ENGLISH.
mapping
- Mapping array to use when finding the corresponding code for a given character
difference
public int difference(String s1,
String s2)
throws EncoderException
Encodes the Strings and returns the number of characters in the two encoded Strings that are the same. This
return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or
identical values.
s1
- A String that will be encoded and compared.
s2
- A String that will be encoded and compared.
- The number of characters in the two encoded Strings that are the same from 0 to 4.
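The comparison step can be sketched as below, assuming both inputs have already been encoded to Soundex codes. The class name DifferenceSketch and the method differenceEncoded are hypothetical names used only for illustration; the real method encodes s1 and s2 first and may handle edge cases differently.

```java
final class DifferenceSketch {
    /**
     * Counts the positions at which two already-encoded strings hold the same
     * character. For four-character Soundex codes the result is 0 through 4.
     */
    static int differenceEncoded(String code1, String code2) {
        if (code1 == null || code2 == null) {
            return 0; // assumption: nothing to compare means no similarity
        }
        int length = Math.min(code1.length(), code2.length());
        int same = 0;
        for (int i = 0; i < length; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                same++;
            }
        }
        return same;
    }
}
```

For example, two names that both encode to S530 score 4, while A261 and A226 agree only in their first two positions and score 2.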
encode
public Object encode(Object pObject)
throws EncoderException
Encodes an Object using the soundex algorithm. This method is provided in order to satisfy the requirements of
the Encoder interface, and will throw an EncoderException if the supplied object is not of type java.lang.String.
- encode in interface Encoder
pObject
- Object to encode
- An object (of type java.lang.String) containing the soundex code which corresponds to the String
supplied.
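The type check this method describes can be sketched as follows. Everything here is a stand-in so the example compiles without the library: SketchEncoderException is a hypothetical substitute for EncoderException, and encodeString is a placeholder for the String-based encoder (it only upper-cases its input).

```java
final class ObjectEncodeSketch {
    /** Hypothetical stand-in for org.apache.commons.codec.EncoderException. */
    static final class SketchEncoderException extends Exception {
        SketchEncoderException(String message) {
            super(message);
        }
    }

    /** Placeholder for the String-based encoder; NOT a Soundex implementation. */
    static String encodeString(String s) {
        return s.toUpperCase(java.util.Locale.ENGLISH);
    }

    /** Rejects anything that is not a String, otherwise delegates to the String encoder. */
    static Object encode(Object pObject) throws SketchEncoderException {
        if (!(pObject instanceof String)) {
            throw new SketchEncoderException("Parameter supplied to encode is not of type java.lang.String");
        }
        return encodeString((String) pObject);
    }
}
```

Callers that already hold a String can call the String overload directly and avoid the checked exception.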
encode
public String encode(String pString)
Encodes a String using the soundex algorithm.
- encode in interface StringEncoder
pString
- A String object to encode
- A Soundex code corresponding to the String supplied
getMappingCode
private char getMappingCode(String str,
int index)
Used internally by the Soundex algorithm.
Consonants from the same code group separated by W or H are treated as one.
str
- the cleaned working string to encode (in upper case).
index
- the character position to encode
- Mapping code for a particular character
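The H/W rule is easiest to see in a complete encoder. The following is a self-contained sketch written for illustration, not the library source: the class SoundexSketch and its method names are hypothetical, the mapping string is assumed to match US_ENGLISH_MAPPING_STRING, and the cleaning step (drop anything outside A-Z after upper-casing) is an assumption about how the working string is prepared.

```java
final class SoundexSketch {
    // Assumed US English mapping, one code per letter A through Z.
    private static final char[] MAPPING = "01230120022455012623010202".toCharArray();

    private static char map(char upperCaseLetter) {
        return MAPPING[upperCaseLetter - 'A'];
    }

    /**
     * Returns the code for the letter at index, or the NUL character when the
     * H/W rule applies: a consonant that follows H or W is skipped if the
     * letter before that H or W has the same code (or is itself H or W).
     */
    private static char mappingCode(String str, int index) {
        char mapped = map(str.charAt(index));
        if (index > 1 && mapped != '0') {
            char hw = str.charAt(index - 1);
            if (hw == 'H' || hw == 'W') {
                char beforeHw = str.charAt(index - 2);
                if (map(beforeHw) == mapped || beforeHw == 'H' || beforeHw == 'W') {
                    return 0;
                }
            }
        }
        return mapped;
    }

    /** Encodes a name to a four-character code: first letter plus three digits. */
    static String encode(String input) {
        if (input == null) {
            return null;
        }
        // Assumed cleaning step: upper-case and keep only A-Z.
        StringBuilder cleaned = new StringBuilder();
        for (char c : input.toCharArray()) {
            char u = Character.toUpperCase(c);
            if (u >= 'A' && u <= 'Z') {
                cleaned.append(u);
            }
        }
        if (cleaned.length() == 0) {
            return "";
        }
        String str = cleaned.toString();
        char[] out = {'0', '0', '0', '0'};
        out[0] = str.charAt(0);
        char last = mappingCode(str, 0);
        int count = 1;
        for (int i = 1; i < str.length() && count < out.length; i++) {
            char mapped = mappingCode(str, i);
            if (mapped != 0) {
                // Emit a digit only when it is a real code and differs from the previous one.
                if (mapped != '0' && mapped != last) {
                    out[count++] = mapped;
                }
                last = mapped;
            }
        }
        return new String(out);
    }
}
```

With this sketch, "Ashcraft" encodes to A261: the C after the H shares code 2 with the S before it, so it is treated as part of the same group rather than producing a second 2.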
getMaxLength
public int getMaxLength()
Deprecated. This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
Returns the maxLength. Standard Soundex
getSoundexMapping
private char[] getSoundexMapping()
Returns the soundex mapping.
map
private char map(char ch)
Maps the given upper-case character to its Soundex code.
ch
- An upper-case character.
setMaxLength
public void setMaxLength(int maxLength)
Deprecated. This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
Sets the maxLength.
maxLength
- The maxLength to set
setSoundexMapping
private void setSoundexMapping(char[] soundexMapping)
Sets the soundexMapping.
soundexMapping
- The soundexMapping to set.
soundex
public String soundex(String str)
Retrieves the Soundex code for a given String object.
str
- String to encode using the Soundex algorithm
- A soundex code for the String supplied
commons-codec version 1.3 - Copyright © 2002-2004 - Apache Software Foundation