freeutils.net
Presents:
JCharset - Java Charset package
Download
Download the latest release: (What's new?)
JCharset 1.3 (includes source code) (65K)
What is the Java Charset package?
The Java Charset package is an open-source implementation of character
sets that were missing from the standard Java platform.
How do I use the Java Charset package?
The Java Charset package is written in pure Java, and thus requires no special
installation. Just add the "jcharset.jar" file to your classpath, or place it
in any of the usual extension directories.
The JVM will recognize the supported character sets automatically, and they
will be available anywhere character sets are used in the Java platform.
As an example, you can take a look at java.lang.String's constructor and
getBytes() method, both of which have an overloaded version that receives
a charset name as an argument.
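As a minimal sketch of the two overloads mentioned above, the following class encodes and decodes text by charset name. The class and method names (CharsetNameDemo, encode, decode) are made up for illustration and are not part of JCharset. The default charset name is "UTF-8" so that the example also runs without jcharset.jar; pass a name such as "UTF-7" as an argument once the jar is on the classpath.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class, not part of JCharset.
final class CharsetNameDemo {

    // Encodes text using the String.getBytes(String charsetName) overload.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            // The JVM could not find a charset with this name.
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    // Decodes bytes using the String(byte[], String charsetName) constructor.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "UTF-8" is only a default that every JVM supports.
        String name = args.length > 0 ? args[0] : "UTF-8";
        byte[] encoded = encode("Hello, world", name);
        System.out.println(name + ": " + encoded.length + " bytes, decoded back: "
                + decode(encoded, name));
    }
}
```

If the charset name is not recognized, both helpers fail with an IllegalArgumentException, which is usually a sign that the jar is not visible to the JVM.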
Note: Some web/mail containers run each application in its own JVM context.
In this case check the container documentation for information on where/how
to configure the classpath, such as in WEB-INF/lib, shared/lib, jre/lib/ext,
etc. You may need to restart the server for changes to take effect.
However, if you use Sun's JRE, the charsets will be found only if you put the jar in
the jre/lib/ext extension directory, or in the container's classpath. This is due to a bug in
Sun's JRE implementation (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4619777).
Voting for the bug may help get it fixed sooner, so please do.
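When it is unclear whether the jar was picked up, a quick check like the one below may help. It is only a sketch: the class name CharsetCheck and its describe method are hypothetical, and it relies solely on the standard java.nio.charset.Charset API. It reports whether the running JVM can resolve a charset name and, if so, under which canonical name and aliases.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of JCharset.
final class CharsetCheck {

    // Returns a one-line description of how the running JVM resolves the name.
    static String describe(String name) {
        try {
            if (!Charset.isSupported(name)) {
                return name + ": not available";
            }
        } catch (IllegalArgumentException e) {
            // Thrown for names containing characters that are illegal in charset names.
            return name + ": not a legal charset name";
        }
        Charset cs = Charset.forName(name);
        return name + ": available as " + cs.name() + ", aliases " + cs.aliases();
    }

    public static void main(String[] args) {
        // Default to a few names that JCharset is documented to provide.
        String[] names = args.length > 0
                ? args
                : new String[] {"UTF-7", "SCGSM", "hp-roman8"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running it inside the same container or JVM configuration as the application is what matters, since that is where the classpath issue described above shows up.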
Which charsets are supported?
-
"UTF-7" (a.k.a. "UTF7", "UNICODE-1-1-UTF-7", "csUnicode11UTF7", "UNICODE-2-0-UTF-7")
The 7-bit Unicode character encoding defined in RFC 2152.
The O-set characters are encoded as a shift sequence.
Both O-set flavors (direct and shifted) are decoded.
-
"UTF-7-OPTIONAL" (a.k.a. "UTF-7O", "UTF7O", "UTF-7-O")
The 7-bit Unicode character encoding defined in RFC 2152.
The O-set characters are directly encoded.
Both O-set flavors (direct and shifted) are decoded.
-
"SCGSM" (a.k.a. "GSM-default-alphabet", "GSM_0338", "GSM_DEFAULT", "GSM7", "GSM-7BIT")
The GSM default charset as specified in GSM 03.38, used in SMPP for
encoding SMS text messages.
Additional flavors of the GSM charset are "CCGSM", "SCPGSM" and "CCPGSM":
The CC prefix signifies mapping the Latin capital letter C with cedilla character,
the SC prefix signifies mapping the Latin small letter c with cedilla character,
and the P signifies the packed form (8 characters packed in 7 bytes),
as specified in GSM 03.38. See javadocs for details.
-
"hp-roman8" (a.k.a. "roman8", "r8", "csHPRoman8", "X-roman8")
The HP Roman-8 charset, as provided in RFC 1345.
-
"ISO-8859-8-BIDI" (a.k.a. "csISO88598I", "ISO-8859-8-I", "ISO_8859-8-I",
"csISO88598E", "ISO-8859-8-E", "ISO_8859-8-E")
The ISO 8859-8 charset implementation exists in the standard JRE.
However, it is lacking the i/e aliases, which specify whether
bidirectionality is implicit or explicit. The charset conversions
themselves are similar. This charset complements the standard one.
-
"ISO-8859-6-BIDI" (a.k.a. "csISO88596I", "ISO-8859-6-I", "ISO_8859-6-I",
"csISO88596E", "ISO-8859-6-E", "ISO_8859-6-E")
The ISO 8859-6 charset implementation exists in the standard JRE.
However, it is lacking the i/e aliases, which specify whether
bidirectionality is implicit or explicit. The charset conversions
themselves are similar. This charset complements the standard one.
-
"KOI8-U" (a.k.a. "KOI8-RU")
The KOI8-U Ukrainian charset, as defined in RFC 2319.
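To illustrate the "8 characters packed in 7 bytes" arithmetic of the packed GSM flavors listed above, here is a rough sketch of the septet packing used for SMS in GSM 03.38, where each 7-bit value is written starting at the least significant free bit. This is not JCharset's own code: the class name SeptetPacking is hypothetical, the input is assumed to already be GSM 7-bit code values, and details such as how a final partial octet is padded are left to the javadocs.

```java
// Hypothetical illustration of 7-bit packing, not JCharset's implementation.
final class SeptetPacking {

    // Packs 7-bit values into octets, filling each octet from its low bits upward.
    static byte[] pack(int[] septets) {
        byte[] out = new byte[(septets.length * 7 + 7) / 8];
        for (int i = 0; i < septets.length; i++) {
            int bitPos = i * 7;          // absolute position of this septet's first bit
            int byteIndex = bitPos / 8;  // octet that receives the low bits
            int shift = bitPos % 8;      // offset inside that octet
            int value = (septets[i] & 0x7F) << shift;
            out[byteIndex] |= (byte) value;
            if (shift > 1) {
                // The septet does not fit in the remaining bits; spill into the next octet.
                out[byteIndex + 1] |= (byte) (value >>> 8);
            }
        }
        return out;
    }

    // Formats bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Code values for "hello"; these lowercase letters have the same
        // values in the GSM default alphabet as in ASCII.
        int[] hello = {0x68, 0x65, 0x6C, 0x6C, 0x6F};
        System.out.println(toHex(pack(hello))); // prints "E8 32 9B FD 06"
    }
}
```

Eight septets are 56 bits, which is exactly 7 octets, hence the ratio quoted for the packed form.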
What's New?
in version 1.3:
- Added X-roman8 as an hp-roman8 alias.
- Added the generic EscapedByteLookupCharset to simplify implementation of single-escape-byte charsets.
- Created two flavors of the GSM charset: CCGSMCharset (mapping the Latin capital letter C with cedilla)
and SCGSMCharset (mapping the Latin small letter c with cedilla). See javadocs for details.
- Added support for Packed GSM charset, with the two flavors as well.
- Renamed the canonical charset name for the new GSM family, to make the flavor choices explicit.
in version 1.2.1:
- Fixed a combined JavaMail-JCharset bug that could cause an infinite loop on some inputs.
- Updated the ISO-8859-8-i/e mapping for the MACRON character.
The incorrect mapping in the JDK's implementation of ISO-8859-8 is fixed as of JDK 1.5
(see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4760496). We now determine the
running JDK version, and if it's JDK 1.5 or higher we use the correct mapping. This
way we remain consistent with the running JDK ISO-8859-8 charset implementation.
in version 1.2:
in version 1.1:
- Added ByteLookupCharset class to simplify implementation of single byte charsets.
- Added GSM-default-alphabet charset (used in SMPP).
- Added hp-roman8 charset.
- Added ISO-8859-8-i/e charset.
- Added ISO-8859-6-i/e charset.
in version 1.0:
- This is the first release of the Java Charset package.
License
The JCharset Package is provided under the
GNU General Public License agreement.
For non-GPL commercial licensing please contact the author.
Donate
A lot of hard work, time and effort go into providing and maintaining this project for free.
If you use it, please help out: show your appreciation and
donate to help continue supporting it.
Contact
You can contact the author via e-mail at:
support@freeutils.net
Please write in with bug reports, problems, suggestions, ideas, questions, answers and source code queries, and especially just to let me know you've found the JCharset package useful. Getting feedback will encourage me to continue development and add some advanced features I have in mind.
For updates and additional information, you can always visit the website at
http://www.freeutils.net