What Is Whitespace?

In some Java work, I needed to scrub NO-BREAK SPACE characters from import data. While looking for a Java method to trim leading and trailing whitespace, I fell down a rabbit hole.

It turns out that the Java String class offers a `trim` method, but that method has a strange definition of whitespace. Read this blog post by Mike Kaufman for details. The upshot: `trim` strips only leading and trailing characters numbered 32 (U+0020, SPACE) and lower, so a NO-BREAK SPACE (U+00A0) survives it.
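To see that rule in action, here is a small sketch using only the JDK. The class name `TrimDemo` and the helper `codePoints` are made up for this illustration; the helper just prints each character as a U+XXXX token so the invisible ones show up.

```java
public class TrimDemo {
    // Hypothetical helper: renders each code point as "U+XXXX" so invisible characters are visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // SPACE, NO-BREAK SPACE, "hi", NO-BREAK SPACE, SPACE
        String padded = " \u00A0hi\u00A0 ";
        // trim() removes the outer SPACEs but stops at the NO-BREAK SPACE (U+00A0 > U+0020).
        System.out.println(codePoints(padded.trim()));   // U+00A0 U+0068 U+0069 U+00A0

        // A control character (U+0001) and a TAB are both <= U+0020, so trim() removes them.
        String control = "\u0001hi\t";
        System.out.println(codePoints(control.trim()));  // U+0068 U+0069
    }
}
```

So `trim` both under-trims (it leaves the NO-BREAK SPACE) and over-trims (it removes control characters that few people would call whitespace).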

Then I found an interesting spreadsheet, "whitespace? what's that?", listing the various definitions of whitespace in Java and Unicode. There is a lot going on in the nothingness of whitespace!
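A quick way to get a feel for how those definitions disagree is to ask the JDK itself. The sketch below compares three of them, `Character.isWhitespace`, `Character.isSpaceChar`, and the default regex `\s`, on four characters. The class name `WhitespaceDefinitions` and the `classify` helper are my own, and this covers only a small sample of what the spreadsheet lists.

```java
import java.util.regex.Pattern;

public class WhitespaceDefinitions {
    // Default (non-Unicode) meaning of \s in java.util.regex.
    private static final Pattern REGEX_SPACE = Pattern.compile("\\s");

    // Hypothetical helper: reports how three JDK definitions classify one character.
    static String classify(char c) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b regex=%b",
                (int) c,
                Character.isWhitespace(c),
                Character.isSpaceChar(c),
                REGEX_SPACE.matcher(String.valueOf(c)).matches());
    }

    public static void main(String[] args) {
        // SPACE, TAB, NO-BREAK SPACE, EM SPACE
        char[] samples = {' ', '\t', '\u00A0', '\u2003'};
        for (char c : samples) {
            System.out.println(classify(c));
        }
    }
}
```

Only the plain SPACE gets a yes from all three. The NO-BREAK SPACE counts as a space character but not as Java whitespace, the TAB is the other way around, and the EM SPACE is missed by the default `\s`.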

CharMatcher – Google Guava

Eventually I found a modern, flexible, easy-to-use solution: `CharMatcher` in Google Guava. See their brief guide. Each `CharMatcher` is a predicate over characters, and matchers can be combined with `or`, `and`, and `negate`, which makes it easy to mix and match various groups of whitespace, invisible, and control characters. You can trim from the front and/or back of a string, replace matching characters, and more.

Example usage:

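// Recent Guava releases use a factory method; older ones used the constant CharMatcher.WHITESPACE.
someText = CharMatcher.whitespace().trimFrom( someText );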
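If adding Guava is not an option, the predicate idea behind `CharMatcher` can be approximated with the JDK alone. What follows is a minimal sketch, not Guava's implementation: the class `PredicateTrim`, its `trimFrom` helper, and the three predicate names are all made up for this example, and the set of characters it strips is the union of `Character.isWhitespace` and `Character.isSpaceChar`, which is not the same as Guava's definition of whitespace.

```java
import java.util.function.IntPredicate;

public class PredicateTrim {
    // Assumed building blocks; each is just a JDK character test wrapped as a predicate.
    static final IntPredicate JAVA_WHITESPACE = Character::isWhitespace;
    static final IntPredicate UNICODE_SPACE = Character::isSpaceChar;
    // Mix and match: anything either definition accepts.
    static final IntPredicate ANY_BLANK = JAVA_WHITESPACE.or(UNICODE_SPACE);

    // Hypothetical helper: strips leading and trailing code points accepted by the predicate.
    static String trimFrom(String s, IntPredicate strip) {
        int start = 0;
        int end = s.length();
        // Advance past matching code points at the front.
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!strip.test(cp)) break;
            start += Character.charCount(cp);
        }
        // Back up over matching code points at the end.
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!strip.test(cp)) break;
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // NO-BREAK SPACE, TAB, "hello", SPACE, NO-BREAK SPACE
        String raw = "\u00A0\thello \u00A0";
        System.out.println("[" + trimFrom(raw, ANY_BLANK) + "]");  // [hello]
    }
}
```

Passing `JAVA_WHITESPACE` alone would leave `raw` untouched at both ends, because the NO-BREAK SPACE is the outermost character on each side and `Character.isWhitespace` rejects it.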