All Implemented Interfaces and Traits:
- PostMapConstructorCheckable
@KwrkImmutable
@CompileStatic
class InternationalizedName
extends Object
implements PostMapConstructorCheckable
Represents a Unicode name capable of producing an internationalized name.
The internationalized name is produced by replacing diacritic characters with their non-diacritic Unicode counterparts. For most diacritic characters, Unicode itself defines a canonical decomposition
into a non-diacritic base character followed by a separate combining diacritic. Such characters can therefore be handled by first decomposing them into canonical form (Normalizer.Form.NFD), which yields
separate code points for the base character and the diacritic, and then removing the diacritic code points with a simple regex. In general, this can be accomplished with the following code fragment:
String nonDiacriticName = Normalizer.normalize(originalName, Normalizer.Form.NFD).replaceAll(DIACRITIC_MATCHING_PATTERN, "")
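For reference, here is a minimal self-contained sketch of that fragment in plain Java. The documentation does not show the value of DIACRITIC_MATCHING_PATTERN; this sketch assumes it matches the Unicode "Combining Diacritical Marks" block (U+0300 to U+036F), a common choice for this technique. The class name DiacriticStripExample and the method stripDiacritics are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class DiacriticStripExample {
    // Assumption: the real DIACRITIC_MATCHING_PATTERN is not documented here;
    // this one matches runs of characters from the Combining Diacritical Marks block.
    static final Pattern DIACRITIC_MATCHING_PATTERN =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripDiacritics(String originalName) {
        // NFD splits e.g. "č" (U+010D) into "c" (U+0063) + combining caron (U+030C).
        String decomposed = Normalizer.normalize(originalName, Normalizer.Form.NFD);
        // Drop the combining marks, keeping only the base characters.
        return DIACRITIC_MATCHING_PATTERN.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("Čačić"));    // Cacic
        System.out.println(stripDiacritics("Đurđević")); // Đurđevic (đ/Đ are not decomposed)
    }
}
```

Note that the second name keeps its đ/Đ characters: regex removal alone cannot help when normalization leaves no separate diacritic to remove.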
Unfortunately, some diacritic characters have no canonical decomposition, so normalization does not produce a separate code point for the diacritic. One example is "LATIN SMALL/CAPITAL LETTER D WITH STROKE" (đ/Đ).
For these characters, an additional custom replacement is needed, as implemented in the InternationalizedName.getNameInternationalized method.
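One way to sketch such a custom replacement is a lookup table applied after the NFD and regex step. This is only an illustration under stated assumptions: the actual getNameInternationalized implementation is not shown here, and the class CustomReplacementExample, the method internationalize, the CUSTOM_REPLACEMENTS table (mapping đ/Đ to d/D), and the DIACRITIC_MATCHING_PATTERN value are all hypothetical.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;

public class CustomReplacementExample {
    // Assumed pattern: the Combining Diacritical Marks block (U+0300..U+036F).
    static final Pattern DIACRITIC_MATCHING_PATTERN =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical table of characters without a canonical decomposition.
    // Only đ/Đ are listed; the real set of custom replacements is not documented.
    static final Map<Character, String> CUSTOM_REPLACEMENTS = Map.of(
            '\u0111', "d",  // LATIN SMALL LETTER D WITH STROKE (đ)
            '\u0110', "D"); // LATIN CAPITAL LETTER D WITH STROKE (Đ)

    static String internationalize(String originalName) {
        // Step 1: decompose and strip combining marks (handles č, ć, š, ž, ...).
        String decomposed = Normalizer.normalize(originalName, Normalizer.Form.NFD);
        String stripped = DIACRITIC_MATCHING_PATTERN.matcher(decomposed).replaceAll("");

        // Step 2: replace characters that NFD left untouched; copy all others as-is.
        StringBuilder result = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            result.append(CUSTOM_REPLACEMENTS.getOrDefault(c, String.valueOf(c)));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(internationalize("Đurđević")); // Durdevic
    }
}
```

Keeping the custom table separate from the regex step means new non-decomposable characters can be added without touching the normalization logic.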
Some useful references:
- https://web.archive.org/web/20070917051642/http://java.sun.com/mailers/techtips/corejava/2007/tt0207.html#1
- https://docs.oracle.com/javase/8/docs/api/java/text/Normalizer.html
- https://web.archive.org/web/20200329072305/https://www.unicode.org/reports/tr44/#Properties
- https://memorynotfound.com/remove-accents-diacritics-from-string