Writing - December 2017

1 post from December 2017

โ† Back to 2017

Dec 16, 2017

Emojis and String.Length in C#

Are you using String.Length to compute the length of a string that might include emojis?

If you compute String.Length for such a string, you may not get back exactly what you expect:

var str = "👶👶👶👶🍼🍼";
Console.WriteLine(str.Length);    // what do you think will be written?

This will write 12 to the screen. What were YOU expecting?

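This happens because .NET strings are always sequences of UTF-16 code units, and each char holds exactly 16 bits. Unicode code points above U+FFFF, which includes most emojis, do not fit in a single char, so they are stored as surrogate pairs: two chars (a high surrogate followed by a low surrogate) that together represent one code point. Each of the six emojis above takes two chars, so Length returns 12.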
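To see the surrogate pairs directly, here is a minimal sketch that prints the raw UTF-16 code units of one baby and one baby bottle emoji. The `Utf16Inspector` class and its `DescribeCodeUnits` helper are hypothetical names made up for this example. The emojis are written with C#'s `\U` escape (eight hex digits) so the source file does not depend on your editor's encoding.

```csharp
using System;
using System.Collections.Generic;

public static class Utf16Inspector
{
    // Hypothetical helper: returns each UTF-16 code unit (C# char) of s as a 4-digit hex value.
    public static string DescribeCodeUnits(string s)
    {
        var units = new List<string>();
        foreach (char c in s)
        {
            units.Add(((int)c).ToString("X4"));
        }
        return string.Join(" ", units);
    }

    public static void Main()
    {
        // U+1F476 BABY and U+1F37C BABY BOTTLE are both above U+FFFF,
        // so each is stored as a high surrogate followed by a low surrogate.
        string s = "\U0001F476\U0001F37C";

        Console.WriteLine(s.Length);                   // 4
        Console.WriteLine(DescribeCodeUnits(s));       // D83D DC76 D83C DF7C
        Console.WriteLine(char.IsSurrogatePair(s, 0)); // True

        // Recombine the first pair into the single code point it encodes.
        int codePoint = char.ConvertToUtf32(s, 0);
        Console.WriteLine($"U+{codePoint:X}");         // U+1F476
    }
}
```

Two emojis produce four chars here, which is the same two-chars-per-emoji arithmetic that turns six emojis into a Length of 12.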

However, you may want the number of characters a reader actually sees, not the number of chars in the string. In that case, use System.Globalization.StringInfo, whose LengthInTextElements property counts text elements (user-perceived characters) rather than UTF-16 code units. Like so:

var str = "👶👶👶👶🍼🍼";
var stringInfo = new System.Globalization.StringInfo(str);
Console.WriteLine(stringInfo.LengthInTextElements);

This will yield what you’re looking for: 6
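Keep in mind that text elements are not always the same as code points. The sketch below compares three counts: `Length` (UTF-16 code units), a hand-rolled code point count, and `LengthInTextElements`. The `LengthComparison` class and `CountCodePoints` method are hypothetical names for this example. The second test string is a plain `e` followed by U+0301 COMBINING ACUTE ACCENT, which renders as a single "é" even though it is two code points.

```csharp
using System;
using System.Globalization;

public static class LengthComparison
{
    // Hypothetical helper: counts Unicode code points, so a surrogate pair counts once.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Skip the low half of a well-formed pair so the pair is counted as one.
            if (char.IsSurrogatePair(s, i))
            {
                i++;
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string babies = "\U0001F476\U0001F476\U0001F476\U0001F476\U0001F37C\U0001F37C";
        string accented = "e\u0301"; // 'e' + COMBINING ACUTE ACCENT, displayed as one character

        foreach (string s in new[] { babies, accented })
        {
            Console.WriteLine(
                $"Length={s.Length}, code points={CountCodePoints(s)}, " +
                $"text elements={new StringInfo(s).LengthInTextElements}");
        }
        // Length=12, code points=6, text elements=6
        // Length=2, code points=2, text elements=1
    }
}
```

On .NET Core 3.0 and later, `s.EnumerateRunes()` gives you code points without the manual loop. Also note that .NET 5 changed StringInfo to follow the Unicode extended grapheme cluster rules, so multi-code-point emoji sequences (skin-tone modifiers, for example) may be counted differently on .NET Framework than on newer runtimes.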

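Want more reading? Check out Joel Spolsky’s excellent article on strings and encoding, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Remember, there is NO such thing as plain text!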