{"id":725,"date":"2014-12-14T18:02:06","date_gmt":"2014-12-14T23:02:06","guid":{"rendered":"https:\/\/sheldon-hess.org\/coral\/?p=725"},"modified":"2014-12-14T18:03:37","modified_gmt":"2014-12-14T23:03:37","slug":"unicode","status":"publish","type":"post","link":"https:\/\/www.sheldon-hess.org\/coral\/2014\/12\/unicode\/","title":{"rendered":"A little bit about Unicode"},"content":{"rendered":"<p><a href=\"https:\/\/sheldon-hess.org\/coral\/wp-content\/uploads\/2014\/10\/unicoooode.jpeg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/sheldon-hess.org\/coral\/wp-content\/uploads\/2014\/10\/unicoooode-300x143.jpeg\" alt=\"unicoooode\" width=\"300\" height=\"143\" class=\"alignright size-medium wp-image-726\" srcset=\"https:\/\/www.sheldon-hess.org\/coral\/wp-content\/uploads\/2014\/10\/unicoooode-300x143.jpeg 300w, https:\/\/www.sheldon-hess.org\/coral\/wp-content\/uploads\/2014\/10\/unicoooode-250x119.jpeg 250w, https:\/\/www.sheldon-hess.org\/coral\/wp-content\/uploads\/2014\/10\/unicoooode.jpeg 457w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>I thought about writing a really long post about handling Unicode in Python, but, honestly, you should <a href=\"http:\/\/nedbatchelder.com\/text\/unipain.html\">go watch this video<\/a>; that&#8217;s where most of my points would have come from, anyway. (It&#8217;s a great video! It&#8217;s funny and helpful and relevant, whether you use Python 2 or 3. I hope I get to go to PyCon and meet Ned in person and thank him for it!) <\/p>\n<p>If you wonder how I ended up watching that video&mdash;along with several coworkers&mdash;we were doing a lot of metadata parsing, as part of our work on the <a href=\"http:\/\/www.arl.org\/focus-areas\/shared-access-research-ecosystem-share\">SHARE project<\/a>. We were building an alpha version of a notification service for research events (paper publications, dataset releases, etc.). 
As you&#8217;d imagine, not all of the names of the contributors and items are in ASCII (&#8220;ASCII&#8221; just means &#8220;A-Z, a-z, 0-9, and most punctuation&#8221;); we also get \u00e6, \u00ea, \u012b, \u00f8, \u00fc, and sometimes \u00ff&mdash;so we needed to support Unicode. As an added complication (in my opinion), while we tried to be fairly compliant with both Python 2 and 3, we were running our code with 2, which assumes everything is ASCII by default.<\/p>\n<p>A couple of us ran into the issues brought up in that video&mdash;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Cargo_cult_programming\">cargo-culting<\/a> a &#8220;u&#8221; in front of our strings and converting things to Unicode all willy-nilly. This helped us reach clarity. <\/p>\n<p>I hope that it will help you reach clarity, too, because Unicode support is important; I&#8217;ll go so far as to say Unicode should be the default, never ASCII, because more people <em>don&#8217;t<\/em> use the ASCII character set than do. <\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I thought about writing a really long post about handling Unicode in Python, but, honestly, you should go watch this video; that&#8217;s where most of my points would have come from, anyway. (It&#8217;s a great video! It&#8217;s funny and helpful and relevant, whether you use Python 2 or 3. 
I&#8230;<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/www.sheldon-hess.org\/coral\/2014\/12\/unicode\/\">Continue reading<span class=\"screen-reader-text\">A little bit about Unicode<\/span><\/a><\/div>\n","protected":false},"author":3,"featured_media":726,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85,70],"tags":[],"class_list":["post-725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-new-developer","category-programming","entry"],"_links":{"self":[{"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/posts\/725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/comments?post=725"}],"version-history":[{"count":0,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/posts\/725\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/media\/726"}],"wp:attachment":[{"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/media?parent=725"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/categories?post=725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sheldon-hess.org\/coral\/wp-json\/wp\/v2\/tags?post=725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}