{"id":295,"date":"2009-05-12T15:50:49","date_gmt":"2009-05-12T13:50:49","guid":{"rendered":"http:\/\/michauko.org\/blog\/?p=295"},"modified":"2009-11-24T19:47:59","modified_gmt":"2009-11-24T17:47:59","slug":"joujou-avec-les-encodages-8859-1-utf-8-etc","status":"publish","type":"post","link":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/","title":{"rendered":"Joujou avec les encodages 8859-1, UTF-8 etc"},"content":{"rendered":"<p>While writing a Python script to convert a messy CSV into UTF-8 XML, in a mixed Windows and Linux environment and starting from a CSV exported from an Excel sheet that was itself built from ugly copy-pastes, I had to juggle charset conversions, UNIX\/Windows line-ending formats and so on.<br \/>\nThat led me to discover two or three tools, which are the subject of this post. I will skip the many other problems of cleaning up copy-pasted content: quotation marks restyled by who knows what (Word?), dashes restyled as well, etc.<!--more--><\/p>\n<p>The basics:<\/p>\n<ul>\n<li>file: identifies a file&rsquo;s type and, for plain text in particular, whether it is encoded in some ISO-8859 variant or in UTF-8. Note that if the text in question is embedded in a script, or in anything else that wraps it, &#8220;file&#8221; will only report the script type, for example. In that case, extract the text into a separate file (using &#8220;grep&#8221;) so that its encoding can be analysed. I have not found anything simpler.<\/li>\n<li>dos2unix: converts DOS line endings (2 bytes, CR LF, i.e. \\015\\012 in octal) into a single byte (LF, \\012). If you went through an ASCII-mode FTP transfer, this is done automatically.
But over SFTP or anything else, no such luck.<\/li>\n<li>unix2dos: the reverse, as you might guess<\/li>\n<\/ul>\n<p>Next:<\/p>\n<ul>\n<li>tcs: converts a file from one charset to another, for example: <code>tcs -f 8859-1 -t utf source > dest<\/code>. Run &#8220;file&#8221; on the result afterwards to check.<\/li>\n<li>rxp: validates the XML syntax (and the encoding used) of an XML file<\/li>\n<li>od: dumps a file in hex, ASCII, octal, etc. Several formats can be combined to show, for example, the ASCII alongside the hex.<\/li>\n<\/ul>\n<p>That&rsquo;s all.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While writing a Python script to convert a messy CSV into UTF-8 XML, in a mixed Windows and Linux environment and starting from a CSV exported from an Excel sheet that was itself built from ugly copy-pastes, I had to juggle &hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8,2,387,82,83],"tags":[201,313,314,312,200],"class_list":["post-295","post","type-post","status-publish","format-standard","hentry","category-coup-de-coeur","category-debian","category-ligne-de-commande","category-pl","category-ubuntu","tag-iso-8859-1","tag-od","tag-rxp","tag-tcs","tag-utf-8"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko\" \/>\n<meta property=\"og:description\" content=\"A l&rsquo;occasion de l&rsquo;\u00e9criture d&rsquo;un script Python pour convertir un CSV-d\u00e9gueu en XML-UTF8, le tout en environnement windows+linux et partant d&rsquo;un CSV issu d&rsquo;un Excel issu de copier-coller d\u00e9gueulasses, j&rsquo;ai eu \u00e0 jouer avec des &hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/\" \/>\n<meta property=\"og:site_name\" content=\"Le blog de Michauko\" \/>\n<meta property=\"article:published_time\" content=\"2009-05-12T13:50:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2009-11-24T17:47:59+00:00\" \/>\n<meta name=\"author\" content=\"michauko\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"michauko\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/\"},\"author\":{\"name\":\"michauko\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/#\\\/schema\\\/person\\\/0cd9f3d9ce4dccc05df81a5b27051ea9\"},\"headline\":\"Joujou avec les encodages 8859-1, UTF-8 
etc\",\"datePublished\":\"2009-05-12T13:50:49+00:00\",\"dateModified\":\"2009-11-24T17:47:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/\"},\"wordCount\":296,\"commentCount\":2,\"keywords\":[\"iso-8859-1\",\"od\",\"rxp\",\"tcs\",\"utf-8\"],\"articleSection\":[\"coup de coeur\",\"Debian\",\"ligne de commande\",\"planet-libre.org\",\"Ubuntu\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/\",\"url\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/\",\"name\":\"Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/#website\"},\"datePublished\":\"2009-05-12T13:50:49+00:00\",\"dateModified\":\"2009-11-24T17:47:59+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/#\\\/schema\\\/person\\\/0cd9f3d9ce4dccc05df81a5b27051ea9\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/michauko.org\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Joujou avec les encodages 8859-1, UTF-8 
etc\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/michauko.org\\\/blog\\\/\",\"name\":\"Le blog de Michauko\",\"description\":\"Si tu ne comprends pas le titre de l&#039;article, passe ton chemin\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/michauko.org\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/michauko.org\\\/blog\\\/#\\\/schema\\\/person\\\/0cd9f3d9ce4dccc05df81a5b27051ea9\",\"name\":\"michauko\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g\",\"caption\":\"michauko\"},\"sameAs\":[\"http:\\\/\\\/michauko.org\\\/\"],\"url\":\"https:\\\/\\\/michauko.org\\\/blog\\\/author\\\/randomized2\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/","og_locale":"fr_FR","og_type":"article","og_title":"Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko","og_description":"A l&rsquo;occasion de l&rsquo;\u00e9criture d&rsquo;un script Python pour convertir un CSV-d\u00e9gueu en XML-UTF8, le tout en environnement windows+linux et partant d&rsquo;un CSV issu d&rsquo;un Excel issu de copier-coller d\u00e9gueulasses, j&rsquo;ai eu \u00e0 jouer avec des &hellip;","og_url":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/","og_site_name":"Le blog de Michauko","article_published_time":"2009-05-12T13:50:49+00:00","article_modified_time":"2009-11-24T17:47:59+00:00","author":"michauko","twitter_card":"summary_large_image","twitter_misc":{"\u00c9crit par":"michauko","Dur\u00e9e de lecture estim\u00e9e":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/#article","isPartOf":{"@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/"},"author":{"name":"michauko","@id":"https:\/\/michauko.org\/blog\/#\/schema\/person\/0cd9f3d9ce4dccc05df81a5b27051ea9"},"headline":"Joujou avec les encodages 8859-1, UTF-8 etc","datePublished":"2009-05-12T13:50:49+00:00","dateModified":"2009-11-24T17:47:59+00:00","mainEntityOfPage":{"@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/"},"wordCount":296,"commentCount":2,"keywords":["iso-8859-1","od","rxp","tcs","utf-8"],"articleSection":["coup de coeur","Debian","ligne de 
commande","planet-libre.org","Ubuntu"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/","url":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/","name":"Joujou avec les encodages 8859-1, UTF-8 etc - Le blog de Michauko","isPartOf":{"@id":"https:\/\/michauko.org\/blog\/#website"},"datePublished":"2009-05-12T13:50:49+00:00","dateModified":"2009-11-24T17:47:59+00:00","author":{"@id":"https:\/\/michauko.org\/blog\/#\/schema\/person\/0cd9f3d9ce4dccc05df81a5b27051ea9"},"breadcrumb":{"@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/michauko.org\/blog\/joujou-avec-les-encodages-8859-1-utf-8-etc-295\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/michauko.org\/blog\/"},{"@type":"ListItem","position":2,"name":"Joujou avec les encodages 8859-1, UTF-8 etc"}]},{"@type":"WebSite","@id":"https:\/\/michauko.org\/blog\/#website","url":"https:\/\/michauko.org\/blog\/","name":"Le blog de Michauko","description":"Si tu ne comprends pas le titre de l&#039;article, passe ton 
chemin","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/michauko.org\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Person","@id":"https:\/\/michauko.org\/blog\/#\/schema\/person\/0cd9f3d9ce4dccc05df81a5b27051ea9","name":"michauko","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5c3a8969c185fd0eef3893a15408f3ef1b36a6681a066b1eb32045643c30ba65?s=96&d=mm&r=g","caption":"michauko"},"sameAs":["http:\/\/michauko.org\/"],"url":"https:\/\/michauko.org\/blog\/author\/randomized2\/"}]}},"_links":{"self":[{"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/posts\/295","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/comments?post=295"}],"version-history":[{"count":4,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/posts\/295\/revisions"}],"predecessor-version":[{"id":975,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/posts\/295\/revisions\/975"}],"wp:attachment":[{"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/media?parent=295"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/categories?post=295"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michauko.org\/blog\/wp-json\/wp\/v2\/tag
s?post=295"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}