<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martin @ Blog &#187; Scala</title>
	<atom:link href="http://www.wolkje.net/category/scala/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wolkje.net</link>
	<description>software development and life.</description>
	<lastBuildDate>Sun, 10 Jan 2010 11:18:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Encoding in Scala interpreter</title>
		<link>http://www.wolkje.net/2009/12/30/encoding-in-scala-interpreter/</link>
		<comments>http://www.wolkje.net/2009/12/30/encoding-in-scala-interpreter/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 11:58:05 +0000</pubDate>
		<dc:creator>Martin Sturm</dc:creator>
				<category><![CDATA[English]]></category>
		<category><![CDATA[Scala]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[encoding]]></category>
		<category><![CDATA[utf-8]]></category>

		<guid isPermaLink="false">http://www.wolkje.net/?p=348</guid>
		<description><![CDATA[One of the nice things about Scala is the availability of a command line interpreter based on the REPL principle (read-evaluate-print loop). Last week, for a particular project, I wanted to generate a string containing a part of the Unicode character table.
Thanks to Scala&#8217;s concise syntax, this would not be very difficult:

(0x20AC until 0x20B6).foreach { [...]]]></description>
			<content:encoded><![CDATA[<p>One of the nice things about <a href="http://www.scala-lang.org">Scala</a> is the availability of a command line interpreter based on the REPL principle (read-evaluate-print loop). Last week, for a particular project, I wanted to generate a string containing a part of the Unicode character table.<br />
Thanks to Scala&#8217;s concise syntax, this would not be very difficult:</p>
<pre name="code" class="java">
(0x20AC until 0x20B6).foreach { x => print(x.toChar + " ") }
</pre>
<p>This example prints the characters from 0x20AC (the euro symbol) up to, but not including, 0x20B6 (the last one printed is a symbol unknown to me :) ).</p>
<p>However, the result I got on my system (Mac OS X 10.6.2 using Scala 2.8 nightly) was not really what I expected:</p>
<pre name="code">? ? ? ? ? ? ? ? ? ?</pre>
<p><span id="more-348"></span><br />
Yes, indeed, I only got a list of question marks. Several attempts to work around this (writing the output to a file, printing it using other conversions, etc.) didn&#8217;t help. I suspected it had something to do with the encoding, but I didn&#8217;t know enough about the subject to find the actual cause. I ended up posting a question on <a href="http://stackoverflow.com/questions/1948044/printing-unicode-from-scala-interpreter">Stack Overflow</a>:</p>
<blockquote><p>I am not able to print unicode characters correctly. Of course a-z, A-Z, etc. are printed correctly, but for example € or ƒ is printed as a ?.</p>
<p>print(8364.toChar)<br />
results in ? instead of €. Probably I&#8217;m doing something wrong. My terminal supports UTF-8 characters, and even when I pipe the output to a separate file and open it in a text editor, ? is displayed.</p></blockquote>
<p>I got one answer, stating that he (or she?) could not reproduce the problem:</p>
<blockquote><p>Euro&#8217;s codepoint is 0x20AC (or in decimal 8364), and that appears to work for me (I&#8217;m on Linux, on a nightly of 2.8):</p>
<pre name="code" class="java">
scala> print(0x20AC.toChar)
€
</pre>
</blockquote>
<p>So either there was an issue on Mac OS X, or there was a bug in Scala. After a week or so, I still hadn&#8217;t found the cause (my original problem, printing a string of Unicode characters, had of course already been solved in a different way), so I decided to investigate a bit further. As is often the case, the cause turned out to be pretty obvious: Scala uses the system property <tt>file.encoding</tt> to determine which encoding it should use. I posted the &#8216;solution&#8217; on Stack Overflow:</p>
<p>The cause of the problem is the default encoding used on Mac OS X. When you start the <tt>scala</tt> interpreter, it uses the default encoding of the platform it runs on. On Mac OS X this is MacRoman; on Windows it is probably CP1252. You can check this by typing the following command in the Scala interpreter:</p>
<pre name="code" class="java">
    scala> System.getProperty("file.encoding");
    res3: java.lang.String = MacRoman
</pre>
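<p>To see why the encoding matters, here is a small sketch that encodes the euro sign with two explicitly chosen charsets and prints the resulting bytes. US-ASCII is only used as an illustration of a charset that cannot represent the character; this is not a claim about what MacRoman does with it, and <tt>hex</tt> is a helper name made up for this example.</p>
<pre name="code" class="java">
import java.nio.charset.Charset

// The euro sign as a one-character string.
val euro = 0x20AC.toChar.toString

// Hypothetical helper: render a byte array as space-separated hex values.
def hex(bytes: Array[Byte]): String =
  bytes.map(b => "%02X".format(b & 0xFF)).mkString(" ")

// UTF-8 can represent the euro sign; it takes three bytes.
val asUtf8 = hex(euro.getBytes(Charset.forName("UTF-8")))

// US-ASCII cannot, so the character is replaced by '?' (0x3F).
val asAscii = hex(euro.getBytes(Charset.forName("US-ASCII")))

println("UTF-8:    " + asUtf8)   // UTF-8:    E2 82 AC
println("US-ASCII: " + asAscii)  // US-ASCII: 3F
</pre>
<p>The same string ends up as different bytes depending on the charset, which is why a mismatch between the encoding Scala writes and the encoding the terminal expects can show up as question marks.</p>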
<p>According to the <tt>scala</tt> help text, it is possible to pass Java properties using the <tt>-D</tt> option. However, this did not work for me. I ended up setting the following environment variable instead:</p>
<pre>
    JAVA_OPTS="-Dfile.encoding=UTF-8"
</pre>
<p>After starting <tt>scala</tt> again, the same command gives the following result:</p>
<pre name="code" class="java">
    scala> System.getProperty("file.encoding")
    res0: java.lang.String = UTF-8
</pre>
<p>Now, printing special characters works as expected:</p>
<pre name="code" class="java">
    scala> print(0x20AC.toChar)
    €
</pre>
<p>So it is not a bug in Scala, but an issue with default encodings. In my opinion, it would be better if UTF-8 were used by default on all platforms. While searching for whether this is being considered, I came across a <a href="http://thread.gmane.org/gmane.comp.lang.scala.internals/189">discussion</a> on the Scala mailing list about this issue. The first message proposes using UTF-8 by default on Mac OS X when <tt>file.encoding</tt> reports MacRoman, since UTF-8 is the default charset on Mac OS X (which leaves me wondering why <tt>file.encoding</tt> is set to MacRoman by default; perhaps it is a legacy from Mac OS before version 10?). I don&#8217;t think this proposal will be part of Scala 2.8, since Martin Odersky <a href="http://article.gmane.org/gmane.comp.lang.scala.internals/298">wrote</a> that it is probably best to keep things as they are in Java (i.e. honor the <tt>file.encoding</tt> property).</p>
<p>So the best way to prevent this issue is to set <tt>file.encoding</tt> to UTF-8 using the <tt>JAVA_OPTS</tt> environment variable, which is read by default on startup.</p>
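<p>If changing <tt>JAVA_OPTS</tt> is not possible, one alternative might be to choose the output encoding explicitly in the code instead of relying on <tt>file.encoding</tt>. The following is only a minimal sketch, not something I used for the original problem: <tt>utf8</tt> is a hypothetical helper name, and it assumes the terminal itself interprets its input as UTF-8.</p>
<pre name="code" class="java">
import java.io.{OutputStream, PrintStream}

// Hypothetical helper: wrap an output stream so that text written to it
// is always encoded as UTF-8, whatever file.encoding is set to.
def utf8(target: OutputStream): PrintStream =
  new PrintStream(target, true, "UTF-8")

// Print the same range of characters as before through the wrapper.
val out = utf8(System.out)
(0x20AC until 0x20B6).foreach { x => out.print(x.toChar.toString + " ") }
out.println()
</pre>
<p>This only affects text written through <tt>out</tt>; plain <tt>print</tt> calls still use the default encoding, so setting <tt>file.encoding</tt> remains the more general fix.</p>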
]]></content:encoded>
			<wfw:commentRss>http://www.wolkje.net/2009/12/30/encoding-in-scala-interpreter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
