Problem with Firefox 3 and UTF-8: java.io.UTFDataFormatException

I have a problem with Firefox 3 (newest version, RC2). Other browsers (FF2, IE6, IE7) work fine.

When the user enters German umlauts (e.g. ü, ä or ö) in a text field, the session crashes with the standard error message "An application error has occurred. Your session has been reset."
The corresponding entry in my log file is:

05.06.2008 16:22:52 org.apache.catalina.core.StandardWrapperValve invoke
SCHWERWIEGEND: Servlet.service() for servlet oscxpertServlet threw exception
java.io.IOException: Provided InputStream cannot be parsed: java.io.UTFDataFormatException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at nextapp.echo2.webrender.service.SynchronizeService.parseRequestDocument(SynchronizeService.java:191)
        at nextapp.echo2.webrender.service.SynchronizeService.service(SynchronizeService.java:264)
        at nextapp.echo2.webrender.WebRenderServlet.process(WebRenderServlet.java:273)
        at nextapp.echo2.webrender.WebRenderServlet.doPost(WebRenderServlet.java:189)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

Interestingly, the exception differs depending on which character is entered:
- ü: Invalid byte 1 of 1-byte UTF-8 sequence
- ä: Invalid byte 2 of 3-byte UTF-8 sequence
- ö: Invalid byte 2 of 4-byte UTF-8 sequence
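These three messages line up with how single ISO-8859-1 bytes look to a UTF-8 decoder, assuming the browser sent each umlaut as one ISO-8859-1 byte (the header trace later in this thread points that way). A minimal sketch, with made-up class and method names, that prints the ISO-8859-1 byte of each umlaut and the role that byte would play as a UTF-8 lead byte:

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Classifies a byte the way a UTF-8 decoder interprets the first byte of a sequence.
    static String utf8Role(int b) {
        if (b < 0x80) return "1-byte (ASCII)";
        if (b < 0xC0) return "continuation byte (invalid as lead)";
        if (b < 0xE0) return "lead of 2-byte sequence";
        if (b < 0xF0) return "lead of 3-byte sequence";
        if (b < 0xF8) return "lead of 4-byte sequence";
        return "invalid lead byte";
    }

    public static void main(String[] args) {
        // Unicode escapes for ü, ä, ö so the source file encoding doesn't matter.
        for (String s : new String[] {"\u00FC", "\u00E4", "\u00F6"}) {
            int b = s.getBytes(StandardCharsets.ISO_8859_1)[0] & 0xFF;
            System.out.printf("U+%04X -> 0x%02X -> %s%n", (int) s.charAt(0), b, utf8Role(b));
        }
    }
}
```

ä (0xE4) and ö (0xF6) look like the start of a 3-byte and a 4-byte sequence, so the decoder only fails on the second byte; ü (0xFC) cannot start any valid UTF-8 sequence, which fits the "byte 1 of 1-byte" message.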

I found an entry in Mozilla's bug database, https://bugzilla.mozilla.org/show_bug.cgi?id=397836, that might match this problem, but the entry is from 2007 and still untouched.

So is it a bug in FF3 or in Echo2?

Thanks!
Christian

I have seen a similar problem with FF3 and Echo3 under Ubuntu 8.04.

I haven't investigated it so far, though...

I haven't investigated it yet, but I am definitely seeing the problem with FF3 on Ubuntu 8.04. Opera on the same OS works fine, and the latest FF2 on XP works fine. This is against the "basic components" slide of http://demo.nextapp.com.

No, it's not Ubuntu

I forgot to mention that I'm working with Windows XP.
I tested German umlauts with IE6, IE7, FF2 and FF3 on XP.

But the error occurs only with FF3; the other browsers (including FF2) work fine.

This problem isn't very urgent because my customers are using IE6 and IE7 (for the first time, I'm happy about that ;) ).

Can you provide the client messages generated by Firefox 2 (which works) and by the RC you're testing? The client messages can be seen by appending ?debug to the URL. I suppose this is also an issue with Echo3?

Niels

The issue also occurs ONLY with FF3.
It happens with Echo3 as well, under both Windows XP and Ubuntu 8.04.

To me it really looks like an FF3 bug.

I traced the request sent from FF3 to the server.

It has the following header:

content-type: text/xml; charset=ISO-8859-1

The XML parser on the server side then tries to interpret the body as UTF-8 and, of course, fails.

I assume that the client-side part of Echo2/3 does not specify the encoding in which the XML requests are sent, so we are at the mercy of the browser to use UTF-8 (or, in the case of FF3, NOT).

I have no idea which .js file handles this...
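To reproduce the mismatch outside the servlet container, here is a small sketch using the JDK's built-in DOM parser (not Echo's own parsing code; the class name is hypothetical). It serializes the same XML once as ISO-8859-1, as FF3 did, and once as UTF-8. Because the XML carries no declaration, the parser assumes UTF-8, so only the UTF-8 body should parse:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class EncodingMismatchDemo {
    /** Returns true if the bytes parse; with no XML declaration the parser assumes UTF-8. */
    static boolean parses(byte[] body) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(body));
            return true;
        } catch (SAXException | IOException e) {
            // Message similar to "Invalid byte 2 of 4-byte UTF-8 sequence."
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<client-message><property name=\"text\">\u00F6</property></client-message>";
        // Body as FF3 sent it (ISO-8859-1) versus what the server expects (UTF-8).
        System.out.println("ISO-8859-1 body parses: " + parses(xml.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8 body parses: " + parses(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Depending on the JDK version, the failure may surface as an IOException subclass (as in the log above) or as a SAXParseException, which is why both are caught.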

Indeed, the solution is probably to specify UTF-8 explicitly as the encoding. I don't know why the server does not honor the character set, though.
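One way a server could honor the declared character set, sketched here as an assumption rather than as what Echo does: decode the body with the charset from the Content-Type header and hand the parser a Reader, so it never has to guess the byte encoding. The class name is hypothetical; in a servlet you would pass request.getInputStream() and request.getCharacterEncoding().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharsetAwareParser {
    /**
     * Parses an XML body using the charset declared in the Content-Type header
     * (null means no charset was sent, so UTF-8 is assumed).
     */
    static Document parse(InputStream body, String declaredCharset) throws Exception {
        Charset charset = declaredCharset == null
                ? StandardCharsets.UTF_8
                : Charset.forName(declaredCharset);
        // The parser now receives characters, not bytes, so it cannot
        // misread an ISO-8859-1 body as UTF-8.
        InputSource source = new InputSource(new InputStreamReader(body, charset));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<property name=\"text\">\u00F6</property>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(doc.getDocumentElement().getTextContent().equals("\u00F6")); // prints "true"
    }
}
```

Note that when the parser is given a character stream it ignores any encoding in the XML declaration, so this approach trusts the HTTP header completely.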

That might well depend on the application server used.

We use Tomcat 5.5.2x under Windows XP (for development) and Linux (for production and some development).

Ok, here we go.

This is the client message generated by Firefox 2:

Content Length: 467 bytes; Line Count: 13
<client-message trans-id="1" focus="c_12">
<message-part processor="EchoPropertyUpdate">
<property component-id="c_4" name="zIndex" value="1"/>
<property component-id="c_10" name="text">
ö
</property>
<property component-id="c_10" name="horizontalScroll" value="0"/>
<property component-id="c_10" name="verticalScroll" value="0"/>
</message-part>
<message-part processor="EchoAction">
<action component-id="c_7" name="click"/>
</message-part>
</client-message>

And here's the one generated by Firefox 3:

Content Length: 288 bytes; Line Count: 9
<client-message>
<message-part processor="EchoPropertyUpdate">
<property component-id="c_10" name="text">
ö
</property>
<property component-id="c_10" name="horizontalScroll" value="0"/>
<property component-id="c_10" name="verticalScroll" value="0"/>
</message-part>
</client-message>

Thanks for the "?debug" hint. I didn't know about that.

Found some additional info

Hi,

When investigating this, I found a couple of interesting links:

http://groups.google.com/group/mozilla.dev.tech.xml/browse_thread/thread/f2190b32ff6a5ede
http://developer.mozilla.org/en/docs/XMLHttpRequest
https://bugzilla.mozilla.org/show_bug.cgi?id=431701

I think they will fix this for the final release; if not, we might need to change our use of XMLHttpRequest and/or the DOM document we send.

Niels

BTW, the forums seem to have lost some posts from the last few days. Tod?

My workaround

Hi,

I've run into this too, and my (temporary) workaround was to modify the parseRequestDocument method of SynchronizeService by adding something like this:

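// Inside parseRequestDocument(); assumes 'request', 'in' (the request's InputStream)
// and 'userAgent' are already defined there, and that java.io.ByteArrayInputStream,
// java.io.ByteArrayOutputStream and java.io.IOException are imported.
if ("ISO-8859-1".equalsIgnoreCase(request.getCharacterEncoding())
        && userAgent != null && userAgent.indexOf("Firefox/3") != -1) {
    // Buffer the complete request body.
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int bytesRead;
    try {
        while ((bytesRead = in.read(buffer)) != -1) {
            byteOut.write(buffer, 0, bytesRead);
        }
    } finally {
        try {
            in.close();
        } catch (IOException ex) {
            // Ignore failures on close.
        }
    }

    // Decode with the declared charset (ISO-8859-1) and re-encode as UTF-8,
    // which is what the XML parser expects.
    byte[] data = byteOut.toByteArray();
    data = new String(data, request.getCharacterEncoding()).trim().getBytes("UTF-8");

    return DomUtil.getDocumentBuilder().parse(new ByteArrayInputStream(data));
}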

Basically, I'm re-encoding the request as UTF-8.
This is similar to what is already done to requests coming from Konqueror.
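The core of this re-encoding step can be isolated and checked on its own. A minimal sketch (the class and method names are hypothetical, not part of Echo) that decodes a body with the declared charset and re-encodes it as UTF-8, trimming whitespace as the workaround does:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    /**
     * Decodes a request body with the charset the browser declared (for example the
     * value of request.getCharacterEncoding()) and re-encodes it as UTF-8.
     */
    static byte[] toUtf8(byte[] body, String declaredCharset) {
        String text = new String(body, Charset.forName(declaredCharset));
        return text.trim().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = "\u00F6".getBytes(StandardCharsets.ISO_8859_1); // one byte: 0xF6
        byte[] utf8 = toUtf8(latin1, "ISO-8859-1");                    // two bytes: 0xC3 0xB6
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "C3 B6"
    }
}
```

This only works if the header is truthful: a body that is already UTF-8 but labeled ISO-8859-1 would be double-encoded, which is one reason the workaround restricts itself to Firefox 3.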

Another workaround

Rather than patching the Echo source, I implemented a servlet filter that re-encodes requests that have the "wrong" encoding.

The HeaderControlFilter:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class HeaderControlFilter implements Filter {
    // Target encoding from the "request.reencoding" init-param, e.g. "UTF-8".
    protected String reencoding = null;

    public void init(FilterConfig filterConfig) throws ServletException {
        this.reencoding = filterConfig.getInitParameter("request.reencoding");
    }

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        String declared = request.getCharacterEncoding();
        // Wrap only HTTP requests whose declared charset differs from the target.
        if (reencoding != null && declared != null && !reencoding.equalsIgnoreCase(declared)
                && request instanceof HttpServletRequest) {
            request = new HeaderControlRequest((HttpServletRequest) request, reencoding);
        }
        chain.doFilter(request, response);
    }

    public void destroy() {
        // Nothing to release; required by the pre-4.0 Filter interface.
    }
}

The HeaderControlRequest wrapper:

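import java.io.IOException;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class HeaderControlRequest extends HttpServletRequestWrapper {
    private final String encoding;
    private ServletInputStream is;

    public HeaderControlRequest(HttpServletRequest request, String encoding) {
        super(request);
        this.encoding = encoding;
    }

    // The body handed downstream is in the target encoding, so report that one.
    public String getCharacterEncoding() {
        return encoding;
    }

    public ServletInputStream getInputStream() throws IOException {
        if (is == null) {
            // Decode with the charset the browser declared, re-encode with the target.
            is = new InputStreamReencoder(super.getInputStream(), super.getCharacterEncoding(), encoding);
        }
        return is;
    }
}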

The InputStreamReencoder:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletInputStream;

// Buffers the whole request body, decodes it with inEncoding and re-encodes it with outEncoding.
// With Servlet 3.1 or later, isFinished(), isReady() and setReadListener() must also be implemented.
public class InputStreamReencoder extends ServletInputStream {
    private final ByteArrayInputStream is;

    public InputStreamReencoder(InputStream in, String inEncoding, String outEncoding)
            throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int bytesRead;
        try {
            while ((bytesRead = in.read(buffer)) != -1) {
                byteOut.write(buffer, 0, bytesRead);
            }
        } finally {
            in.close();
        }
        // An unknown charset raises UnsupportedEncodingException (an IOException)
        // instead of leaving the stream unset.
        byte[] data = new String(byteOut.toByteArray(), inEncoding).trim().getBytes(outEncoding);
        is = new ByteArrayInputStream(data);
    }

    public int read() throws IOException {
        return is.read();
    }

    public int read(byte[] b, int off, int len) throws IOException {
        return is.read(b, off, len);
    }
}

Add the following to web.xml:

<filter>
    <filter-name>headercontrol</filter-name>
    <filter-class>HeaderControlFilter</filter-class>
    <init-param>
        <param-name>request.reencoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>headercontrol</filter-name>
    <!-- Use the URL pattern your Echo servlet is mapped to. -->
    <url-pattern>/echoservlet</url-pattern>
</filter-mapping>

I just downloaded the final FF3, and the problem still exists.
So I assume this has not been changed in Firefox, and we will have to fix it on the Echo side.

Produced a workaround

Hi,

This is indeed still an issue in the official Firefox release. I have produced a workaround based on https://bugzilla.mozilla.org/show_bug.cgi?id=431701 and https://bugzilla.mozilla.org/show_bug.cgi?id=407213#c8.
This workaround is only applied to Firefox 3.0 browsers. The issue will probably be fixed by Mozilla in one of the next updates. The workaround forces Firefox to send UTF-8, which also saves the server from having to re-encode requests.

Technically speaking, Echo does not honor a browser's request for a specific encoding. This could be solved by implementing some kind of re-encoding as suggested earlier in this thread.

Niels

P.S. The diff is based on SVN revision 1207.

Sorry for the lag on this. Yes, the patch looks "good"; I'll go ahead and commit it. (BTW, you guys do have commit access to CoreJS over there, with the same SVN passwords; just remember to commit any CoreJS changes to the CoreJS repository first and then to the Echo3 one.)

OK, thanks. I must have missed the move to the CoreJS project. Keeping two copies of the same file in SVN sounds rather risky. Would it be possible to make Echo3 simply depend on CoreJS?

Niels

I'd prefer to do that, but haven't quite sorted out how it would work. I really like being able to play with these files in Echo3 and quickly hit Ctrl+F11 to run them. I also want people to be able to compile Echo3 quickly with as few dependency issues as possible. There are ways around these issues, but I almost think just keeping a copy in the Echo3 SVN is the easiest solution.

Hello,

Do you have a time frame for when it will be included? (It isn't in last night's echogo build, 24 July 2008.)
Otherwise I will have to go with the filter solution.

André

What about Echo2?

As this patch is in high demand, is there any chance we can get that into Echo2 as well?

tnx, Chris

Hi Tod,

I will be on vacation for the next two weeks and haven't been able to commit the patch. I was under the impression that "I'll go ahead and commit it" meant that you would commit it :) Anyway, if you could apply the patch, I think that would make a lot of people happy.

Cheers,

Niels

Whoops, sorry about this; I thought it had been committed a while ago. It should be in there now.

Firefox 3 and Echo2

Hi Tod,

Did you fix Echo2 in the same way?

This bug is a big problem for my Echo2 apps, and at the moment I cannot switch to Echo3 because of my intensive use of EchoPointNG.

Thanks

Martin

Echo2

I would also be very glad if you could fix this problem in Echo2.
It should be treated as a serious bug.

Echo2 Patch also needed

I agree with this. Many stable applications may be using Echo2, and their developers may not have time to migrate to Echo3 soon.
For French users, for example, who constantly type é, è, à and ù, this bug is a major issue because it crashes applications very often.

I really appreciate the Echo2 framework, and I hope this will be treated as a priority.

What is the current status of this problem?

I've just downloaded the latest jars (Echo3Go) and still get the same error when using "ü":

Servlet.service() for servlet StartServlet threw exception
java.io.IOException: Provided InputStream cannot be parsed: java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
	at nextapp.echo.webcontainer.util.XmlRequestParser.parse(XmlRequestParser.java:107)
	at nextapp.echo.webcontainer.InputProcessor.process(InputProcessor.java:105)
	at nextapp.echo.webcontainer.Synchronization.process(Synchronization.java:64)
	at nextapp.echo.webcontainer.service.SynchronizeService.service(SynchronizeService.java:71)
	at nextapp.echo.webcontainer.WebContainerServlet.process(WebContainerServlet.java:370)
	at nextapp.echo.webcontainer.WebContainerServlet.doPost(WebContainerServlet.java:291)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
	.....
        .....
	at java.lang.Thread.run(Unknown Source)

I just checked the Echo3Go jars; the fix is there. What's the exact FF version you're using? I tested this with 3.0.1 (the latest official release). Are you sure you have only one jar on your classpath? Open your app in the browser, go to [AppURL]?sid=Echo.Boot, locate the Core.Web.Dom.createDocument() method and check whether there's a separate code path for FF 3.0 browsers.

Please share your findings,

Niels

Estimated FF fix in 3.0.2 and 3.1

The Firefox fix is expected to land in Firefox 3.0.2 and Firefox 3.1. You can try the latest builds at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.0.2-candidates/ and http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ to see if it is fixed. If you're still having issues, we can dig into this problem further. In any case, please report back here.

I'll try to check the latest Echo3Go jars to see why it doesn't work yet.

Niels

doesn't work with Echo2Go and Firefox 3.0.2

I just tried it with the latest version of Firefox 3.0.2 and the latest Echo2Go jars. The problem still exists!

Tobias

I just checked the Echo2 SVN log, and it seems my patch was never applied. Please go ahead and file a bug at bugs.nextapp.com. If you can't wait, you could adapt the patch to Echo2; it shouldn't be too hard.

Niels

Patch

Hi,
I rewrote webrender's SynchronizeService.java to fix this problem. Many thanks to jnelas for pointing to an adequate solution. I think that if the Echo2 source code takes care of Konqueror problems, it should of course also take care of bugs in a major browser like Firefox.

The diff file is attached; it is based on the Echo2.1.0_rc4 source.

Edit: the changed source code applies only to FF (and Minefield) 3.0.*, as I could see no problem with Minefield 3.1. Boris Zbarsky promised the fix will ship with FF 3.0.4. So, once this is verified (check the nightly build), I will provide another patch to change the regular expression used to identify the FF version.
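The patch itself isn't included here, so the following is only a hypothetical sketch of what such a version check could look like in Java, not the regular expression the patch uses. It matches "Firefox/3.0" or "Minefield/3.0" followed by anything except another digit, so 3.0.x builds match while 3.1 and 2.0 do not:

```java
import java.util.regex.Pattern;

public class Firefox30Check {
    // Hypothetical pattern: "Firefox/3.0" or "Minefield/3.0", optionally followed by
    // ".x" or a suffix such as "pre", but not by another digit (so not "3.05").
    private static final Pattern FF_3_0 = Pattern.compile("(?:Firefox|Minefield)/3\\.0(?![0-9])");

    static boolean isFirefox30(String userAgent) {
        return userAgent != null && FF_3_0.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1",
            "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b1pre) Gecko/20080901 Minefield/3.1b1pre",
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16"
        };
        for (String ua : samples) {
            System.out.println(isFirefox30(ua) + "  " + ua);
        }
    }
}
```

Unlike the indexOf("Firefox/3") test in the earlier workaround, which also matches any later 3.x release, this keeps the workaround limited to 3.0.*; once a fixed 3.0.x release is confirmed, the pattern could be narrowed further.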