Handle unsupported unicode characters | OPC UA Standard | Forum


Handle unsupported unicode characters

Christoffer Lind

Member

Members

Forum Posts: 4

Member Since:
07/03/2017

Offline

11/16/2020 - 05:07

I have a variable node which uses the String data type. According to the specification, string values are “encoded as a sequence of UTF-8 characters”. The thing is that my application only supports the Latin-1 character set. Is it acceptable to support only a subset of all Unicode characters and reject write requests that include Unicode characters outside that subset? Would Bad_WriteNotSupported be the preferred result code in that case?

Randy Armstrong

Admin

Forum Posts: 1596

Member Since:
05/30/2017

Offline

11/16/2020 - 22:54

You can treat every string as an opaque sequence of bytes which will allow you to return whatever is written.

UTF-8 is also a byte-oriented encoding (8-bit code units), so the bytes can be stored as if they were a Latin-1 string even when they are not.
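As a rough illustration of the "opaque bytes" approach, here is a minimal Python sketch. The `OpaqueStringStore` class and the node id `ns=2;s=Demo` are hypothetical names made up for this example and are not part of any OPC UA SDK. It only shows that bytes stored without decoding come back unchanged.

```python
class OpaqueStringStore:
    """Hypothetical store that keeps String values as raw bytes, never decoding them."""

    def __init__(self) -> None:
        self._values = {}

    def write(self, node_id: str, raw: bytes) -> None:
        # Store exactly what the client sent; no character set conversion.
        self._values[node_id] = bytes(raw)

    def read(self, node_id: str) -> bytes:
        # Return the same bytes that were written.
        return self._values[node_id]


store = OpaqueStringStore()
payload = "水".encode("utf-8")          # UTF-8 bytes as they would arrive on the wire
store.write("ns=2;s=Demo", payload)     # hypothetical node id
round_tripped = store.read("ns=2;s=Demo")
```

This keeps the round trip lossless, but it does nothing for an application that later has to interpret or display the value.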

Supporting UTF-8 is not optional for servers. Can you explain more about why this is an issue?

Christoffer Lind

Member

Members

Forum Posts: 4

Member Since:
07/03/2017

Offline

11/17/2020 - 02:01

Storing and returning a sequence of bytes is not an issue. The problem is that the application might end up interpreting or displaying some characters incorrectly, as it doesn’t support the full Unicode character set. The character “水” (UTF-8: \xe6\xb0\xb4), for example, would be displayed as something completely different if interpreted as Latin-1 (or, more realistically, the decoding would fail).
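To make the mismatch concrete, here is a small Python sketch using only the standard codecs. The helper name `fits_latin1` is made up for this example. Note that Python's `latin-1` codec maps every byte value, so in Python this particular decode does not raise; it silently produces the wrong characters. Other decoders or character sets may reject the bytes instead.

```python
raw = "水".encode("utf-8")        # the three bytes e6 b0 b4

# Interpreting those UTF-8 bytes as Latin-1 yields three unrelated characters.
misread = raw.decode("latin-1")


def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be represented in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False
```

With this helper, `fits_latin1("Grüße")` is true and `fits_latin1("水")` is false, which is the check a Latin-1-only application would need before accepting a value.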

I mean, it must be up to the application to perform range checks? For integer values the application must be allowed to reject out-of-range values, and the same must apply to strings, no?

Randy Armstrong

Admin

Forum Posts: 1596

Member Since:
05/30/2017

Offline

11/17/2020 - 06:56

This has not come up before. Filing a Mantis issue would be best.

A write that unexpectedly fails for unknown reasons is not good for interoperability (IOP). Failures from range checks usually come with metadata that explains the behavior, like EnumStrings. My inclination is to say you should figure out some way to handle it within your server.

Randy Armstrong

Admin

Forum Posts: 1596

Member Since:
05/30/2017

Offline

11/17/2020 - 10:21

The UA WG discussed this issue and agreed that a failed UTF-8 to Latin-1 conversion is no different from failing because a particular variable does not allow spaces or punctuation. The error code to return is Bad_OutOfRange.
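A minimal Python sketch of what that could look like in a write handler follows. `StatusCode` and `handle_string_write` are hypothetical names for illustration, not an actual OPC UA SDK API, and the sketch assumes the stack has already decoded the incoming UTF-8 into a string.

```python
from enum import Enum
from typing import Optional, Tuple


class StatusCode(Enum):
    # Symbolic names only; a real stack would supply the actual status codes.
    GOOD = "Good"
    BAD_OUT_OF_RANGE = "Bad_OutOfRange"


def handle_string_write(value: str) -> Tuple[StatusCode, Optional[bytes]]:
    """Accept a written String only if it converts cleanly to Latin-1."""
    try:
        stored = value.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character is outside Latin-1: reject like any other range check.
        return StatusCode.BAD_OUT_OF_RANGE, None
    return StatusCode.GOOD, stored
```

Under these assumptions, writing `"café"` is accepted and stored as Latin-1 bytes, while writing `"水"` is rejected with `Bad_OutOfRange` and nothing is stored.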
