Re: [XaraXtreme-dev] Patches for Xara: Browser patch, --help patch, "?" -> "(c)" patch, some de_DE translation work


Tobias Burnus wrote:
Hi Alex,

Alex Bligh schrieb:
Possibly. The input encoding for the .po file is ISO-8859-1 - that may
be different from the output encoding.
Hmm. But see:
XaraLX/po> grep -i charset XaraLX.po
"Content-Type: text/plain; charset=UTF-8\n"

If I write a character in UTF-8 encoding, run msgfmt and copy the .mo
file to the locale directory, it shows up nicely. I have a
"de_DE.UTF-8" locale though.

OK. I think we are at cross purposes. Input encoding is (as far as I
can tell from the xgettext manual) the encoding of the C source file
xgettext expects to be working on. I am far from an expert in this
area but I think ISO-8859-1 is right, as I don't think C source
can be in UTF-8.

What you are looking at above is the output encoding for xgettext
which I just left as the default.

The confusion here is "input to xgettext" - both the msgid and
the msgstr are outputs from xgettext.

What happens if you translate them when they use © (or whatever)
Well, since there is nothing which translates this back, one gets (in
the menu line at least)
where the hash is underlined and the ; is regarded as separator.

Sorry, by "when they use ©" I meant "when the .xrc files
use escaped characters", for instance

  <label>Copyright &#169; 1994-2005 Xara Group Ltd.</label>

What should happen here is that either wxrc or build-resources.pl
itself translates these to a copyright symbol (in ISO-8859-1
encoding) for input into xgettext. That should produce a (UTF8)
msgid and a (UTF8) msgstr, assuming UTF8 is the default output encoding.

The strings get into the application through dialogs.xrc and
strings.lst, which remain ISO-8859-1 (i.e. 8 bit), and their escape
sequences are processed on loading either by CamResource or by the
wxWidgets xrc loader, which translates them into Unicode strings.
These should (hopefully) match the UTF-8 strings specified in the
po file.
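
To make the encoding chain above concrete, here is a minimal sketch in
Python (not the Perl that build-resources.pl is actually written in, and
not taken from it). It assumes the escape is a plain numeric character
reference and uses the standard html module to resolve it; the variable
names are made up for illustration.

```python
import html

# Hypothetical label text, as it would appear inside an .xrc <label> element.
xrc_label = "Copyright &#169; 1994-2005 Xara Group Ltd."

# Resolve the numeric character reference to a real Unicode character.
text = html.unescape(xrc_label)

# The same string in the two encodings discussed in this thread.
latin1_bytes = text.encode("iso-8859-1")  # 8-bit form, one byte for the symbol
utf8_bytes = text.encode("utf-8")         # .po form, two bytes for the symbol

print(latin1_bytes)
print(utf8_bytes)

# The byte sequences differ, but each decodes (with its own encoding)
# to the same Unicode string, which is why the two sides can match.
same = latin1_bytes.decode("iso-8859-1") == utf8_bytes.decode("utf-8")
print(same)
```

The point is only that the 8-bit and UTF-8 forms are different bytes for
the same character, so a comparison done on decoded Unicode strings
succeeds where a byte-wise comparison would not.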

I couldn't get UTF-8 input encoding to work (see line to xgettext).
Sorry, I can't find your reference to xgettext, neither in the emails
nor in the source, but probably I just missed the right email.

It's in build-resources.pl. I think it didn't work as I wouldn't
expect UTF-8 /input/ encoding to work (as the C that xgettext
expects is just 8 bit ISO-8859-1 - of course it's not really C,
it's text that build-resources.pl or wxrc writes on the fly based
on decoding the .xrc files that look like the C xgettext expects).

But maybe it only causes problems for the extraction and not for use
in a po/mo file?

What should happen (and the only case I am concerned about) is that
you can specify (for instance) a copyright symbol in ANY string
in xrc, e.g. by putting &#169; in the .xrc source. You should
then in the po file be able to translate them like this

msgid "(c) Some test text in untranslated english"
msgstr "(c) Some test text in Klingon"

Where "(c)" is whatever the UTF-8 encoding for a copyright symbol is,
in both cases.

What I was really asking you to do (but it was rather poorly phrased)
was to open up an .xrc file you are translating and stick a copyright
symbol (using &#169;) into a string you have already translated.
It should then appear untranslated (as the msgid won't match),
hopefully with a copyright symbol in. Then rebuild the .pot
file (so your msgid now has a UTF-8 copyright symbol in), and try
inserting the same character into the msgstr. Check you now
have a translated string still with a copyright symbol in. If so,
then we are Unicode clean on text to be translated, as well as
translated text.
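
The expected behaviour of that test can be sketched as follows. This is
a toy model in Python, not Xara or gettext code: a plain dict stands in
for the compiled catalog, and the strings are invented examples. The
only gettext behaviour it relies on is that a lookup with no matching
msgid returns the msgid unchanged.

```python
# Hypothetical in-memory stand-in for the compiled .mo catalog.
catalog = {"Some test text": "Etwas Testtext"}


def translate(msgid):
    """Return the translation, or the msgid itself if there is no match."""
    return catalog.get(msgid, msgid)


# Step 1: the .xrc string gains a copyright symbol (U+00A9), so the old
# msgid no longer matches and the string appears untranslated.
new_msgid = "\u00a9 Some test text"
before = translate(new_msgid)

# Step 2: after rebuilding the .pot file, the translator adds an entry
# whose msgid and msgstr both contain the same character.
catalog[new_msgid] = "\u00a9 Etwas Testtext"
after = translate(new_msgid)

print(before)
print(after)
```

If the real application behaves like "before" in step 1 and like "after"
in step 2, with the symbol displayed correctly both times, the round
trip is clean.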