How do I read and write an iso-8859-1 file in Emacs 23?
Alan Mackenzie
2010-03-28 20:43:51 UTC
Hi, everybody,

the subject just about says everything. Emacs 23 insists on fouling up
my text, converting (for example) ü ("u umlaut") into \374 each time I
try to save it. It then complains it can't save \374 because it can't
"convert" it.

In desperation, I tried putting this on the first line of the text:

-*- mode : Text ; buffer-file-coding-system : iso-8859-1-unix -*-

. Should this help? Is it causing me problems?

I've tried reading the fine manual. It helps me not in the slightest.
What am I missing here? All I want to do is read an 8859-1 text file,
edit it, and write it back again. How do I tell Emacs that an 0xFC
character in the file is actually a "u umlaut", and not anything else.
Why is Emacs insisting on trying to be so clever?
--
Alan Mackenzie (Nuremberg, Germany).
Peter Dyballa
2010-03-28 21:11:41 UTC
Post by Alan Mackenzie
-*- mode : Text ; buffer-file-coding-system : iso-8859-1-unix -*-
I prefer a simple "coding: iso-8859-15;" – besides this you can check
the Options -> Mule -> Set Coding Systems menu (with short-cuts).
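For anyone unsure whether iso-8859-1 and iso-8859-15 treat the 0xFC byte differently, here is a minimal Python sketch (outside Emacs, using only Python's standard codecs) that lists the byte values where the two character sets disagree. The variable names are illustrative only.

```python
# Compare ISO-8859-1 and ISO-8859-15 byte by byte using Python's built-in codecs.
differing = [
    b for b in range(0x100)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]

# The byte discussed in this thread, 0xFC, decodes to the same character in both.
u_umlaut_latin1 = bytes([0xFC]).decode("iso-8859-1")
u_umlaut_latin9 = bytes([0xFC]).decode("iso-8859-15")

# 0xA4 is one of the positions that differ: currency sign versus euro sign.
a4_latin1 = bytes([0xA4]).decode("iso-8859-1")
a4_latin9 = bytes([0xA4]).decode("iso-8859-15")

print([hex(b) for b in differing])
print(u_umlaut_latin1, u_umlaut_latin9, a4_latin1, a4_latin9)
```

So for a file whose only non-ASCII characters are German umlauts, either coding name should yield the same bytes on disk; the choice only matters for the handful of positions printed above.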

--
With peaceful greetings

Pete

"A TRUE Klingon warrior does not comment his code."
Eli Zaretskii
2010-03-29 06:33:13 UTC
Date: Sun, 28 Mar 2010 20:43:51 +0000
the subject just about says everything.
It is strange to read such questions in the year 2010 regarding Emacs
23.
Emacs 23 insists on fouling up my text, converting (for example) ü
("u umlaut") into \374 each time I try to save it. It then
complains it can't save \374 because it can't "convert" it.
What does Emacs tell about this character when you type "C-u C-x ="
with point on the ü (before it is converted to \374)? Also, how did
you insert that character into the buffer?

I suspect that something causes Emacs to treat it as a raw byte \374,
rather than a Latin-1 character. (Yes, Emacs can distinguish between
these two.)
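
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It is not Emacs code and assumes nothing about Emacs internals; it only shows that \374 (octal) and 0xFC (hex) are the same byte, and that the byte becomes "u umlaut" only once it is interpreted as ISO-8859-1.

```python
# The single byte that shows up as \374 (octal) is 0xFC (hex).
raw = bytes([0o374])
assert raw == b"\xfc"

# Interpreted as ISO-8859-1 (Latin-1), that byte is the character U+00FC.
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as UTF-8, the same lone byte is not a valid sequence at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Writing the character back out: one byte in Latin-1, two bytes in UTF-8.
latin1_bytes = as_latin1.encode("iso-8859-1")
utf8_bytes = as_latin1.encode("utf-8")

print(repr(as_latin1), valid_utf8, latin1_bytes, utf8_bytes)
```

A byte that has not been given any such interpretation has no character identity, which is one plausible reading of why it would be shown as an escaped number instead of a letter.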
-*- mode : Text ; buffer-file-coding-system : iso-8859-1-unix -*-
. Should this help?
Yes. But it shouldn't be needed in most situations.
Is it causing me problems?
It shouldn't.
What am I missing here? All I want to do is read an 8859-1 text file,
edit it, and write it back again. How do I tell Emacs that an 0xFC
character in the file is actually a "u umlaut", and not anything else.

If you have this trouble in a file you visited and did not modify yet,
it could be that the file includes some raw bytes that don't fit any
encoding known to Emacs, or perhaps Emacs detected the encoding
incorrectly. What does `buffer-file-coding-system' evaluate to in
this buffer, immediately after you visit the file?
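
As a rough illustration of why detection can go wrong, the sketch below classifies a byte string in Python. The `classify` function is hypothetical and is not how Emacs detects encodings; it only shows that every byte value is a legal ISO-8859-1 character, so a detector can rule other encodings out but can never positively confirm Latin-1.

```python
def classify(data: bytes) -> str:
    """Rough, hypothetical classifier; NOT Emacs's detection algorithm."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte value is a valid ISO-8859-1 character, so this is only a
        # fallback guess: it cannot tell Latin-1 from Latin-9 or from binary.
        return "8-bit, not utf-8 (possibly iso-8859-1)"


samples = {
    "plain": b"Gruesse",
    "latin1": "Gr\u00fc\u00dfe".encode("iso-8859-1"),
    "utf8": "Gr\u00fc\u00dfe".encode("utf-8"),
}

for name, data in samples.items():
    print(name, data, classify(data))
```

Because the Latin-1 case is only ever a fallback, stating the encoding explicitly (for example with a coding line in the file) removes the guesswork.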
Why is Emacs insisting on trying to be so clever?
Because it's Emacs ;-)
Alan Mackenzie
2010-03-30 10:42:22 UTC
Hi, Eli,
Post by Eli Zaretskii
Date: Sun, 28 Mar 2010 20:43:51 +0000
the subject just about says everything.
It is strange to read such questions in the year 2010 regarding Emacs
23.
I feel that Emacs 23 is less stable in this respect than Emacs 22.
Post by Eli Zaretskii
Emacs 23 insists on fouling up my text, converting (for example) ü
("u umlaut") into \374 each time I try to save it. It then
complains it can't save \374 because it can't "convert" it.
What does Emacs tell about this character when you type "C-u C-x ="
with point on the ü (before it is converted to \374)? Also, how did
you insert that character into the buffer?
My buffer is now doing the Right Thing, both displaying a ü ("u umlaut")
as it should be, and saving it correctly as the single byte 0xfc.
Previously, it was sometimes being displayed as "\374" as I typed. I
don't know exactly what I did to achieve this; I'm thoroughly confused
about it.

To insert the ü, I typed a key-combination programmed to generate 0xFC
on a Linux virtual terminal.
Post by Eli Zaretskii
I suspect that something causes Emacs to treat it as a raw byte \374,
rather than a Latin-1 character. (Yes, Emacs can distinguish between
these two.)
-*- mode : Text ; buffer-file-coding-system : iso-8859-1-unix -*-
. Should this help?
Yes. But it shouldn't be needed in most situations.
I've since removed it.
Post by Eli Zaretskii
Is it causing me problems?
It shouldn't.
Thanks!
Post by Eli Zaretskii
What am I missing here? All I want to do is read an 8859-1 text file,
edit it, and write it back again. How do I tell Emacs that an 0xFC
character in the file is actually a "u umlaut", and not anything else.
If you have this trouble in a file you visited and did not modify yet,
it could be that the file includes some raw bytes that don't fit any
encoding known to Emacs, or perhaps Emacs detected the encoding
incorrectly. What does `buffer-file-coding-system' evaluate to in
this buffer, immediately after you visit the file?
I've lost that info, now. It was probably raw-text or no-translation
(whatever the difference is between these two).
Post by Eli Zaretskii
Why is Emacs insisting on trying to be so clever?
Because it's Emacs ;-)
Ah, OK!
--
Alan Mackenzie (Nuremberg, Germany).
Andreas Röhler
2010-03-30 11:33:44 UTC
Post by Alan Mackenzie
Hi, Eli,
Post by Eli Zaretskii
Date: Sun, 28 Mar 2010 20:43:51 +0000
the subject just about says everything.
It is strange to read such questions in the year 2010 regarding Emacs
23.
I feel that Emacs 23 is less stable in this respect than Emacs 22.
Post by Eli Zaretskii
Emacs 23 insists on fouling up my text, converting (for example) ü
("u umlaut") into \374 each time I try to save it. It then
complains it can't save \374 because it can't "convert" it.
What does Emacs tell about this character when you type "C-u C-x ="
with point on the ü (before it is converted to \374)? Also, how did
you insert that character into the buffer?
My buffer is now doing the Right Thing, both displaying a ü ("u umlaut")
as it should be, and saving it correctly as the single byte 0xfc.
Previously, it was sometimes being displayed as "\374" as I typed. I
don't know exactly what I did to achieve this; I'm thoroughly confused
about it.
That's a very old, known issue. I reported it years ago.
As it happens seldom, I'm able to live with it.

It sometimes happens when text is pasted from an email.

Then umlauts are displayed as their numbers.

The workaround is to mark the whole buffer and copy it into another one.
In the new buffer the umlauts are shown correctly.

Cheers


Andreas

--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/