Generate random char (and string) from unicode category (e.g: letter)
Alexandre Garreau
2018-12-06 10:44:38 UTC
Hi,

I clearly recall having written something in elisp to generate random,
more-or-less plausible input for the gmail account creation form,
including ascii chars for the login, statistical randomness for gender
(like, iirc, 48% of “male”, 52% of “female”, minus 2% of “others”), and
random unicode for password, real name, etc. I recall that I ended up
with a lot of ideograms in those. So I know it’s doable in pure elisp
(or maybe it was guile? less likely…).

I really don’t recall how I did that, nor whether I took care to use a
single script for each form input, but I’m sure I was using something
less ugly than what I have now, that is, calling (random (max-char))
until the result matches [[:alpha:]] (but I clearly recall using
something that would work for all of unicode, including foreign scripts
I wouldn’t even know about).

Do you have an idea of something cleaner? Currently I have this:

#+BEGIN_SRC emacs-lisp
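(defun random-letter (&rest _ignored)
  "Return a random character matching [[:alpha:]].
Arguments are ignored so the function can be used with `mapcar'."
  (let ((num (random (max-char))))
    ;; Re-draw until the candidate is alphabetic.
    (while (not (string-match "[[:alpha:]]" (string num)))
      (setq num (random (max-char))))
    num))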
#+END_SRC

and use it like this:

#+BEGIN_SRC emacs-lisp
(apply #'string (mapcar #'random-letter (make-list (1+ (random 190)) nil)))
#+END_SRC

The problem is that I get stuff like this: "䯩繩ꏴ跾ಾ𢉰𐎕𘓅𪆶矏ᄬ𣒈⳰𨜄𛰸𧅌煂𢙴𧐁𡚯
𦅉ᤋ𡇼钇꿚㱓㗧𩅍姵爠𣑽𠌤ꇊ𡘄𑄇𫲘𪯋𣊚𦉂𠦵𘕋𠈾ლ𨟇𦷕𤃻𫿡𢿟巙𩿊𥖠𒒈ባ𗆘𤧟𗲀𔗸𖼂뺔
𧸋𡠜𬶟咨발𬞗쏊紋䲁坮𠢥旼𗴟𬓏𤁍គ𩍏Ɉ𪅊𤙬𫪃𫴛𤶋𫴃𧐨䞪𩇨𡤦馲𨂧𡮃𓂅𒇵𤉴𥙯藣ბ솇
𨆬𦄎靔𐤒ඐ𒐕襋𬵝𥤄𪃝𫈹𨣼𘋹돃𪣞筛𣯿휈𥽊Ꭾ𣐧𥺒𠊆ꮩ闭ઉ𦻸𨔆𤛢𢮁𤟩𪊕𥫰𪢟𡻋𘅇ᶗ펙𣄽
玽쭻𩿎𗔥𪟉䪁ᣃ쒝𩅑𡞬넒煮ڒꫥ𥾴𣁫𬑼깙𣫖筁ᣯ𣱮𡡯𨐒ﶒ𤑳𤯼昵䊘㝓𣑼𐦥𥆤갛𤡇𠜠𥉡䋯𥫻
𪲩兀𖠹瀖𣊫𨘥𢍪ᴢ", where the majority of characters are non-displayable
and are shown as a square with numbers in it, indicating that no
installed font has a glyph for them. I clearly recall that what I did
back then produced no such characters, so something must be possible
(maybe using charsets?).

PS: is there a way to get something other than a uniform random
distribution with `random', like a normal law or a logarithmic
distribution?
Eli Zaretskii
2018-12-06 11:24:00 UTC
If you don't want characters that need fancy fonts, why do you use
max-char? Why not a smaller value, like 255?
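
A minimal sketch of that suggestion, assuming a hypothetical
`random-letter-below' helper (not from the thread) that takes the upper
bound as an argument instead of always using (max-char):

```emacs-lisp
(defun random-letter-below (limit)
  "Return a random alphabetic character with code below LIMIT.
LIMIT must be greater than ?A, so at least one letter is in range.
Hypothetical helper, shown only to illustrate the smaller bound."
  (let ((num (random limit)))
    ;; Re-draw until the candidate is alphabetic.
    (while (not (string-match-p "[[:alpha:]]" (string num)))
      (setq num (random limit)))
    num))

;; Letters with codes 0..255 only (ASCII and Latin-1).
(random-letter-below 256)
```

With a bound this small most fonts have the glyphs, at the cost of
losing every script outside that range.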
Ben Bacarisse
2018-12-06 13:46:27 UTC
You might find char-displayable-p useful. It returns 'unicode for those
numbered characters for which there is no configured font. It returns t
for others, but that includes control characters.

describe-char-display gives the font being used and will exclude control
characters. It needs two args -- a position and a char or number -- but
the position is ignored when there is an actual character.

Unlike char-displayable-p, it is not documented, so it may change or
vanish over time.

char-syntax returns ?w for word-like characters. This might do instead
of the [[:alpha:]] match. Thus

(let ((ch (random (max-char))))
  (and (char-displayable-p ch) (eq (char-syntax ch) ?w)))

might be what you want though there will be a relatively low density of
matching characters. max-char is very big.
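
Wrapped into a generator, that test could look like the sketch below.
The name `random-displayable-word-char' and the optional LIMIT argument
are assumptions made for illustration; the loop simply re-draws until
both checks pass.

```emacs-lisp
(defun random-displayable-word-char (&optional limit)
  "Return a random displayable character with word syntax.
Candidates are drawn below LIMIT, which defaults to (max-char).
Hypothetical helper; word syntax is looked up in the current
buffer's syntax table."
  (let* ((limit (or limit (max-char)))
         (ch (random limit)))
    ;; Rejection sampling: re-draw until both tests pass.
    (while (not (and (char-displayable-p ch)
                     (eq (char-syntax ch) ?w)))
      (setq ch (random limit)))
    ch))
```

Because of the low density of matches, a call without LIMIT may need
many draws; passing a smaller LIMIT such as 256 narrows the search.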

Another strategy is to select characters randomly from a string of
acceptable options.
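
A sketch of that strategy, with hypothetical helper names and an
arbitrary example alphabet (neither comes from the thread):

```emacs-lisp
(defun random-char-from (choices)
  "Return a random character from the non-empty string CHOICES."
  (aref choices (random (length choices))))

(defun random-string-from (choices len)
  "Return a string of LEN characters drawn at random from CHOICES."
  (let (chars)
    (dotimes (_ len)
      (push (random-char-from choices) chars))
    (apply #'string chars)))

;; Twelve characters from a hand-picked set known to display well.
(random-string-from "abcdefghijklmnopqrstuvwxyzéèàçñöü" 12)
```

Every draw succeeds on the first try, and the result contains only
characters you have already checked against your fonts.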
Post by Alexandre Garreau
PS: is there a way to get something other than a uniform random
distribution with `random', like a normal law or a logarithmic
distribution?
Yes, but I am running out of time! A cheap way to get an almost normal
distribution with mean n is to sum k uniform random numbers between 0
and 2n/k (each has mean n/k, so the sum has mean n).
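
A sketch of the summing trick, with hypothetical helper names. Since
`random' only produces integers, a small helper turns it into a float in
[0, 1) first. Each summand is drawn from 0 to 2*MEAN/K, so that the sum
averages MEAN.

```emacs-lisp
(defun random-unit-float ()
  "Return a pseudo-random float in [0, 1) with six-digit resolution."
  (/ (random 1000000) 1000000.0))

(defun random-approx-normal (mean k)
  "Return the sum of K uniform random floats, each in [0, 2*MEAN/K).
The sum lies in [0, 2*MEAN) and has expected value MEAN; its
distribution gets closer to a bell curve as K grows.
Hypothetical helper, not from the thread."
  (let ((upper (/ (* 2.0 mean) k))
        (sum 0.0))
    (dotimes (_ k)
      (setq sum (+ sum (* upper (random-unit-float)))))
    sum))

;; A value clustered around 50, always between 0 and 100.
(random-approx-normal 50 12)
```

Larger K gives a narrower, more bell-shaped curve; K = 1 degenerates to
a plain uniform draw.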

If you use the "select from a string" method, you can simply duplicate
those characters you want more of.
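
For instance, the pool can be built from per-character counts. This is
only a sketch; `weighted-pool' and `random-weighted-char' are
hypothetical names and the weights are made up.

```emacs-lisp
(defun weighted-pool (weights)
  "Build a string from WEIGHTS, an alist of (CHAR . COUNT).
Each CHAR is repeated COUNT times."
  (mapconcat (lambda (cell) (make-string (cdr cell) (car cell)))
             weights ""))

(defun random-weighted-char (weights)
  "Return a random character, favoring those with a higher COUNT.
WEIGHTS is an alist of (CHAR . COUNT) as for `weighted-pool'."
  (let ((pool (weighted-pool weights)))
    (aref pool (random (length pool)))))

;; ?e fills three of the five slots, so it is the most likely result.
(random-weighted-char '((?e . 3) (?a . 1) (?z . 1)))
```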
--
Ben.
Emanuel Berg
2018-12-06 14:29:58 UTC
Here [1] is a little something that can be
interesting, who knows?

(defun scramble (beg end)
  "Shuffle chars in region from BEG to END."
  (interactive "r")
  (when (use-region-p)
    (save-excursion
      (let* ((str (buffer-substring-no-properties beg end))
             ;; Split into one-character strings, dropping empty ones.
             (chars (split-string str "" t))
             ;; A random predicate gives a quick, not quite uniform, shuffle.
             (rand-chars (sort chars (lambda (_a _b) (zerop (random 2))))))
        (delete-region beg end)
        (dolist (c rand-chars)
          (insert c))))))

[1] http://user.it.uu.se/~embe8573/emacs-init/sort-my.el
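
Sorting with a random predicate is short, but it does not make every
ordering equally likely. If that matters, one alternative is a
Fisher-Yates shuffle. The sketch below is a different technique from the
function above, and `shuffle-string' is a hypothetical name:

```emacs-lisp
(defun shuffle-string (str)
  "Return a copy of STR with its characters in random order.
Uses the Fisher-Yates shuffle on a vector of characters."
  (let* ((chars (vconcat str))
         (i (length chars)))
    ;; Walk from the end, swapping each slot with a randomly chosen
    ;; slot at or before it.
    (while (> i 1)
      (let* ((j (random i))
             (tmp (aref chars (1- i))))
        (aset chars (1- i) (aref chars j))
        (aset chars j tmp))
      (setq i (1- i)))
    (concat chars)))

(shuffle-string "hello world")
```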
--
underground experts united
http://user.it.uu.se/~embe8573