Open BEAGLE Note 1
Noted by Neio
I'm doing some experiments with Genetic Programming (GP). Open BEAGLE is the framework my advisor recommended for these experiments because of its efficiency and other qualities. However, references about Open BEAGLE on the internet are limited, so I will note down some useful things here that are not mentioned on its official Wiki.
Here are the parameters for running the simulation ("HaRec" is the simulation program I wrote):
usage: HaRec [-OBparameter=value] …
HaRec [-OBparam1=value1,param2=value2, ... ,paramN=valueN] …
Supported parameters:
ec.conf.dump <String> (def: "")
Filename used to dump the configuration. A configuration dump means that a configuration file is written with the evolver (including the composing operators) and the register (including the registered parameters and their default values). No evolution is conducted on a configuration dump.
An empty string means no dump.
ec.conf.file <String> (def: "")
The name of a configuration file containing evolver and parameter values.
A typical configuration file can be created with parameter "ec.conf.dump".
ec.hof.demesize <UInt> (def: 0)
Number of individuals kept in each deme's hall-of-fame (best individuals so far). Note that a hall-of-fame contains only copies of the best individuals so far and is not used by the evolution process.
ec.hof.vivasize <UInt> (def: 1)
Number of individuals kept in the vivarium's hall-of-fame (best individuals so far). Note that a hall-of-fame contains only copies of the best individuals so far and is not used by the evolution process.
ec.init.seedsfile <String> (def: "")
Name of the file to use for seeding the evolution with crafted individuals. An empty string means no seeding.
ec.mig.interval <UInt> (def: 1)
Interval between each migration, in number of generations. An interval of 0 disables migration.
ec.mig.size <UInt> (def: 5)
Number of individuals migrating between each deme at each migration.
ec.pop.size <UIntArray> (def: 100)
Number of demes and size of each deme of the population. The format of a UIntArray is S1,S2,…,Sn, where Si is the ith value. The size of the UIntArray is the number of demes present in the vivarium, while each value of the vector is the size of the corresponding deme.
ec.rand.seed <ULong> (def: 0)
Randomizer seed. A zero value means that the seed will be initialized using the current system time.
ec.repro.prob <Float> (def: 0.1)
Probability that an individual is reproduced as is, without modification. This parameter is useful only in selection and initialization operators that are composing a breeder tree.
ec.sel.tournsize <UInt> (def: 2)
Number of participants for tournament selection.
ec.term.maxgen <UInt> (def: 50)
Maximum number of generations for the evolution.
gp.cx.distrpb <Float> (def: 0.9)
Probability that a crossover point is a branch (node with sub-trees).
Value of 1.0 means that all crossover points are branches, and value of 0.0 means that all crossover points are leaves.
gp.cx.indpb <Float> (def: 0.9)
Individual crossover probability at each generation.
gp.init.maxargs <UIntArray> (def: 0/2)
Maximum number of arguments in GP tree. Tree arguments are usually useful with ADFs (and similar stuff).
gp.init.maxdepth <UInt> (def: 5)
Maximum depth for newly initialized trees.
gp.init.maxtree <UInt> (def: 1)
Maximum number of GP trees in newly initialized individuals. More than one tree is usually useful with ADFs (and other ADx).
gp.init.minargs <UIntArray> (def: 0/2)
Minimum number of arguments in GP tree. Tree arguments are usually useful with ADFs (and similar stuff).
gp.init.mindepth <UInt> (def: 2)
Minimum depth for newly initialized trees.
gp.init.mintree <UInt> (def: 1)
Minimum number of GP trees in newly initialized individuals. More than one tree is usually useful with ADFs (and other ADx).
gp.mutshrink.indpb <Float> (def: 0.05)
Shrink mutation probability for an individual. Shrink mutation consists of replacing a branch (a node with one or more arguments) with one of its child nodes. This erases the chosen node and the other child nodes.
gp.mutsst.distrpb <Float> (def: 0.5)
Probability that a swap subtree is internal (the mutation occurs between three points, where the 2nd point is in the 1st point's subtree, and the 3rd point is in the 2nd point's subtree) vs being external (the mutation occurs between two points, where neither point is within the other's subtree). Value of 1.0 means that the swap subtree mutations are all internal, while value of 0.0 means that swap subtree mutations are all external.
gp.mutsst.indpb <Float> (def: 0.0)
Swap subtree mutation probability for an individual. A swap subtree mutation consists of swapping two subtrees of a tree in an individual.
gp.mutstd.indpb <Float> (def: 0.05)
Standard mutation probability for an individual. A standard mutation replaces a sub-tree with a randomly generated one.
gp.mutstd.maxdepth <UInt> (def: 5)
Maximum depth for standard mutation. A standard mutation replaces a sub-tree with a randomly generated one.
gp.mutswap.distrpb <Float> (def: 0.5)
Probability that a swap mutation point is a branch (node with sub-trees).
Value of 1.0 means that all swap mutation points are branches, and value of 0.0 means that all swap mutation points are leaves. Swap mutation consists of exchanging the primitive associated with a node for one having the same number of arguments.
gp.mutswap.indpb <Float> (def: 0.05)
Swap mutation probability for an individual. Swap mutation consists of exchanging the primitive associated with a node for one having the same number of arguments.
gp.tree.maxdepth <UInt> (def: 17)
Maximum allowed depth for the trees.
gp.try <UInt> (def: 2)
Maximum number of attempts to modify a GP tree in a genetic operation. As there are topological constraints on GP trees (i.e. tree depth limit), it is often necessary to try a genetic operation several times.
help
Shows the Open BEAGLE specific command-line usage and detailed parameter descriptions.
lg.console.level <UInt> (def: 2)
Log level used for console output generation. Log levels available are: (0) no log, (1) basic logs, (2) stats, (3) general information, (4) details on operations, (5) trace of the algorithms, (6) verbose, (7) debug (enabled only in full debug mode).
lg.file.level <UInt> (def: 3)
Log level used for file output generation. Log levels available are: (0) no log, (1) basic logs, (2) stats, (3) general information, (4) details on operations, (5) trace of the algorithms, (6) verbose, (7) debug (enabled only in full debug mode).
lg.file.name <String> (def: "beagle.log")
Filename to which messages are output. An empty file name means no output is done to a file.
lg.show.class <Bool> (def: 0)
Flag whether the class name is output in the logs.
lg.show.level <Bool> (def: 0)
Flag whether the logging level is output in the logs.
lg.show.type <Bool> (def: 0)
Flag whether the message type is output in the logs.
ms.restart.file <String> (def: "")
Name of the milestone file from which the evolution should be restarted.
An empty string means no restart.
ms.write.interval <UInt> (def: 0)
Milestone saving interval (in number of generations). When zero, only the last generation milestone is saved.
ms.write.over <Bool> (def: 1)
If true, this flag indicates that old milestones should be overwritten.
Otherwise, each milestone has a different suffix.
ms.write.perdeme <Bool> (def: 0)
If true, this flag indicates that separate milestones should be written after each deme's processing. Otherwise milestones are written after the processing of a complete population.
ms.write.prefix <String> (def: "beagle")
Prefix used to name the evolution milestone files. An empty string means no milestone.
usage
Shows the Open BEAGLE specific command-line usage.
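The combined -OBparam1=value1,param2=value2 form from the usage line is easy to mistype by hand. Below is a minimal Python sketch, not part of Open BEAGLE, that builds that single argument from a dictionary of parameter names; build_ob_arg, run_harec and the ./HaRec path are hypothetical names for illustration. Bool values are written as 0/1, matching the defaults listed above. Array-valued parameters such as ec.pop.size are deliberately not handled, because the help text also uses commas inside UIntArray values, which could be ambiguous in the combined form.

```python
import subprocess


def build_ob_arg(params):
    """Join scalar parameters into one -OBname1=value1,name2=value2 argument.

    Array-valued parameters (e.g. ec.pop.size) are not handled by this sketch.
    """
    pairs = []
    for name, value in params.items():
        if isinstance(value, bool):
            value = int(value)  # Bool parameters appear as 0/1 in the help text
        pairs.append(f"{name}={value}")
    return "-OB" + ",".join(pairs)


def run_harec(params, executable="./HaRec"):
    """Launch the simulation; the executable path is a hypothetical example."""
    return subprocess.run([executable, build_ob_arg(params)], check=True)
```

For example, build_ob_arg({"ec.term.maxgen": 100, "ms.write.over": False}) returns -OBec.term.maxgen=100,ms.write.over=0.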
Mine information from a website
Neio, July 20, 2008, in Huaqiao University
NOTICE: THIS ARTICLE IS ONLY CONCERNED WITH THE WAY OF MINING INFORMATION FROM A WEBSITE. IF YOU ARE INTERESTED IN THE DATA OF IMDB, PLEASE READ THE COPYRIGHT NOTICE OF IMDB.COM FIRST, AND DO NOT DO ANYTHING AGAINST THE LICENCE UNLESS YOU ARE PERMITTED TO DO SO.
IMDB.com may be the largest movie database in the world. We can gain abundant information from this website, and if you are permitted, you can use the data for research. As far as I know, there are thousands of photos of stars, and some professors have been using this data for research on data mining or AI. Anyone can get information from IMDB in various ways besides the HTML website.
Here I discuss how to mine some data from IMDB (I just take IMDB as an example). If you are more interested in the IMDB data itself than in the way of mining it, you can visit http://www.imdb.com/interfaces instead.
Well, to mine photos from IMDB effectively, we should first understand how the photo data is accessed.
- Let's visit http://www.imdb.com/Sections/Gallery/Names/X (X is the initial of the name: A-Z, etc.); there we can see the list of stars.
- If we click one of the names in this list, it jumps to http://www.imdb.com/name/nm000000/mediaindex (000000 is the id of the star), where we can see all of his/her photos if there are fewer than 50. Further photos are listed at URLs like http://www.imdb.com/name/nm000000/mediaindex?page=2.
- If we visit http://www.imdb.com/name/nm000000/, we can see more information about this star, such as birthday, awards and alternative names.
- If we click one of the photos on http://www.imdb.com/name/nm000000/mediaindex, the URL redirects to http://www.imdb.com/media/rm111111111/nm000000, where the photo is shown. In this URL, 111111111 is the id of the picture and 000000 is the id of the star. We can also access http://www.imdb.com/media/rm111111111/ to get the same photo, but some features are lost.
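The URL patterns above can be captured in a few helper functions. This is a minimal Python sketch assuming the 2008 URL layout described here (IMDB has most likely changed it since); all function names are hypothetical. Ids are kept as strings so that leading zeros, as in nm0000002, survive.

```python
BASE = "http://www.imdb.com"


def gallery_index_url(prefix):
    """Star list for one name prefix, e.g. "A" or "Aacute"."""
    return f"{BASE}/Sections/Gallery/Names/{prefix}"


def star_url(star_id):
    """Star overview page (birthday, awards, alternative names)."""
    return f"{BASE}/name/nm{star_id}/"


def media_index_url(star_id, page=1):
    """Thumbnail list of a star; page 1 is shown without a ?page= parameter."""
    url = f"{BASE}/name/nm{star_id}/mediaindex"
    return url if page == 1 else f"{url}?page={page}"


def photo_url(photo_id, star_id=None):
    """Photo page; leaving out the star id loses some page features."""
    if star_id is None:
        return f"{BASE}/media/rm{photo_id}/"
    return f"{BASE}/media/rm{photo_id}/nm{star_id}"
```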
After understanding how the photos are reached, we can work out how to automatically download all the photos you want (NOTICE: you should only do this in accordance with IMDB's licence).
- First, download /Sections/Gallery/Names/{0}, where {0} is replaced by one of the following:
{"1","3","4", "5","A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q", "R","S","T","U","V","W","X",
"Y","Z","Aacute","Aring","Eacute","Ouml","Oslash","Uuml","THORN"}
In the downloaded HTML, you can see something like this:
<LI><A HREF='/name/nm0045198/photogallery'>Baca, Shawna</A>
<LI><A HREF='/name/nm0000002/photogallery'>Bacall, Lauren</A>
<LI><A HREF='/name/nm0045209/photogallery'>Bacall, Michael</A>
<LI><A HREF='/name/nm1832162/photogallery'>Bacalski, Roberto</A>
<LI><A HREF='/name/nm0045214/photogallery'>Bacalso, Joanna</A>
<LI><A HREF='/name/nm0045219/photogallery'>Bacarella, Mike</A>
Then we can write a regular expression to get the ids and names of the stars, for example:
@"HREF='(/name/nm(\d+)/photogallery)'>(.+)</A>"
- Second, after you get all the ids and names, you can access any star directly. Let's download http://www.imdb.com/name/nm{0} (replace {0} with the id of the star you want). Part of the downloaded page looks like this:
<h3>Overview</h3>
<div class="info">
<h5>Date of Birth:</h5>
<a href="/OnThisDay?day=27&month=May">27 May</a>
<a href="/BornInYear?1975">1975</a>,
<a href="/BornWhere?Atlanta,%20Georgia,%20USA">
Atlanta, Georgia, USA</a>
<a class="tn15more inline" href="bio">more</a>
</div>
<div class="info">
<h5>Trivia:</h5>
Part of the band <a href="/name/nm1642036/">Outkast</a>.
<a class="tn15more inline" href="bio">more</a>
</div>
Then we can use a regular expression like:
"href=\"/OnThisDay\\?day=(\\d+)&month=(.+)\">(.+) href=\"/BornInYear\\?(\\d+)\">\\d+</a>"
to get his/her birthday, and other regular expressions to get other information.
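In the sample, the day/month link and the year link are on separate lines, and "." does not match a line break by default, so the single pattern above may fail on such input. A minimal Python sketch, assuming the markup shown above, splits it into two independent patterns; parse_birthday is a hypothetical helper.

```python
import re

# Two patterns instead of one, so a line break between the links is harmless.
DAY_MONTH_RE = re.compile(r'href="/OnThisDay\?day=(\d+)&month=(\w+)"')
YEAR_RE = re.compile(r'href="/BornInYear\?(\d+)"')


def parse_birthday(html):
    """Return (day, month, year) from a star overview page, or None."""
    day_month = DAY_MONTH_RE.search(html)
    year = YEAR_RE.search(html)
    if day_month is None or year is None:
        return None
    return int(day_month.group(1)), day_month.group(2), int(year.group(1))
```

For the "Date of Birth" sample above, parse_birthday returns (27, 'May', 1975).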
- Third, let's get the photo list of a star from IMDB by accessing /name/nm{0}/mediaindex?page={1} (replace {0} with the id of the star and {1} with the page number). In the fetched pages, we can see something like this:
<div class="thumb_list">
<a title="" href="/media/rm2901448448/nm0071275"><img alt="" height="100" width="100" src="http://ia.media-imdb.com/images/M/MV5BMjA4MjgzMTQyOV5BMl5BanBnXkFtZTcwNTE0OTA3MQ@@._V1._CR0,0,323,323_SS100_.jpg" /></a>
<a title="" href="/media/rm62166528/nm0071275"><img alt="" height="100" width="100" src="http://ia.media-imdb.com/images/M/MV5BMTc5NjEwNjE3MV5BMl5BanBnXkFtZTcwNDczMDE2MQ@@._V1._CR56,0,287,287_SS100_.jpg" /></a>
<a title="" href="/media/rm45389312/nm0071275"><img alt="" height="100" width="100" src="http://ia.media-imdb.com/images/M/MV5BMTI0MTc0MDQyNF5BMl5BanBnXkFtZTcwNTczMDE2MQ@@._V1._CR0,0,267,267_SS100_.jpg" /></a>
"href=\"(/media/rm(\\d+)/nm(\\d+))\">"
- Fourth, we can download the full-size photos by accessing the URLs fetched above. In the HTML of those pages, we can see the code that refers to the photo image, like this:
<center><table id="principal">
<tr><td valign="middle" align="center"><img oncontextmenu="return false;" galleryimg="no" onmousedown="return false;" onmousemove="return false;" src="http://ia.media-imdb.com/images/M/MV5BNDUwODE2OTQxMF5BMl5BanBnXkFtZTYwMTE0MzM3._V1._SX268_SY400_.jpg"></td></tr>
</table></center>
<div style="margin-bottom:0.25em;">
<b>Title:</b> <a href="/title/tt0417225/">Idlewild</a><br />
<b>Names:</b> <a href="/name/nm0071275/">André Benjamin</a>, <a href="/name/nm1745736/">Paula Patton</a><br />
</div>
<span class="less-emphasis">
<b>Photo 23 of 70:</b>
Photo date: 25 August 2006
</span>
<hr />
Then we can get the photo image URL and photo information with regular expressions:
"galleryimg=\"no\" onmousedown=\"return false;\" onmousemove=\"return false;\" src=\"(http://ia.media-imdb.com/images/M/(.+(\\.jpg)))\"></td>"
and
"Photo date: ((\\d{1,2}) (\\w{3,12}) (\\d{4}))"
- Finally, you can download the photos using the image URLs you have just fetched.
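Steps four and five can be sketched in Python as follows, assuming the photo-page markup shown above. The dots in the host name are escaped here so they match only a literal ".", a small change from the pattern above; parse_photo_page and download_image are hypothetical helpers, and urllib.request.urlretrieve is the standard-library call used to save the file.

```python
import re
import urllib.request

# Python form of the two patterns above (dots in the host name escaped).
IMAGE_RE = re.compile(
    r'galleryimg="no" onmousedown="return false;" onmousemove="return false;" '
    r'src="(http://ia\.media-imdb\.com/images/M/(.+(\.jpg)))"></td>'
)
DATE_RE = re.compile(r"Photo date: ((\d{1,2}) (\w{3,12}) (\d{4}))")


def parse_photo_page(html):
    """Return (image_url, photo_date) from a photo page; either may be None."""
    img = IMAGE_RE.search(html)
    date = DATE_RE.search(html)
    return (img.group(1) if img else None, date.group(1) if date else None)


def download_image(image_url, path):
    """Save the full-size image found by parse_photo_page to a local file."""
    urllib.request.urlretrieve(image_url, path)
```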
As discussed above, you can fetch photos and some information about stars. What's more, you can use the same approach to analyze the structure of the website and write your own regular expressions.
However, I don't want to go into the details of writing a program to download them automatically. I have just written one for my experiment, in C# with a Windows UI. You know, programming is easy for me.
If you are interested in IMDB's data for scientific purposes and have already obtained permission from IMDB to use it, you can contact me and I can send you a copy of my software. Or, if you are interested in programming, feel free to share with me, too.
(Neio, July 20, 2008)
I’m Still Alive
Yes, I’m still alive. It has been more than two months since my last update.
I'm preparing for the GRE and busy with my classes: OS, OO, UML, AI, etc. I have many things to do. Studying at a graduate school in the US is still only a dream for now.
But I’m trying…
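Ever since I last visited my own blog on my phone, I have been struck by the human-centered design of Web 2.0, and I have also noticed that my last redesign of the blog's interface left behind some features of little real use. So I have been wanting to overhaul my blog for a while.
My blog now seems too bloated, far from the simple and clear style I prefer, so I will gradually remove some useless Views. I found that nobody has ever used the Rating feature, so I've decided to turn it off for now, or make it much more prominent. I've also decided to design my own WP theme.
Here are my ideas for now:
1. Simple, yet with depth.
2. Human-centered design, suitable for all kinds of people and devices.
3. In terms of content, put more effort into the articles. I used to write rather little; from now on I'll write more, and with more depth.
4. For layout, I plan to use more plugins, such as code highlighting; the layout should look good.
5. Don't record too many meaningless things.
6. The default page only shows the starred articles I recommend; other casual posts won't appear on the home page, but can be found in the article categories.
7. Iterate on the above: refine these points or add new guiding principles.
I don't have time at the moment, so I'm leaving this here as a reminder to myself.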
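ASP.NET Tidbits, Part 1
Recently I've started writing .NET programs again. From now on, I'll record the problems I run into and what I learn from them.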
Problem.1
"sql_variant 到 uniqueidentifier 的隐式转换。请使用 CONVERT …."
在ASP.NET 中使用DataSource的时候如果数据库使用的uniqueidentifier,自动绑定所编译的程序运行时会有这个错误,因为其自动生成的类型是Object。有的人总想方设法吧类型设置成Guid的,其实只需要将类型设置为Empty即可。
例如我自己写程序的时候遇到的这个问题:
原来是:
<asp:ControlParameter ControlID="GridView1" Name="tgc_tgid" PropertyName="SelectedValue" Type="Object" />
<asp:ControlParameter ControlID="ddlRoles" Name="tgc_role" PropertyName="SelectedValue" Type="Int32" />
Just remove Type="Object" (a parameter without a Type attribute has the type Empty).
Note.2
Enabling Trace






