ویکی‌پدیا:سیاست ربات‌رانی/درخواست مجوز/انگلیسی/MerlIwBot/Task 1

Operator: Merlissimo (talk · contributions · moves · blocks · rights · stats) (homewiki: dewiki)

Time filed: Friday, 26 April 2024 (Gregorian), 00:51

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java

Source code available: No

Function overview: global bot tasks

  • adds/modifies/removes interwikis in all namespaces except user root pages
    uses Tarjan's algorithm to find strongly connected components, which form the basis for all further modification decisions
    Wikidata-compatible (already adds items with sitelinks on the Wikidata test repository)
  • may fix double redirects in the future (currently not done because xqbot already handles these using database results)
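The Tarjan step mentioned above can be sketched as follows. This is a generic textbook implementation of Tarjan's strongly-connected-components algorithm over an integer-indexed adjacency list, not the bot's unpublished source; in the bot, each node would correspond to a wiki page and each edge to a langlink.

```java
import java.util.*;

/** Minimal recursive Tarjan SCC finder; the algorithm runs in the constructor.
 *  The int-indexed graph representation is an illustrative assumption. */
class Tarjan {
    private final List<List<Integer>> adj;
    private final int[] index, low;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private int counter = 0;
    private final List<List<Integer>> sccs = new ArrayList<>();

    Tarjan(List<List<Integer>> adj) {
        this.adj = adj;
        int n = adj.size();
        index = new int[n];
        low = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1);
        for (int v = 0; v < n; v++)
            if (index[v] == -1) strongConnect(v);
    }

    private void strongConnect(int v) {
        index[v] = low[v] = counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : adj.get(v)) {
            if (index[w] == -1) {             // tree edge: recurse
                strongConnect(w);
                low[v] = Math.min(low[v], low[w]);
            } else if (onStack[w]) {          // back edge into the current SCC
                low[v] = Math.min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {             // v is the root of an SCC: pop it
            List<Integer> scc = new ArrayList<>();
            int w;
            do { w = stack.pop(); onStack[w] = false; scc.add(w); } while (w != v);
            sccs.add(scc);
        }
    }

    List<List<Integer>> sccs() { return sccs; }
}
```

For a graph where pages 0 and 1 link to each other and page 2 links only to 0, this yields two components: {0, 1} and {2}, which is exactly the structure the bot's modification decisions are based on.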

Links to relevant discussions (where appropriate):

Edit period(s): Continuous

Estimated number of pages affected: minimum edit delay of 8 seconds per wiki, but this depends heavily on the load of the s7 database cluster, which the bot monitors continuously

Exclusion compliant (Y/N): yes for additions and modifications, but the bot may always remove langlinks to non-existent targets unless it is denied by name

Already has a bot flag (Y/N): global and local flags, so the bot has bot rights on _all_ Wikipedia-related projects (including Incubator, Commons and Wikispecies)

Function details: More than a year ago I extended my Java framework (known from my other bot, MerlLinkBot) to handle interwikis. The main reason I wrote it was to solve interwiki problems that cannot be handled by the pywikipediabot framework. It is safe to run this framework on all namespaces (e.g. it can handle interwikis transcluded from a subpage). The algorithm has been running stably for a year now.

The bot runs on the toolserver, where it analyses the database to find pages needing an interwiki update. It is optimized to process a large number of pages with limited memory usage and very low CPU usage. (The bot makes about 20,000-60,000 edits a day across all wikis.) The run itself operates only on live data using the API. Logs for every edit are available at tools:merliwbot/editinfo.

The bot scans each wiki once a month. For example, on the last run for this wiki the bot found, among other things, 436 langlinks on 396 pages (mainly templates) pointing to non-existent targets that could not be removed because of sysop-level edit protection.

As a special feature, my bot adds a reason for each removal to the edit summary (missing, deleted, or connected to another article). For some weeks now, the summary has also included the old langlink on modifications, provided this fits into the maximum summary length of 255 bytes.
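The 255-byte limit check described above could look like this sketch. The method and class names are hypothetical; the point is that MediaWiki's historical summary limit is measured in UTF-8 bytes, not characters, so the old langlink is appended only when the encoded result still fits.

```java
import java.nio.charset.StandardCharsets;

class SummaryBuilder {
    static final int MAX_SUMMARY_BYTES = 255; // MediaWiki's historical summary limit

    /** Append the old langlink to the summary only if the result still fits
     *  into the byte limit; otherwise fall back to the plain summary. */
    static String withOldLink(String base, String oldLink) {
        String candidate = base + " (was " + oldLink + ")";
        return candidate.getBytes(StandardCharsets.UTF_8).length <= MAX_SUMMARY_BYTES
                ? candidate
                : base;
    }
}
```

Counting bytes rather than `String.length()` matters because non-ASCII titles (Persian, for instance) take several UTF-8 bytes per character.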

Discussion

I have been using my global bot flag to run on this wiki since April 2011. In that time the bot has made more than 65,000 edits. Now جواد has requested on my home-wiki talk page that I additionally request a local fawiki bot flag (by the way: for some days now my bot has also been running on fawikiquote, likewise using its global flag). There was one erroneous edit, but according to my logs the API returned no langlinks for these pages. Based on that information the addition was logically correct, because the redirect fa:ماریانو آکوستا was linked from the added eswiki and itwiki pages about the person. I don't know why the API returned wrong information. I have seen this only once before (several months ago), when the CDB files containing the sitematrix were rebuilt incorrectly by WMF.

My bot uses the same translatewiki translations as the pwb bot. According to جواد, it seems to be a problem that my bot does not add any version info as a prefix to the summary. xqt implemented this because of a Python bug in pwb, but Java does not have that problem, because all Java versions use UTF-16 code units internally, so such a Unicode bug is impossible. All title normalisations are done by MediaWiki, not by my framework. Merlissimo (بحث) ‏۲۸ ژوئیهٔ ۲۰۱۲، ساعت ۰۰:۵۱ (UTC)[پاسخ]

Your bot had a global flag for the standard interwiki code, not your own code! It has two important bugs (on fa.wiki and on other wikis as well):
  1. It works on pages that carry a deletion tag ({{حذف سریع}}); in my opinion you have not defined the deletion tags of other wikis, nor the redirect-category tag ({{رده بهتر}}).
  2. It has the bug that you mentioned.
We are blocking it on fa.wiki until this is solved. After solving this bug, please send me a message. رضا ۱۶۱۵ / ب ‏۳۰ ژوئیهٔ ۲۰۱۲، ساعت ۱۵:۰۳ (UTC)[پاسخ]
I divided my long answer into four parts for better readability.
Global bot:
The global bot policy defines two tasks that may be performed by bots in the global bot group: maintaining interlanguage links or fixing double redirects. No particular implementation is mandated, nor is a reference implementation mentioned. In my request for membership of the global bot group I wrote that I would use my own Java framework. Before I started editing here I also read every local bot policy. Your local bot policy accepts global bots without any restrictions. ویکی‌پدیا:سیاست ربات‌رانی/درخواست مجوز: "This wiki uses the standard bot policy, and allows global bots [...]"
If you do not accept global interwiki bots that do not use the pwb script, please add a note to your local policy so that other bot operators know about it.
enwiki block:
I was blocked on enwiki after a manual edit. Some members of the bot approval group told me on IRC that I was correct, but they could not unblock the bot; that can only be done by a single admin's decision. I was advised to re-request the bot flag, so that a bureaucrat could unblock my bot after it was approved a second time. At that time I was building the big database that my bot uses now. This took about four months last year, and the block happened during that time. So I decided to exclude enwiki instead of stopping my bot globally for some weeks. If I were to add enwiki now, I would first have to make a complete scan of enwiki, which would stop my bot globally for approximately 6-8 weeks. That is a consequence of my bot's design. I have never pursued unblocking my bot on enwiki. For supervised edits I have another approved bot on enwiki that I can use to fix conflicts manually.
I think you have enough experience with bots on this wiki to make your own decision. I have a great deal of experience with interwiki bots because I have been approving interwiki bots on dewiki for years. That is also the reason the dewiki community elected me as a bureaucrat.
config:
My bot is based on the interwiki use case that was created when the interwiki extension was developed: meta:Interlanguage_use_case.
According to this, all langlinks should be bidirectional. That also means that two articles on one Wikipedia should not link to the same target; this situation is called an "interwiki conflict". Technically it is currently possible to do this. Wikidata follows the same use case and will not allow one article to be connected to two items. Having two langlinks to the same wiki on one article is currently forbidden by the database primary key, but the renderer shows both langlinks. The API returns the database content. The admin on enwiki blocked me because he wanted my bot to accept the same langlink target on two enwiki articles.
My bot reads all of its configuration from the live wiki. The list of disambiguation pages is read from MediaWiki:Disambiguationspage and the interwiki sort order from MediaWiki:Interwiki config-sorting order, as described in meta:Interwiki_sort_order (which was my proposal). So local admins can easily change the configuration at any time, and my bot will respect it within a few hours (the config is read on every startup and cached for some hours). (For the pwb interwiki bot you have to file a bug at SourceForge and wait until somebody implements the change.)
My bot is not allowed to edit user root pages or pages containing {{bots|deny=MerlIwBot}}. Additionally, it will not add or modify langlinks on pages containing {{nobots}}. I added the exception that it may remove langlinks on pages with {{nobots}}, because in the past this template was added to many pages where the pwb interwiki bot behaved incorrectly:
  • the pwb bot does not respect a different langlink position without explicit config; pwb always reorders the langlinks and adds them at the bottom. My bot does not do that.
  • pwb adds langlinks to pages even though they are already transcluded from a subpage. My bot detects langlinks using the API only, so it knows which langlinks are already shown in the sidebar. If langlinks already exist, MerlIwBot adds new ones relative to the existing ones.
  • removing langlinks only is an invariant that always terminates, so there can be no bot wars
There is no rule in your bot policy that my bot should not edit pages containing certain templates. Two solutions are possible for my bot:
  1. We make a proposal on meta that interwiki bots respect a list of templates linked on a local config page, as is already done with disambiguation templates.
  2. My bot also respects the {{nobots}} template if it is only transcluded into a page by another template. So adding this template to the include part of template:حذف سریع would easily solve the problem without any change to my code. And you could easily add it to more templates in the future if needed.
Adding this to my code directly is not a good solution, because then you would have to inform me every time this changes on this wiki.
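The exclusion behaviour described above (deny-by-name via {{bots|deny=...}}, plus the asymmetric {{nobots}} rule for removals) can be sketched as below. The class name is hypothetical and the regexes are deliberately simplified; real wikitext parsing must also cope with comments, nowiki sections and template parameters in any order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExclusionCheck {
    // Simplified patterns for {{bots|deny=...}} and {{nobots}} (assumption).
    private static final Pattern DENY =
        Pattern.compile("\\{\\{\\s*bots\\s*\\|[^}]*deny\\s*=\\s*([^}|]*)", Pattern.CASE_INSENSITIVE);
    private static final Pattern NOBOTS =
        Pattern.compile("\\{\\{\\s*nobots\\s*\\}\\}", Pattern.CASE_INSENSITIVE);

    /** Adding or modifying langlinks is blocked by {{nobots}} or a deny list. */
    static boolean mayAddOrModify(String wikitext, String botName) {
        return !NOBOTS.matcher(wikitext).find() && !isDenied(wikitext, botName);
    }

    /** Removing langlinks to missing targets stays allowed on {{nobots}} pages
     *  and is blocked only when the bot is denied by name. */
    static boolean mayRemove(String wikitext, String botName) {
        return !isDenied(wikitext, botName);
    }

    private static boolean isDenied(String wikitext, String botName) {
        Matcher m = DENY.matcher(wikitext);
        if (!m.find()) return false;
        for (String name : m.group(1).split(",")) {
            String n = name.trim();
            if (n.equalsIgnoreCase("all") || n.equalsIgnoreCase(botName)) return true;
        }
        return false;
    }
}
```

Under these rules, a page tagged only with {{nobots}} still permits removals, while {{bots|deny=MerlIwBot}} blocks every edit by that bot.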
algorithm for changes:
As I already wrote, my bot builds strongly connected components. An easy example with three articles, each having one langlink:
  • fa:A->de:A
  • de:A->fa:A
  • de:B->fa:A
There is a strong connection between fa:A and de:A because both articles link to each other. That is why my bot will remove the fa-langlink on de:B, which is only unidirectional.
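The three-article example can be worked through in code. This is an illustrative reconstruction of the decision, not the bot's source: for a two-node case, "same strongly connected component" reduces to the two pages linking to each other.

```java
import java.util.Map;
import java.util.Set;

class InterwikiExample {
    /** True if the langlink from 'from' to 'to' is bidirectional, i.e. both
     *  pages would land in the same strongly connected component. */
    static boolean mutual(Map<String, Set<String>> links, String from, String to) {
        return links.getOrDefault(from, Set.of()).contains(to)
            && links.getOrDefault(to, Set.of()).contains(from);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = Map.of(
            "fa:A", Set.of("de:A"),
            "de:A", Set.of("fa:A"),
            "de:B", Set.of("fa:A"));
        // fa:A <-> de:A form a strongly connected pair; de:B -> fa:A is one-way
        // and fa:A is already claimed by de:A, so the fa-langlink on de:B goes.
        System.out.println("de:B -> fa:A mutual? " + mutual(links, "de:B", "fa:A")); // false
    }
}
```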
This can solve many simple interwiki conflicts that occur because of local page moves. Of course the error rate is higher compared to the pwb algorithm, which stops on every conflict, but in my experience it is still very low. And not fixing those interwiki conflicts is not a better solution. My bot only removes langlink targets from an article that either do not exist or are also linked from another article on the same wiki. Because my bot adds a removal reason, everybody is able to understand the change. Additionally, the logs are publicly available for 6 months, and I have very detailed logs for the last month. Merlissimo (بحث) ‏۳۰ ژوئیهٔ ۲۰۱۲، ساعت ۱۷:۴۲ (UTC)[پاسخ]
Thank you for your long answer :)
Some items should be considered:
1. Please define an ignoreTemplates variable like interwiki.py does, because in some cases your bot removes interwikis wrongly.
For example, category A is a candidate for deletion and users add its interwikis to the new category (which has the correct name); your bot removes the interwikis from the new one, admins delete the old one, and the category ends up without interwikis!
Adding ignoreTemplates can be done by listing them on local wikis, like MediaWiki:Disambiguationspage.
2. In some cases the interwiki relationships are W-shaped:
de:A>fa:B
fa:B>de:C
What is the reaction of your bot in these cases? ویکی‌پدیا:فهرست مقاله‌هایی دارای میان ویکی ضربدری رضا ۱۶۱۵ / ب ‏۲ اوت ۲۰۱۲، ساعت ۰۱:۰۷ (UTC)[پاسخ]
For the example:
  1. if there are only the two langlinks you mentioned:
    my bot won't do anything because of the conflict (the algorithm would find three strongly connected groups, each containing only one article; the group merge algorithm (in practice the same as in pwb) would fail because of the conflict). The advantage of my bot over pwb is that it does not matter at which article it starts, because of its reverse-langlink search feature. (pwb starting at de:Bergsturz would not recognize the incoming langlink which causes the conflict, and would add the de-langlink to en:Sturzstrom.)
  2. if there are additional langlinks that create a back path, my bot can solve the conflict:
    now there is a strong connection between en:Rockslide and de:Bergsturz, because of the path en:Rockslide->de:Bergsturz and the back path de:Bergsturz->lt:Akmenų_nuoslanka->en:Rockslide
    secondly, this strongly connected group contains only one dewiki and one enwiki article. In this case the bot would replace the enwiki langlink en:Sturzstrom with en:Rockslide on de:Bergsturz
  3. To make it more complicated:
    if de:Bergsturz in the second example contained an additional langlink to another wiki (e.g. frwiki) that is not part of the strongly connected group (which only contains the enwiki, dewiki and ltwiki articles), this fr-langlink would not be touched by my bot in autonomous mode. This is because of the rule that a langlink can only be removed if either the target does not exist or it is linked from another article on that wiki. So the conflict would only be partly solved by my bot. (Of course the bot also checks whether the strongly connected group around the frwiki article (this group may contain only one article) can be merged into the en/de/lt group without conflicts; that is in practice the same as pwb does. In that case the fr-langlink would be added to enwiki and ltwiki.)
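The merge rule used in cases 2 and 3 above (one group may absorb another only if no wiki would then be represented by two different articles) can be sketched as follows. The group representation, one article per wiki, is an assumption based on the description; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class GroupMerge {
    /** A strongly connected group maps each wiki code (e.g. "en", "de") to
     *  the single article it contains on that wiki. Returns the merged group,
     *  or empty on an interwiki conflict. */
    static Optional<Map<String, String>> merge(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : b.entrySet()) {
            // Conflict: both groups hold a different article on the same wiki.
            if (a.containsKey(e.getKey()) && !a.get(e.getKey()).equals(e.getValue()))
                return Optional.empty();
        }
        Map<String, String> merged = new HashMap<>(a);
        merged.putAll(b);
        return Optional.of(merged);
    }
}
```

So a lone frwiki group merges cleanly into the en/de/lt group, while a group holding en:Sturzstrom cannot merge with one already holding en:Rockslide.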
I would still recommend letting these templates transclude {{nobots}} in their includeonly part, because it is much more flexible for you. Both templates mentioned in interwiki.py are protected, so I cannot change them. But using a config page would also be fine. In that case I would suggest using e.g. MediaWiki:Interwiki config-ignore templates and adding links to the templates as in MediaWiki:Disambiguationspage.
@جواد: Sorry that my answer here was a bit cocky. But of course I checked and fixed this conflict after your notice on Friday tools:merliwbot/editinfo/uuid/cf113d80-d837-11e1-bd00-0019b9dd4f70 [۱] [۲]. And I did not realise that you wanted me to stop my bot during this discussion (I stopped MerlIwBot's autonomous edits on fawiki after your block on Monday).
Merlissimo (بحث) ‏۲ اوت ۲۰۱۲، ساعت ۱۳:۰۱ (UTC)[پاسخ]
I added template:nobots to those templates, and I also created the MediaWiki page.
✓ Your bot has a flag for editing interwikis with your Java code. رضا ۱۶۱۵ / ب ‏۲ اوت ۲۰۱۲، ساعت ۱۷:۰۱ (UTC)[پاسخ]

Explanation: Hello. I had seen several mistakes by this bot, for example adding interwikis on pages that carry a deletion tag, or this edit. When I told the user, he said that he does not use the pywikipedia interwiki code but his own Java code. That is why I asked him to file a local request, so that the BAG members can review it and, if it qualifies, let it continue. In addition, his bot was blocked on the English Wikipedia for this very reason and is still blocked. Respectfully, جواد|بحث ۷ مرداد ۱۳۹۱، ساعت ۱۰:۲۴ (ایران)

This user needs the help and attention of a BAG member. Please remove this template after answering their request.

Dear BAG members, please respond to this request. This bot is still making mistakes. For example, here it added a category to a category page that carries a deletion tag (the category has been moved), and here it removed all interwikis from the new category; if the old category is deleted by the admins, the new category will be left without interwikis. The reason, in my opinion, is that since it does not use the pywikipedia code, the deletion tags for other languages (such as Persian) are not defined and the bot does not recognize them. Until the BAG members review this, and given the warning I previously gave on the user's talk page on the German Wikipedia telling him not to run the bot until it has received a flag, I am blocking the bot for now. جواد|بحث (۹ مرداد ۱۳۹۱) ‏۳۰ ژوئیهٔ ۲۰۱۲، ساعت ۱۴:۴۷ (UTC)[پاسخ]