Configuration

To work, Apibot needs a wiki login, and optionally bot settings.

Wiki login

In practically all wikis, good manners (and quite often rules) require a bot to operate through an account. Anonymous bot usage is usually prohibited. For this reason, Apibot requires a login to a wiki account to operate. If the account does not exist at this wiki, Apibot will refuse to work.

(You can circumvent this by specifying "ANONYMOUS" (in capital letters) as the username, but do not do it unless you really know what you are doing.)

A wiki login is a PHP array variable with three mandatory fields, and possibly others. For example:

$my_bot_login = array (
  'user'     => "My Great Bot",
  'password' => "1234mybotpassword5678",
  'wiki'     => $Wikipedia_EN,
);

The 'user' and 'password' values are self-explanatory strings. The 'wiki' value is a wiki description variable (see below).

The 'wiki' value does not need to be defined as a separate variable - you can insert the wiki description right there. However, if your bot uses more than one account on this wiki, defining the wiki description as a variable avoids unnecessary repetition.
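Because Apibot refuses to work without a proper account, it can be handy to check a login array before starting a long run. The following is a minimal sketch, not part of Apibot: the function name `check_login()` is hypothetical, and it only verifies that the three mandatory fields are present and that 'wiki' looks like a wiki description (an array with a 'name').

```php
<?php
// Hypothetical helper, not part of Apibot: lists what is missing from a login.
function check_login(array $login): array
{
    $problems = array();
    foreach (array('user', 'password', 'wiki') as $field) {
        if (!isset($login[$field])) {
            $problems[] = "missing '$field'";
        }
    }
    // 'wiki' must be a wiki description: an array with at least a 'name'.
    if (isset($login['wiki'])
        && (!is_array($login['wiki']) || !isset($login['wiki']['name']))) {
        $problems[] = "'wiki' is not a wiki description with a 'name'";
    }
    return $problems;  // an empty array means the login looks complete
}

$my_bot_login = array(
    'user'     => "My Great Bot",
    'password' => "1234mybotpassword5678",
    'wiki'     => array('name' => 'en'),  // abbreviated wiki description
);
var_dump(check_login($my_bot_login));                     // empty array: nothing missing
var_dump(check_login(array('user' => "My Great Bot")));  // 'password' and 'wiki' missing
```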

Often a bot will be used on multiple wikis. Sometimes a bot will perform several different tasks on the same wiki, and in some cases the wiki rules will require it to use different accounts for different tasks. In such cases, the bot operator will need to define several login variables. Usually (though not necessarily), they are defined as values in an array, typically named $logins:

$logins = array (
  'MeAtWEN' => array (
    'user'     => "Joe Editor",
    'password' => "pwdofjoeeditor",
    'wiki'     => $Wikipedia_EN,
  ),
  'MyBotAtWEN' => array (
    'user'     => "Joe's Bot",
    'password' => "pwdofjoesbot",
    'wiki'     => $Wikipedia_EN,
  ),
  'MyBotAtWDE' => array (
    'user'     => "Joe's Bot",
    'password' => "pwdofjoesbot",
    'wiki'     => $Wikipedia_DE,
  )
);

Wiki description

As we saw, a wiki login needs a wiki description. It is a PHP array variable with the following structure:

$Wikipedia_EN = array (
  'name' => 'en',
  'urls' => array (
    'api' => "https://en.wikipedia.org/w/api.php",
    'web' => "https://en.wikipedia.org/w/index.php",
  ),
);

The 'name' value is the name Apibot uses to refer to this wiki, for example when caching data about it. For this reason, this field is mandatory.

The 'urls' array contains the URLs Apibot needs in order to contact the wiki. The 'api' URL is used to exchange info with the MediaWiki API (bot) interface, and the 'web' URL is used to exchange info with the MediaWiki Web (human) interface. (Apibot can work through different interfaces - potentially even with different wiki software.)

The wiki description may also contain other fields, for example bot settings (see below).

Often a bot will be used on several wikis and will thus need more than one wiki description. Usually (though not necessarily), they are defined as values in an array, typically named $wikis:

$wikis = array (
  'en' => array (
    'name' => 'en',
    'urls' => array (
      'api' => "https://en.wikipedia.org/w/api.php",
      'web' => "https://en.wikipedia.org/w/index.php",
    ),
  ),
  'de' => array (
    'name' => 'de',
    'urls' => array (
      'api' => "https://de.wikipedia.org/w/api.php",
      'web' => "https://de.wikipedia.org/w/index.php",
    ),
  ),
);
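When the bot works on many wikis of the same family, the descriptions often differ only in the language code, so they can also be generated in a loop. This is a sketch that assumes the standard Wikipedia URL layout (`https://<code>.wikipedia.org/w/api.php` and `.../w/index.php`); other wikis may use different script paths.

```php
<?php
// Build wiki descriptions for several Wikipedia language editions.
// Assumes the usual Wikipedia script path /w/; adjust it for other wikis.
$wikis = array();
foreach (array('en', 'de', 'fr') as $code) {
    $wikis[$code] = array(
        'name' => $code,  // unique name per wiki
        'urls' => array(
            'api' => "https://$code.wikipedia.org/w/api.php",
            'web' => "https://$code.wikipedia.org/w/index.php",
        ),
    );
}

echo $wikis['de']['urls']['api'], "\n";  // https://de.wikipedia.org/w/api.php
```

Using the language code as 'name' keeps every name distinct, which presumably matters since Apibot uses the name when caching data about the wiki.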

Settings

The work of Apibot can be tuned to a large degree through settings for its parameters. These are PHP arrays, either gathered in a common settings array or inserted into the wiki or login descriptions.

Usually most of your settings are good for most or all of your bot work. However, sometimes you need some settings to be different for one or more of the wikis you work on, and sometimes even for different accounts on the same wiki. That is why you can insert settings not only in a global array, but also in wiki or login descriptions. Settings in wiki descriptions override those in the settings array, and settings in login descriptions override those in wiki descriptions.

The settings are distinguished by their array keys in the array they are inserted in. For example, the Browser module settings should always be inserted under the key 'browser'.

None of these descriptions, or the values in them, is mandatory - define only those you need.
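The override order (settings array, then wiki description, then login description) can be illustrated with PHP's built-in `array_replace_recursive()`. This is a hypothetical sketch, not Apibot's own code: the function name `effective_settings()` is made up, and it assumes that everything in a wiki or login description other than its own fields ('name', 'urls', 'user', 'password', 'wiki') is a setting.

```php
<?php
// Hypothetical sketch (not Apibot's code) of the documented precedence:
// global settings < wiki description < login description.
function effective_settings(array $settings, array $wiki, array $login): array
{
    // Keys that describe the wiki or the account itself, not settings.
    $non_settings = array_flip(array('name', 'urls', 'user', 'password', 'wiki'));
    return array_replace_recursive(
        $settings,
        array_diff_key($wiki, $non_settings),   // wiki-level overrides
        array_diff_key($login, $non_settings)   // login-level overrides win
    );
}

$settings = array('browser' => array('conn_timeout' => 120, 'max_get_len' => 2048));
$wiki = array(
    'name'    => 'small',
    'urls'    => array('api' => "https://wiki.example.org/w/api.php"),
    'browser' => array('conn_timeout' => 60),
);
$login = array(
    'user' => "Joe's Bot", 'password' => "secret", 'wiki' => $wiki,
    'browser' => array('conn_timeout' => 30),
);

$effective = effective_settings($settings, $wiki, $login);
echo $effective['browser']['conn_timeout'], "\n";  // 30 (login wins)
echo $effective['browser']['max_get_len'], "\n";   // 2048 (from the settings array)
```

Note that `array_replace_recursive()` merges list-valued settings (such as a 'backends' list) index by index, so a real implementation would probably need to replace such lists as a whole.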

Common settings

These settings affect the work of the bot as a whole. A few of them are used only by specific modules.

General settings

'environment' => array (
  'timezone'     => "UTC",   // Your time zone. You need it for the correct work of the bot.
  'memory_limit' => "128M",  // The max amount of RAM you would permit to the bot.
),

PHP limits the amount of RAM a script (e.g. Apibot) may use. Depending on the PHP version and configuration, this limit can be rather low (older PHP versions defaulted to 16M; current ones default to 128M). This might not be enough for memory-intensive bot tasks. Conversely, on hardware short of RAM you might want to cap what the bot may use, to leave enough for more important tasks.
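The two 'environment' values correspond to standard PHP runtime settings. Assuming Apibot applies them in roughly this way (its actual code is not shown here), the equivalent PHP built-in calls are:

```php
<?php
// Assumption: 'timezone' and 'memory_limit' map onto these PHP built-ins.
$environment = array('timezone' => "UTC", 'memory_limit' => "128M");

date_default_timezone_set($environment['timezone']);  // used by date() etc.
ini_set('memory_limit', $environment['memory_limit']); // per-script RAM cap

echo date_default_timezone_get(), "\n";  // UTC
echo ini_get('memory_limit'), "\n";      // 128M
```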

Parameters handling

'setparams' => array (
  'lax_mode' => false,  // Determines the way the Apibot modules will process the parameters they are given
),

This setting is for those who enjoy tinkering and know very well what they are doing. If you are not one of them, better leave it alone.

Paths

'paths' => array (
  'info'     => dirname ( __FILE__ ) . "/data",      // Cached info files will be stored there.
  'identity' => dirname ( __FILE__ ) . "/identity",  // Cached identity files will be stored there.
),

Paths that Apibot uses. More can be defined in future versions.

For 'info' and 'identity', use "" (empty string) to put the files into the current directory, and NULL to disable writing them. (These settings are obsolete, superseded by the 'info_path' and 'identity_path' settings of the Infostore module, and will be dropped in some future version of Apibot.)

Backend-independent modules

Some Apibot modules are backend-independent. Usually Apibot will have only one of them active at any given time. Here are the settings for them.

Browser module

'browser' => array (
  'agent'        => "Mozilla/5.0 (Apibot Browser)",  // HTTP User-Agent
  'http_version' => "HTTP/1.1",  // Report this version
  'http_user'    => NULL,  // HTTP (not MediaWiki!) authentication user (if any)
  'http_pass'    => NULL,  // HTTP authentication password
  'conn_timeout' => 120,   // Connection timeout
  'max_get_len'  => 2048,  // If a GET string will exceed this length, use POST instead
  'content_type' => array (  // Use these types. (Currently GETs are considered text, and POSTs binary.)
    'text'   => "application/x-www-form-urlencoded",
    'binary' => "multipart/form-data",
  ),
  'mime_boundary'   => "Apibot-Browser-$1", // for multipart/form-data ($1 is replaced by a random string)
  'use_compression' => true,  // Compress transferred data where possible
  'speed_limits'    => array (  // Transfer speed (average) limits, bytes / sec
    'total' => PHP_INT_MAX,  // Both upload and download
    'DL'    => PHP_INT_MAX,  // Download only
    'UL'    => PHP_INT_MAX,  // Upload only
  ),
  'dump_level'      => 0,  // Debugging tool. Dump transferred data to stdout: 0 - none, 1 - all data
),

Most values are explained well in the comments. Here is some additional info:

The HTTP specifications set no maximal length for a GET request. However, no software can support strings of unlimited size, and some Apibot requests can generate rather long GETs. Practically no web server will fail on a 2048-byte GET request, so this is a reasonable upper limit for a GET request. Longer requests are sent as POST requests, whose size can reach megabytes.
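The 'max_get_len' rule can be sketched as follows. The function `choose_http_method()` is hypothetical, and it assumes the length that is checked is that of the full URL including the query string; the Browser module may measure it differently.

```php
<?php
// Hypothetical sketch of the 'max_get_len' rule. Assumption: the checked
// length is that of the full URL, query string included.
function choose_http_method(string $url, array $params, int $max_get_len = 2048): string
{
    $get_url = $url . "?" . http_build_query($params);
    return (strlen($get_url) > $max_get_len) ? "POST" : "GET";
}

$api = "https://en.wikipedia.org/w/api.php";
echo choose_http_method($api, array('action' => 'query', 'meta' => 'siteinfo')), "\n";
echo choose_http_method($api, array('action' => 'parse', 'text' => str_repeat("x", 3000))), "\n";
```

The first request stays a GET; the second, carrying 3000 characters of text, exceeds the limit and becomes a POST.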

If your PHP installation includes the zlib extension (most do), Apibot can exchange data with MediaWiki in compressed form. This saves a lot of Internet traffic both for you and for the wiki, provided the wiki supports data compression too (most wikis do). Given the amount of data a typical bot exchanges, the savings in line load and traffic costs for you and/or the wiki can be significant.
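To check whether 'use_compression' can have any effect on your machine, you can test for the zlib extension. This snippet only checks availability and illustrates the size reduction on a deliberately repetitive sample; it does not show how the Browser module negotiates compression with the server, and real gains depend on the data.

```php
<?php
// Is zlib available, and how much does gzip shrink a repetitive sample?
if (extension_loaded('zlib')) {
    $sample = str_repeat("{{Infobox settlement | name = Example }}\n", 200);
    $packed = gzencode($sample);
    printf("%d bytes -> %d bytes\n", strlen($sample), strlen($packed));
    var_dump(gzdecode($packed) === $sample);  // the round trip restores the data
} else {
    echo "zlib not available: transfers will be uncompressed\n";
}
```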

If you don't want Apibot to saturate your Internet connection and keep you from browsing, you can limit its speed. Conversely, if your connection is fast, it is the wiki's line that might get saturated, which is not nice either. Some wikis, e.g. Wikipedia, have powerful lines and load balancers and can easily handle anything an individual can throw at them. Smaller wikis, however, might not. In such cases, set these speed limits.
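One simple way to enforce an average-speed limit such as 'total', 'DL' or 'UL' is to pause whenever the bytes transferred so far are ahead of what the limit allows. This is a hypothetical sketch (`throttle_delay()` is not part of Apibot, and the Browser module's actual algorithm may differ):

```php
<?php
// Seconds to pause so that the average speed stays at or below the limit.
function throttle_delay(int $bytes, float $elapsed_seconds, int $limit_bytes_per_sec): float
{
    $min_duration = $bytes / $limit_bytes_per_sec;  // time the transfer should take at most speed
    return max(0.0, $min_duration - $elapsed_seconds);
}

// 10000 bytes in 1 s with a 5000 bytes/s limit: wait 1 more second.
printf("%.1f\n", throttle_delay(10000, 1.0, 5000));
// With the default PHP_INT_MAX limit there is practically never a wait.
printf("%.1f\n", throttle_delay(10000, 1.0, PHP_INT_MAX));
```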

(The exchange with some wikis might be slow even if both your line and the wiki's are good and the wiki is not overloaded. The reason is usually that MediaWiki is not very fast and needs a lot of resources to create the reply to a request. Often it is not the wiki software itself that is slow, but the MySQL database it uses.)

Log module

'log' => array (
  'loglevel' => LL_DEBUG,  // LL_PANIC, LL_ERROR, LL_WARNING, LL_INFO, LL_DEBUG - log from least to most
  'logfile'  => "test.log",  // log filename (if empty, no logfile will be written)
  'echo_log' => true,   // echo log to the stdout (eg. screen) too?
  'html_log' => false,  // log messages as HTML (for reading in a browser)?
  'levelprefs' => array (  // mark the loglines with these characters according to the line loglevel
    LL_PANIC   => '!',  // log entries with level LL_PANIC will start with this character, etc
    LL_ERROR   => '#',
    LL_WARNING => '=',
    LL_INFO    => '+',
    LL_DEBUG   => '-'
  ),
),
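The 'loglevel' value acts as a threshold: messages more verbose than it are dropped, and the rest get the prefix from 'levelprefs'. The sketch below assumes only the order of the LL_* constants given in the comment (LL_PANIC least verbose, LL_DEBUG most); their numeric values, the function `format_log_line()` and the exact line format are hypothetical.

```php
<?php
// Hypothetical values: Apibot defines its own LL_* constants; only their
// order (LL_PANIC lowest ... LL_DEBUG highest) is assumed here.
foreach (array('LL_PANIC' => 0, 'LL_ERROR' => 1, 'LL_WARNING' => 2,
               'LL_INFO' => 3, 'LL_DEBUG' => 4) as $name => $value) {
    if (!defined($name)) {
        define($name, $value);
    }
}

// Returns the prefixed log line, or NULL if it is filtered out by 'loglevel'.
function format_log_line(int $level, string $message, array $log_settings): ?string
{
    if ($level > $log_settings['loglevel']) {
        return null;  // more verbose than the configured loglevel
    }
    $prefix = $log_settings['levelprefs'][$level] ?? '';
    return $prefix . " " . $message;
}

$log_settings = array(
    'loglevel'   => LL_INFO,
    'levelprefs' => array(LL_PANIC => '!', LL_ERROR => '#', LL_WARNING => '=',
                          LL_INFO => '+', LL_DEBUG => '-'),
);
var_dump(format_log_line(LL_ERROR, "Edit failed", $log_settings));   // string(13) "# Edit failed"
var_dump(format_log_line(LL_DEBUG, "Request sent", $log_settings));  // NULL
```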

Infostore module

'paths' => array (
  'info_path'     => dirname ( __FILE__ ) . "/_info",      // Cached info files will be stored there.
  'identity_path' => dirname ( __FILE__ ) . "/_identity",  // Cached identity files will be stored there. 
),

For 'info_path' and 'identity_path' use "" (empty string) to put the files of this type into the current directory, and NULL to disable writing this type of files at all.
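The three cases can be expressed as a small helper. This is a hypothetical sketch: `cache_file_path()` and the file name "en.info" are made up for illustration and are not part of Apibot.

```php
<?php
// Hypothetical helper applying the documented rules for 'info_path' and
// 'identity_path': NULL disables writing, "" means the current directory,
// anything else is a directory name.
function cache_file_path(?string $dir, string $filename): ?string
{
    if ($dir === null) {
        return null;  // writing this type of file is disabled
    }
    if ($dir === "") {
        return $filename;  // relative name, i.e. the current directory
    }
    return rtrim($dir, "/") . "/" . $filename;
}

var_dump(cache_file_path(NULL, "en.info"));  // NULL
var_dump(cache_file_path("", "en.info"));    // string(7) "en.info"
var_dump(cache_file_path("/home/bot/_info/", "en.info"));
```

The NULL check comes first and uses `===`, so that NULL and "" are not confused with each other.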

Backend-specific modules

Apibot uses different backends to connect to different wiki interfaces, or potentially to different wikis. To reduce complexity, backends share almost the same structure, with similarly named modules that play the same role in every backend.

However, modules with the same name and role may still differ between backends, so they might have different settings and defaults. In addition, you might need different values for the same setting in different backends. To do this, insert into a settings (sub-)array further sub-arrays whose keys are the lowercased names of the backends. The values in such a backend-specific sub-array apply only to that backend, and for it they override the values of the same settings in the array that contains the sub-array. Here is an example for configuring an exchanger module:

'exchanger' => array (
  'settings' => array (
    'max_retries' => 5,
    'api' => array (
      'max_retries' => 10,
    ),
    'web' => array (
      'max_retries' => 3,
    ),
  ),
  'defaults' => array (
    'api' => array (
      'format'  => "json",
      'maxlag'  => 5,
      'maxage'  => NULL,
      'smaxage' => NULL,
    ),
    'web' => array (
    ),
  ),
),

By default, max_retries is set to 5. However, for the API backend it will be set to 10, and for the Web backend it will be set to 3. (Imagine that Apibot has also other backends - for these max_retries will be the common 5.)

As for the defaults, none are common to all backends. Only the API backend has the defaults format, maxlag, maxage and smaxage - the Web backend doesn't have them. Likewise, the Web backend might have defaults that the API backend doesn't have.
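The resolution of the 'max_retries' example above can be sketched like this. The function `settings_for_backend()` is hypothetical, not Apibot's code; it takes the list of backend names explicitly, because an ordinary nested setting (such as the Browser's 'content_type') cannot otherwise be told apart from a backend-specific sub-array.

```php
<?php
// Hypothetical sketch: resolve one settings (sub-)array for one backend.
// Backend-specific sub-arrays override the common values for that backend.
function settings_for_backend(array $settings, string $backend, array $backend_names): array
{
    // Common values: everything except the backend-specific sub-arrays.
    $common = array_diff_key($settings, array_flip($backend_names));
    $specific = (isset($settings[$backend]) && is_array($settings[$backend]))
        ? $settings[$backend] : array();
    return array_replace($common, $specific);
}

$backend_names = array('api', 'web', 'other');  // 'other' stands for any further backend
$exchanger_settings = array(
    'max_retries' => 5,
    'api' => array('max_retries' => 10),
    'web' => array('max_retries' => 3),
);
foreach ($backend_names as $backend) {
    $resolved = settings_for_backend($exchanger_settings, $backend, $backend_names);
    echo $backend, ": ", $resolved['max_retries'], "\n";  // api: 10, web: 3, other: 5
}
```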

Exchanger module

'exchanger' => array (
  'settings' => array (  // this module has both settings and defaults 
    'api' => array (  // every setting here can be common or backend-specific
      'max_retries' => 5,  // retry on request failure up to this number of times
      'retry_wait'  => 1,  // determines the wait after each retry
    ),
    'web' => array (
    ),
    'dump_level' => 0,
  ),
  'defaults' => array (  // For the exchanger parameters - set after reading MW docs
    'api' => array (
      'format'  => "json",  // preferred over PHP, if the MW version and the local PHP support it
      'maxlag'  => 5,
      'maxage'  => NULL,
      'smaxage' => NULL,
    ),
  ),
),

Here you can tune not only the settings, but also the defaults for the exchanger parameters. (These are the parameters sent with every request to the wiki. They might differ between backends - consult the MediaWiki docs for the parameters supported by the relevant version.)

The 'retry_wait' value determines the delay after an exchange error. If it is a plain number, it is multiplied by the square of the retry number to give the wait in seconds before the next attempt. For example, if the value is 2 and this is the third retry attempt, Apibot will wait 2 * 3 * 3 = 18 seconds before the fourth attempt.

If the value is an array, it is expected to contain retry numbers as keys and the corresponding wait times in seconds as values. If there is no value for the current retry number, the wait in seconds is set to the retry number squared.
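Both forms of 'retry_wait' can be summarised in a few lines. This is a sketch of the rule as described above; the function name `retry_wait_seconds()` is hypothetical.

```php
<?php
// Sketch of the documented 'retry_wait' rule (function name is hypothetical).
// $retry is the retry number (1 for the first retry, 2 for the second, ...).
function retry_wait_seconds($retry_wait, int $retry)
{
    if (is_array($retry_wait)) {
        // Explicit table: retry number => seconds; fall back to retry squared.
        return isset($retry_wait[$retry]) ? $retry_wait[$retry] : $retry * $retry;
    }
    // Plain number: multiplied by the retry number squared.
    return $retry_wait * $retry * $retry;
}

echo retry_wait_seconds(2, 3), "\n";                        // 18
echo retry_wait_seconds(array(1 => 5, 2 => 30), 2), "\n";  // 30
echo retry_wait_seconds(array(1 => 5, 2 => 30), 3), "\n";  // 9 (no entry for retry 3)
```

With the example values 'retry_wait' => 1 and 'max_retries' => 5, the waits are 1, 4, 9, 16 and 25 seconds, 55 seconds in total before Apibot gives up.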

Identity module

'identity' => array (
  'always_login'   => false,  // true - always login, false - use cookies where available
  'login_attempts' => 5,   // try to login up to that many times before giving up
),

This is the module that logs the bot in or out of its account.

While logging in, the bot can either go through the full login procedure, or simply reuse the identity cookies stored by a previous login. The full procedure is the safest option, but costs a little more time and traffic. Some tasks, e.g. maintaining interwikis, involve constantly logging in and out of different wikis; in such cases the overhead of full logins might be significant, so it might be reasonable to set always_login to false. For long tasks with a single login, e.g. checking an entire wiki for common spelling errors, you might prefer to set always_login to true.

Info module

'info' => array (
  'infotypes' => array (
    'general' => array (  // general info settings
      'fetch' => "always",  // always, never, if_unknown, if_older_than ('days'), on_newversion, on_newrevision
    ),
    'site' => array (  // siteinfo settings
      'fetch' => "on_newrevision",
    ),
    'param' => array (  // paraminfo settings
      'fetch' => "on_newrevision",
    ),
    'user' => array (  // userinfo settings
      'fetch' => "on_newrevision",
    ),
    'allmessages' => array (  // allmessages info settings
      'fetch' => "on_newrevision",
    ),
    'filerepoinfo' => array (  // file repo info settings
      'fetch' => "on_newrevision",
    ),
    'globaluser' => array (  // globaluser info settings
      'fetch' => "on_newrevision",
    ),
  ),
),

The Info module holds, caches and serves information about the MediaWiki site. This is a wealth of information, much of it very useful to a bot.

There are different types of info defined. For every type there is a policy - when to fetch it, and when to save (cache) it.

MediaWiki supplies a large volume of information, so it is best not to transfer it every time. In some cases, however, transferring it again might be desirable. Also, cached info might become obsolete after a software upgrade at the wiki. For this reason, you can set for every info type when to transfer it:

  • "always" and "never" are self-explanatory
  • "if_unknown" (synonym: "if_missing") will transfer only if this info type is not found in the cache.
  • "if_older_than" will transfer if the cached info is older than N days. (The number of days must be set in the same sub-array as the 'fetch', with the key 'days'. If not set, a default of 30 days will be used.)
  • "on_newversion": if the reported version of MediaWiki has changed since the last check
  • "on_newrevision": if the reported revision of MediaWiki has changed since the last check. (For many versions now, MediaWiki has not reported its revision. If no revision is reported, the info will be fetched on a version change. I would be glad, however, if the MediaWiki developers restored that feature: it is really useful for bots, since many significant changes happen between revisions.)
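The fetch policies above can be condensed into a single decision function. This is a hypothetical sketch, not the Info module's code: the name `should_fetch()` and the layout of the cached and reported data ('time', 'version', 'revision') are assumptions, as is the choice to fetch when nothing is cached for any policy other than "never".

```php
<?php
// Hypothetical sketch of the documented 'fetch' policies.
// $cached: NULL if nothing is cached, else array('time' => Unix timestamp,
//          'version' => ..., 'revision' => ... or NULL).
// $site:   what the wiki reports now: array('version' => ..., 'revision' => ... or NULL).
function should_fetch(array $policy, ?array $cached, array $site, int $now): bool
{
    $fetch = $policy['fetch'];
    if ($fetch === 'always') {
        return true;
    }
    if ($fetch === 'never') {
        return false;
    }
    if ($cached === null) {
        return true;  // assumption: with nothing cached, any other policy fetches
    }
    switch ($fetch) {
        case 'if_unknown':
        case 'if_missing':
            return false;  // the info is cached, so it is not unknown
        case 'if_older_than':
            $days = isset($policy['days']) ? $policy['days'] : 30;  // default 30 days
            return ($now - $cached['time']) > $days * 86400;
        case 'on_newrevision':
            if ($site['revision'] !== null) {
                return $site['revision'] !== $cached['revision'];
            }
            // no revision reported: fall through to the version check
        case 'on_newversion':
            return $site['version'] !== $cached['version'];
    }
    return true;  // unknown policy: fetch to be on the safe side (assumption)
}

$cached = array('time' => 1000, 'version' => '1.39.0', 'revision' => null);
$site   = array('version' => '1.40.0', 'revision' => null);
var_dump(should_fetch(array('fetch' => 'on_newrevision'), $cached, $site, 2000));  // bool(true)
```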

Actions

The Apibot core consists of several layers, one of which is called Actions. There you can set defaults for most parameters you can give to the different actions the bot can take. Some actions may also have settings that determine how they work.

The action 'query' is used in the Bridge interface by the Query objects returned from the Bridge Query functions. In the Assembly line interface, it is used by the Fetcher_* objects and the Feeder_Query_* functions.

The other actions are used in the Bridge interface by the non-query functions. In the Assembly line interface, they are used by the Writer_Wiki_* objects.

The Actions can be configured in the 'actions' sub-array. It has sub-arrays named after each action - 'block', 'edit' etc. Each of these can have under the key 'settings' a sub-array for the settings of the Action, and under the key 'defaults' a sub-array for the defaults of the Action parameters. As with all other backend-specific modules, backend-specific sub-arrays are supported.

The names of the actions, as well as those of most parameters, match the names of the MediaWiki API actions and their parameters. Read more about these parameters in the MediaWiki API documentation.

'actions' => array (
  'block' => array (
    'defaults' => array (
      'anononly'  => true,
      'nocreate'  => true,
      'autoblock' => true,
      'noemail'   => true,
      'expiry'    => "never",
      'reason'    => NULL,
    ),
  ),
  'delete' => array (
    'defaults' => array (
      'watch'  => NULL,
      'reason' => NULL,
    ),
  ),
  'edit' => array (
    'defaults' => array (
      'text'         => NULL,
      'prependtext'  => NULL,
      'appendtext'   => NULL,
      'section'      => NULL,
      'sectiontitle' => NULL,
      'undo'         => NULL,
      'undoafter'    => NULL,
      'minor'        => true,
      'bot'          => true,
      'recreate'     => NULL,
      'createonly'   => NULL,
      'nocreate'     => NULL,
      'redirect'     => false,
      'watchlist'    => NULL,
      'summary'      => NULL,
    ),
  ),
  'emailuser' => array (
    'defaults' => array (
      'user'    => NULL,
      'text'    => NULL,
      'subject' => NULL,
      'ccme'    => false,
    ),
  ),
  'expandtemplates' => array (
    'defaults' => array (
      'text'  => NULL,
      'title' => NULL,
    ),
  ),
  'help' => array (
  ),
  'import' => array (
    'defaults' => array (
      'interwikisource' => NULL,
      'templates'       => false,
      'fullhistory'     => NULL,
      'namespace'       => NULL,
      'summary'         => NULL,
    ),
  ),
  'login' => array (
  ),
  'logout' => array (
  ),
  'move' => array (
    'defaults' => array (
      'movetalk'     => true,
      'movesubpages' => true,
      'noredirect'   => NULL,
      'watch'        => NULL,
      'reason'       => NULL,
    ),
  ),
  'paraminfo' => array (
  ),
  'parse' => array (
    'defaults' => array (
      'text'    => NULL,
      'title'   => NULL,
      'prop'    => NULL,
      'pst'     => NULL,
      'uselang' => NULL,
    ),
  ),
  'patrol' => array (
  ),
  'protect' => array (
    'defaults' => array (
      'protections' => array (
        'edit'  => "sysop",
        'move'  => "sysop",
      ),
      'expiry'  => array (
        "never",
        "never",
      ),
      'cascade' => true,
      'reason'  => NULL,
    ),
  ),
  'purge' => array (
  ),
  'query' => array (
    'defaults' => array (
      'redirect'      => NULL,
      'indexpageids'  => false,
      'export'        => false,
      'exportnowrap'  => false,
      'converttitles' => false,
      'iwurl'         => false,
    ),
  ),
  'rollback' => array (
    'defaults' => array (
      'title'   => NULL,
      'user'    => NULL,
      'summary' => NULL,
      'markbot' => true,
    ),
  ),
  'unblock' => array (
    'defaults' => array (
      'reason' => NULL,
    ),
  ),
  'undelete' => array (
    'defaults' => array (
      'title'  => NULL,
      'reason' => NULL,
    ),
  ),
  'upload' => array (
    'defaults' => array (
      'comment'        => NULL,
      'text'           => NULL,
      'watch'          => NULL,
      'ignorewarnings' => false,
    ),
  ),
  'userrights' => array (
    'defaults' => array (
      'add'    => NULL,
      'remove' => NULL,
      'reason' => NULL,
    ),
  ),
  'watch' => array (
    'defaults' => array (
      'unwatch' => NULL,
    ),
  ),
),
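How such defaults might combine with the parameters given for one call can be sketched as follows. This is a hypothetical sketch, not Apibot's Actions code: `build_action_params()` is made up, and it assumes that NULL means "not set" and that boolean values follow the MediaWiki API convention, where a flag counts as true whenever it is present and must be omitted to mean false.

```php
<?php
// Hypothetical sketch: merge an action's defaults with the parameters given
// for one call, dropping NULL (not set) and false (MediaWiki API booleans are
// true when present, so false means "omit"). True is sent as "1".
function build_action_params(string $action, array $defaults, array $given): array
{
    $params = array('action' => $action);
    foreach (array_replace($defaults, $given) as $name => $value) {
        if ($value === null || $value === false) {
            continue;  // not sent at all
        }
        $params[$name] = ($value === true) ? "1" : $value;
    }
    return $params;
}

$edit_defaults = array('text' => NULL, 'section' => NULL, 'minor' => true,
                       'bot' => true, 'redirect' => false, 'summary' => NULL);
$p = build_action_params('edit', $edit_defaults,
                         array('title' => "Sandbox", 'text' => "Hello", 'summary' => "test"));
print_r($p);  // keys: action, text, minor, bot, summary, title
```

A real edit request also needs a token, and array-valued defaults such as 'protections' would need converting to the pipe-separated lists the API expects; this sketch leaves both out.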

Tasks

The top layer of the Apibot core consists of two groups of objects: Queries and Tasks. These are backend-independent: they check and try the existing backends in the specified order. Both interfaces of Apibot are built on these objects.

The 'backends' key specifies an array of backend names, in the order in which the backends should be tried. It can be set for all tasks, or for a specific task (overriding the setting for all tasks). If only one backend should be tried, it may be given as a plain name instead of an array with a single name in it.

Most tasks have no other settings or defaults.

'tasks' => array (
  'backends' => array ( "api", "web" ),  // which backends to use and in what order
  'fetch_editable' => array (
    'fetch_objects' => true,  // whether to fetch pages as objects (true) or arrays (false)
  ),
  'fetch_title' => array (
  ),
),
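The 'backends' rules can be sketched in a few lines. The function `task_backends()` is hypothetical, and it assumes that a task-specific value is placed under 'backends' inside the task's own sub-array.

```php
<?php
// Hypothetical sketch of the documented 'backends' rules: a task-specific
// value overrides the common one, and a single name may replace the array.
// Assumption: the task-specific value lives under 'backends' in the task's sub-array.
function task_backends(array $tasks, string $task): array
{
    $backends = isset($tasks[$task]['backends']) ? $tasks[$task]['backends'] : $tasks['backends'];
    return is_array($backends) ? $backends : array($backends);
}

$tasks = array(
    'backends'       => array("api", "web"),
    'fetch_editable' => array('fetch_objects' => true),
    'fetch_title'    => array('backends' => "web"),  // this task tries only the Web backend
);
print_r(task_backends($tasks, 'fetch_editable'));  // returns array("api", "web")
print_r(task_backends($tasks, 'fetch_title'));     // returns array("web")
```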

Typical config files layout

Typically (for example, in a downloaded version of Apibot) all wikis are defined in a file called wikis.php, all logins - in a file called logins.php (which includes wikis.php), and all settings - in a file called settings.php. By default, all three files are in the root directory of Apibot.

Links