Bot internal structure

The basic components of Apibot are:

  • Apibot core: performs the basic low-level functions, akin to the kernel of an OS
  • Tasks: objects that perform different operations on the wiki and the data it contains
  • Queries: objects that fetch different types of information from the wiki
  • Bridge: an interface that exports functions to obtain Queries objects or perform tasks (using Tasks objects as backends)
  • Assembly line: an interface that consists of a large number of specialized objects, which can be strung together like Unix commands and use Queries and Tasks objects as backends.

Core structure

The Apibot core is an object of the Core class, together with the objects it envelops as its properties.

It handles authentication with the wiki, data exchange with it, fetching and storing service info about the wiki and the user account the bot uses, etc.

It serves as a backend to two categories of objects - Tasks and Queries - that provide the different functionalities of the bot. (These in turn are used as backends by the Apibot interfaces.)

Core object structure

Contains (as properties) objects from backend-independent classes.

Browser object

Implemented by the Browser class.

Implements doing HTTP and HTTPS transfers. Derived from the Browser object in Bgbot, by Borislav Manolov, and ultimately from the Peachy project.

Offers support for cookies, which are used to store and reuse the wiki account identity.

Supports transfer timing, and uses this to provide the ability to limit the average maximal communication speed - total, per download and per upload.
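The speed limiting described above can be pictured with a small sketch. It is written in Python for brevity and is not Apibot's code; the Throttle class, its method names and the injectable clock are assumptions made for illustration. It only covers the "total" limit, by pausing whenever the traffic so far would exceed the configured average rate.

```python
import time


class Throttle:
    """Keeps the average transfer rate at or below a configured maximum."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.max_rate = max_bytes_per_sec
        self.clock = clock          # injectable so the logic can be tested
        self.sleep = sleep
        self.started = clock()
        self.total_bytes = 0

    def record(self, nbytes):
        """Account for a finished transfer and pause if we are ahead of budget."""
        self.total_bytes += nbytes
        elapsed = self.clock() - self.started
        # Minimum time the traffic so far should have taken at the maximal rate.
        required = self.total_bytes / self.max_rate
        delay = max(0.0, required - elapsed)
        if delay > 0:
            self.sleep(delay)
        return delay
```

Separate per-download and per-upload limits could be modelled by keeping one such object per direction.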

Used by the Exchanger objects as a backend for the wiki communication. Supplied to them on creating them.

Log object

Implemented by the Log class.

Implements logging in a log file and/or on screen, in simple text or HTML format.

Has support for different log levels.

Settings object

Implemented by the Settings class.

Implements the ability to supply settings to Apibot.

Supports storing the settings info and retrieving specific settings - exactly specified, or multi-level merged, or specific by backend, etc.

Used by a lot of Apibot objects to retrieve their specific settings. Supplied to them on creating them.

Infostore object

Implemented by the Infostore class.

Implements the ability to write and read info in site-specific, user-specific or identity files.

Used by the Info and Identity objects to cache information. Supplied to them (through the Backend object) on creating them.

Global Info object

Implemented by the Info class.

A container object that "unites" the Info objects from the different backends. Has almost no methods of its own. However, when called with a method, checks the backend Info objects for having this method, in a configurable order. If the method is found, it is called with the parameters given, and its result is passed back.
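The forwarding behaviour described above might look roughly like this. The sketch is in Python rather than Apibot's own language, and every class and method name in it (ApiInfo, WebInfo, GlobalInfo, namespaces, skin_name) is hypothetical; it only shows the idea of trying the backend Info objects in a configurable order.

```python
class ApiInfo:
    """Hypothetical stand-in for a backend-specific Info object."""

    def namespaces(self):
        return ["Main", "Talk"]


class WebInfo:
    """Another hypothetical backend Info object with a different method set."""

    def namespaces(self):
        return ["Main"]

    def skin_name(self):
        return "vector"


class GlobalInfo:
    """Container that forwards unknown method calls to backend Info objects."""

    def __init__(self, backend_infos, order):
        self._backend_infos = backend_infos  # backend name -> Info object
        self._order = order                  # configurable lookup order

    def __getattr__(self, name):
        # Python calls this only when GlobalInfo itself lacks the attribute.
        for backend in self._order:
            method = getattr(self._backend_infos[backend], name, None)
            if callable(method):
                return method
        raise AttributeError(f"no backend Info object implements {name!r}")


info = GlobalInfo({"api": ApiInfo(), "web": WebInfo()}, order=["api", "web"])
```

Here info.namespaces() is answered by the first backend in the order, while info.skin_name() falls through to the second one because the first does not implement it.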

Supplied to a lot of Apibot objects, directly or indirectly (as a property of the Core object that is supplied to them), on creating them.

Hooks object

Implemented by the Hooks class.

Maintains a list of the callbacks that occupy different hooks. The Apibot objects that have hooks use it to call the callback that has been hooked to the corresponding hook.

(The Hooks object remembers only the last callback that has been hooked. Since it is up to the hooking objects to preserve and use the callback that has been occupying the hook before they hooked to it, the last callback will remember and call the one before it, this one will call the one before it, and so on to the original default hook callback. Every hook has an original callback that does the default job.)
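The chaining convention can be illustrated with a short sketch. It is Python, not Apibot's own code, and the Hooks methods shown (get, set, call) as well as the "save" hook are assumed names. The registry keeps one callback per hook; each newly hooked callback captures its predecessor and decides when to call it.

```python
class Hooks:
    """Remembers only the most recently hooked callback for each hook name."""

    def __init__(self):
        self._callbacks = {}

    def get(self, name):
        return self._callbacks.get(name)

    def set(self, name, callback):
        self._callbacks[name] = callback

    def call(self, name, *args):
        return self._callbacks[name](*args)


def default_save(text):
    # Original callback: does the default job.
    return f"saved:{text}"


def make_trimmer(previous):
    # The hooking code keeps 'previous' itself; Hooks will forget it.
    def trim_then_continue(text):
        return previous(text.strip())
    return trim_then_continue


def make_upper(previous):
    def upper_then_continue(text):
        return previous(text.upper())
    return upper_then_continue


hooks = Hooks()
hooks.set("save", default_save)
hooks.set("save", make_trimmer(hooks.get("save")))
hooks.set("save", make_upper(hooks.get("save")))

result = hooks.call("save", "  hello ")
```

The last hooked callback runs first and passes control down the chain until the default callback is reached.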

Supplied to all Apibot objects that export hooks, directly or indirectly (as a property of the Core object that is supplied to them), on creating them.

Backends array

A simple array of Backend objects.

Backend object structure

Implemented by backend-specific descendants of the generic Backend class.

Contains (as properties) objects from backend-specific classes.

Exchanger object

Implemented by backend-specific descendants of the generic Exchanger class.

Implements the ability to supply backend-specific parameters to the wiki and to present the data received from it in a backend-independent form. Uses the Apibot backend-specific Mainmodule class to implement the functionality maintained and requested by the MediaWiki API Mainmodule, or its equivalent in other backends.

Supplied to all Apibot Module objects (through the Backend object it belongs to) on creating them.

Identity object

Implemented by backend-specific descendants of the generic Identity class.

Implements the ability to store (in files) and supply backend-specific identity info (eg. cookies that identify a session as made from a specific account).

Supplied to the backend-specific Info objects on creating them. (Used by them when logging in to the wiki, which happens at checking the wiki info - and that happens before any other exchange starts.)

Tokens object

Implemented by backend-specific descendants of the generic Tokens class.

All MediaWiki actions that change the wiki info require specific tokens. Some of these tokens are valid throughout an entire session, others change with every action call, or with a specific call parameter value. The tokens are different for the same action on different interfaces.

This object implements the ability to retrieve, store and supply the tokens needed for the different wiki actions.
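A minimal sketch of such a token store follows, in Python and with invented names (the Tokens.get signature, the session_wide flag and the fetch_token callable are assumptions, not Apibot's API). It separates tokens that can be reused for the whole session from those that must be fetched again for every call.

```python
class Tokens:
    """Retrieves, stores and supplies tokens for wiki actions."""

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token   # hypothetical callable: action name -> token
        self._session_tokens = {}         # tokens that stay valid for the session

    def get(self, action, session_wide=True):
        if not session_wide:
            # Tokens that change with every call are never cached.
            return self._fetch_token(action)
        if action not in self._session_tokens:
            self._session_tokens[action] = self._fetch_token(action)
        return self._session_tokens[action]
```

Tokens that depend on a specific call parameter value could be handled the same way by making the cache key a pair of action name and parameter value.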

Supplied to the Apibot Module objects (through the Backend object it belongs to) on creating them.

Backend-specific Info object

Implemented by backend-specific descendants of the generic Backend_Info class - API_Info class and Web_Info class.

A MediaWiki installation can provide a vast amount of info about itself and the account that the bot uses. This info can be very useful to a bot, so Apibot fetches it and uses it where possible. (For example, the parameter checks in the Modules and the user permission checks in the Tasks depend on it, together with many other things.) Since fetching this much info costs a lot of traffic and time, Apibot also caches it. These functions, together with the ability to supply this info in a convenient way, are performed by the backend-specific Info objects.

The access to this information is usually done by a backend-independent Info object that serves as a container, enveloping and using the backend-specific Info objects.

Supplied to the backend-independent Info object on creating it (through the Core object and the Backend objects in it).

Task object structure

The Tasks objects perform specific actions upon the wiki. Their functionality in most cases matches the functionality of the corresponding MediaWiki API action module. (The Apibot backends other than API emulate this behavior.)

They are created with the help of the Apibot core and use it for the backend work, much like the basic UNIX/Linux tools and libraries use the kernel. The Apibot interfaces use them as backends for doing the specific tasks, much like an application program will use the basic OS tools and libraries.

A MediaWiki API action module sometimes implements several functionalities that a human would see as distinct. For example, the API Edit action module allows both editing pages and undoing page changes. Other backends might use different modules for these functionalities. For this reason, there are different Task objects for the different functionalities, even if these are implemented by the same wiki action module.

The MediaWiki API Query action module, and the corresponding functionalities in other backends, are able to fetch a huge diversity of info types. For ease of use, this functionality is implemented by an entirely different category of objects - the Queries objects.

Module object

This is the lowest level in a Task structure.

Provides the exchange of info with a specific MediaWiki API action module, or the equivalent of its functionality in another backend.

On creation it receives and stores as a property an Exchanger object. Uses it to do the exchange with the wiki.

Handles the API action module specific parameters and verifies their validity on setting. During a call to the wiki, supplies to the Exchanger object the parameters set while calling the wiki. Parses the data obtained by the Exchanger and returns it as a wiki data element. (Depending on the data type, the element is represented as a scalar or an array.)

It is used as a backend by the upper level in a Task object, an Action object.

Action object

The second level in a Task structure.

On creation it creates and stores as a property a matching Module object. Uses it as a backend for doing calls to the wiki, and for setting and verifying the call parameters.

Adds to the Module functionality:

  • logging of the process flow
  • support for default parameters

It is used as a backend by the third level in a Task object, a backend-specific Task object.
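The relationship between the two lowest levels can be sketched as follows. This is an illustrative Python outline, not Apibot's implementation: the parameter names, the set_param and call methods, and the use of a plain callable in place of the Exchanger object are all assumptions.

```python
class Module:
    """Lowest level: validates parameters and talks to the wiki via an exchanger."""

    allowed = {"title", "text", "summary"}   # hypothetical parameter names

    def __init__(self, exchanger):
        self.exchanger = exchanger           # stand-in for the Exchanger object
        self.params = {}

    def set_param(self, name, value):
        # Validity is checked at the moment the parameter is set.
        if name not in self.allowed:
            raise ValueError(f"unknown parameter: {name}")
        self.params[name] = value

    def call(self):
        return self.exchanger(dict(self.params))


class Action:
    """Second level: wraps a Module, adding logging and default parameters."""

    def __init__(self, exchanger, defaults=None, log=print):
        self.module = Module(exchanger)      # created and kept as a property
        self.defaults = defaults or {}
        self.log = log

    def set_param(self, name, value):
        self.module.set_param(name, value)

    def call(self):
        # Fill in defaults only for parameters the caller did not set.
        for name, value in self.defaults.items():
            if name not in self.module.params:
                self.module.set_param(name, value)
        self.log(f"calling wiki with {sorted(self.module.params)}")
        result = self.module.call()
        self.log("call finished")
        return result


messages = []
action = Action(lambda params: {"sent": params},
                defaults={"summary": "bot edit"}, log=messages.append)
action.set_param("title", "Sandbox")
reply = action.call()
```

The Action never talks to the wiki itself; it only decorates what the Module does, which is what lets the next level up stay unaware of the exchange details.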

Backend-specific Task object

The backend-specific Task object is the third level in a Task structure. It is not API action module-specific, but functionality-specific. Several different backend-specific Task objects might use the same API action module (or another backend's equivalent) support.

On creation it creates and stores as a property a matching Action object. Uses it as a backend for doing calls to the wiki, setting and verifying the call parameters, and setting default parameters.

Adds to the Action functionality:

  • translating functionalities to the appropriate wiki action module calls.
  • checking if this account has the permission to use this functionality in the wiki
  • support for bot settings for this specific task
  • extended logging support (with task-specific stuff)
  • resolving Apibot data structures to task-specific parameters (eg. you can supply a Page object or its corresponding array instead of a page name in string form).

It is used as a backend for the top level in a Task object, the backend-independent Task object.

Backend-independent Task object

This is the top Task level, and actually the Task objects themselves. They are what the Apibot interfaces see and work with. Like the backend-specific Task objects, they differ not by the MediaWiki API action modules they communicate with, but by the functionalities they implement, thus matching the backend-specific Task types.

A Task object is backend-independent. On creation it checks which backends can supply appropriate backend-specific Task modules, chooses one from the available by a configured or hardwired order of preference, creates a backend-specific Task module as its property and uses it to do the job.

On a need to perform its task, a Task object checks which backends implement a corresponding backend-specific Task object. If more than one backend does, the Task object chooses one by a configurable order of preference. Then the Task object creates as a property the chosen backend-specific Task object, and uses it as a backend for performing the task.
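The backend choice can be pictured with the following sketch. It is written in Python for illustration and all names in it (the registry, the class names, the run method) are hypothetical. The text describes the choice both as happening on creation and as happening when the task is first needed; this sketch arbitrarily picks the second, lazy variant.

```python
class ApiEditTask:
    def run(self, title):
        return f"api edit of {title}"


class WebEditTask:
    def run(self, title):
        return f"web edit of {title}"


class WebPurgeTask:
    def run(self, title):
        return f"web purge of {title}"


# Hypothetical registry: task name -> {backend name -> backend-specific class}
BACKEND_TASKS = {
    "edit": {"api": ApiEditTask, "web": WebEditTask},
    "purge": {"web": WebPurgeTask},
}


class Task:
    """Backend-independent task that delegates to a backend-specific one."""

    def __init__(self, name, preference):
        self.name = name
        self.preference = preference      # configurable order of backends
        self.backend_task = None

    def _choose_backend_task(self):
        available = BACKEND_TASKS.get(self.name, {})
        for backend in self.preference:
            if backend in available:
                return available[backend]()
        raise LookupError(f"no backend implements task {self.name!r}")

    def run(self, *args):
        if self.backend_task is None:
            self.backend_task = self._choose_backend_task()
        return self.backend_task.run(*args)
```

With a preference of ["api", "web"], an "edit" task would use the API backend, while a "purge" task would fall back to the web backend because it is the only one that implements it in this made-up registry.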

It is used as a backend by the Apibot interfaces.

Query object structure

Module object

This is the lowest level in a Query structure.

Provides the exchange of info with the MediaWiki API Query action module, or the equivalent of its functionality in another backend.

On creation it receives and stores as a property an Exchanger object. Uses it to do the exchange with the wiki.

It is used as a backend by the upper level in a Query object, the Query Action object.

Basic functionality (pure Module)

Handles the API Query action module specific parameters and verifies their validity on setting. During a call to the wiki, supplies to the Exchanger object the parameters set while calling the wiki. Parses the data obtained by the Exchanger and returns it as an array of wiki data elements. (Depending on the data type, the elements are represented as scalars or arrays.)

The MediaWiki API has a limit on the amount of info that can be obtained in a single request, and allows for a request for a portion of info from a certain point on. For this reason, the Apibot Query Module also implements the ability to "continue a request".
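The "continue a request" loop can be sketched like this. The example is Python and uses a fake in-memory wiki, so the parameter names ("from", "items") and the functions are made up for illustration; only the general pattern is meant: repeat the request, merging in whatever continuation marker the previous reply carried, until no marker comes back.

```python
PAGES = ["A", "B", "C", "D", "E"]  # pretend wiki content
LIMIT = 2                          # pretend per-request limit


def fake_wiki_request(params):
    """Hypothetical stand-in for one call through the Exchanger."""
    start = int(params.get("from", 0))
    reply = {"items": PAGES[start:start + LIMIT]}
    if start + LIMIT < len(PAGES):
        # More data is available: tell the caller where to resume.
        reply["continue"] = {"from": start + LIMIT}
    return reply


def query_all(params, request=fake_wiki_request):
    """Yield every element, re-issuing the request while a marker is returned."""
    params = dict(params)
    while True:
        reply = request(params)
        yield from reply["items"]
        if "continue" not in reply:
            break
        params.update(reply["continue"])


results = list(query_all({"list": "allpages"}))
```

Exposing the loop as a generator lets the caller stop early without fetching the remaining portions.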

Module with Querymodules

This is an extension of the Query Module object - a child PHP class that adds the usage of Querymodules.

The MediaWiki API Query module is able to supply a large number of different wiki object types. This is done by using different submodules, the so-called API Querymodules, one for each wiki object type. Apibot implements the functionality of obtaining this info in a matching way, as Apibot Querymodules that match the MediaWiki API Querymodules.

The Querymodules belong to three differing categories: Lists, Meta and Page properties:

  • Lists return lists of pages that match certain criteria, users, recentchanges, usercontribs, logevents, langlinks, user blocks etc.
  • Meta return info about the wiki, the account the bot is using etc.
  • Properties return info about different page properties that are naturally listed - links inside it, templates, files, page revisions etc. (Other page properties can also be obtained by requesting them in a page fetch call, but since they are not lists of entries belonging to the same type, Apibot does not support them in this way.)

In every category there is also a "by_name" querymodule. It is not an implementation of a counterpart of a specific MediaWiki API Querymodule, but a "generic". On creation it requires the name of the API Querymodule it will converse with, and fetches the information about how to communicate with it from the paraminfo fetched by Apibot from the wiki.

The Query Module with Querymodules objects extends the Query Module with support for submodules. These are the Querymodules and another module akin to them - the Pageset module, a counterpart for the MediaWiki API Pageset module. This support also adds methods for creating the submodules automatically when something indicates that they will be used (eg. a parameter of a Querymodule is set), setting their parameters, incorporating them in the calls to the wiki and parsing the data returned by these calls.

Action object

The second level in a Query structure.

On creation it creates and stores as a property a Query Module with Querymodules object. Uses it as a backend for doing calls to the wiki, and for setting and verifying the call parameters.

Adds to the Query Module functionality:

  • logging of the process flow
  • support for default parameters

It is used as a backend by the third level in a Query object, a backend-specific Query object.

Backend-specific Query object

The backend-specific Query object is the third level in a Query structure.

There are many different backend-specific Query classes, each of them implementing a specific Query functionality. They match to some degree the Apibot Querymodules, with an important difference.

Almost all MediaWiki API Query Lists that return pages, and some that return data structures that have a page title in them, can work also in the so-called Generator mode. (This means supplying not just the page titles, but the complete page info, and basing additional lists or page properties listings on it.) Some page properties that return what is effectively a page title can also work in Generator mode. So, in addition to the three categories of Querymodules, there are also backend-specific Queries from a fourth category - Generators.

On creation a backend-specific Query module creates and stores as a property an Action Query object. Uses it as a backend for doing calls to the wiki, setting and verifying the call parameters, and setting default parameters.

Adds to the Action Query functionality:

  • translating functionalities to the appropriate wiki action module calls.
  • checking if this account has the permission to use this functionality in the wiki
  • support for bot settings for this specific task
  • extended logging support (with task-specific stuff)
  • resolving Apibot data structures to task-specific parameters (eg. you can supply a Page object or its corresponding array instead of a page name in string form).

It is used as a backend for the top level in a Query object, the backend-independent Query object.

Backend-independent Query object

This is the top Query level, and actually the Query objects themselves. They are what the Apibot interfaces see and work with. Like the backend-specific Query objects, they differ not by the MediaWiki API Querymodules they communicate with, but by the functionalities they implement, thus matching the backend-specific Query types.

A Query object is backend-independent. On creation it checks which backends can supply appropriate backend-specific Query modules, chooses one from the available by a configured or hardwired order of preference, creates a backend-specific Query module as its property and uses it to do the job.

On a need to obtain info, a Query object checks which backends implement a corresponding backend-specific Query object. If more than one backend does, the Query object chooses one by a configurable order of preference. Then the Query object creates as a property the chosen backend-specific Query object, and uses it as a backend for obtaining the info.

It is used as a backend by the Apibot interfaces.

Bridge interface structure

Implemented by the Bridge class.

The Bridge interface is similar to most other MediaWiki bot interfaces, including the Apibot interface from versions 3.x. It is a big container class with a lot of functions, which belong to three categories:

  • Query functions, which return Query objects. You can use these to obtain the wiki information you need.
  • Non_Query functions, which perform tasks, based on the Tasks functionalities (and using Tasks as backends).
  • Service functions, which allow logging (the log() function) and making HTTP(S) transfers (the xfer() function).

A Bridge object will also have the properties $core and $info (the latter being the Info object in the Core object), which you can access directly. The Core object is passed to the Bridge object on creating it. (There is also the Bridge_Standalone child class and object, whose only difference is that it receives the bot account and settings instead of a Core object, and creates the Core object internally.)

Assembly line interface structure

Implemented by the Assembly line classes.

Unlike most MediaWiki bot interfaces, and like the UNIX / Linux commands, this interface is made from a lot of different objects that perform specific functions. You string them together to do the job you need.

The classes of the Assembly line interface belong to two groups: Signals and Line objects.

Signals

The objects that make a line pass between themselves the data encapsulated in special objects called signals, much like data is passed in packets through a TCP connection. There are three types of signals, all of them children of the LineSignal class:

  • LineSignal_Start: signifies the start of a data stream
  • LineSignal_End: signifies the end of a data stream
  • LineSignal_Data: carries a data element - typically an Apibot data structure.

The data element is a part of a data block. Besides the data element itself, a data block contains also the data type (a MIME-like type that reflects the type of data carried), and a data key (a string or number that specifies the place of this data element in a succession; this succession is typically source-dependent, and is not a "packet number").

A Data signal may carry more than one data block. The data blocks are distinguished by keys. The keys are a feature of the line objects. These will have a default key (typically '*'), by which they will access the data block they need. The signal has methods that can copy, delete and add data blocks, change their keys etc.

In addition to the data blocks, a signal may carry additional metadata called parameters. A parameter consists of a keyword and a value, and belongs to a group. Parameters may reflect additional things about the data carried, or be a form of communication between the assembly line objects, etc.

A signal will also carry a log, where the line objects mark the type of activity they have performed over the signal and its data.
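The parts of a Data signal described above can be laid out as a small data structure. The sketch is in Python and is only a reading aid: the class names, method names and the "text/x-title" type string are invented, and the real LineSignal_Data class may organize things differently.

```python
from dataclasses import dataclass, field, replace


@dataclass
class DataBlock:
    data: object        # the data element itself
    data_type: str      # MIME-like type of the element
    data_key: object    # place of the element in the source's succession


@dataclass
class DataSignal:
    blocks: dict = field(default_factory=dict)   # block key -> DataBlock
    params: dict = field(default_factory=dict)   # group -> {keyword: value}
    log: list = field(default_factory=list)      # (type, name, description)

    def set_block(self, block, key="*"):
        self.blocks[key] = block

    def copy_block(self, src, dst):
        # replace() with no changes makes a shallow copy of the block.
        self.blocks[dst] = replace(self.blocks[src])

    def rename_block(self, old, new):
        self.blocks[new] = self.blocks.pop(old)

    def delete_block(self, key):
        del self.blocks[key]

    def set_param(self, group, keyword, value):
        self.params.setdefault(group, {})[keyword] = value

    def log_entry(self, obj_type, obj_name, description):
        self.log.append((obj_type, obj_name, description))


signal = DataSignal()
signal.set_block(DataBlock("Main Page", "text/x-title", 0))  # default key '*'
signal.copy_block("*", "backup")
signal.set_param("worker", "dry_run", True)
signal.log_entry("feeder", "titles", "emitted a page title")
```

Keeping several keyed blocks in one signal lets a later line object read an earlier version of the data (here under "backup") after another object has replaced the default block.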

Line objects

All Line object classes are children of the Line_Slot class.

A Line object can link to an object that is before it on the line, that is, subscribe to receive signals from it. Linking to more than one object is allowed. It can also link with an object that is after it on the line, that is, pass signals to it. Linking with more than one object is allowed here too, so a "line" can actually have a tree or even a network topology.

A Line object will log on every signal the job it has done on it and/or its data. Three elements of info are logged: object type, object name and a description of the job done. The object type is one of the object types, see below. The object name approximately reflects its role.

The first step of processing every signal is trying to set some Line object parameters from the signal. If there are signal parameters that belong to a group named like the Line object type (eg. "worker"), these parameters are set as parameters to the object, and this group is deleted from the signal parameters.

If the signal is a Start signal, the current parameters are saved before setting ones from the signal and restored after processing the corresponding End signal - that is, the parameters set from this signal are valid for the entire data stream. If the signal is a Data signal, the current parameters are saved before setting ones from the signal and restored after processing the signal data - that is, the parameters set from this signal are valid for processing its own data only. End signals do not save or set parameters, but only restore those stored by the corresponding Start signal.
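The save and restore rules for the three signal types can be traced with a short sketch. It is Python, the Signal and LineObject classes are simplified stand-ins with invented names, and the "limit" parameter is made up; the point is only to show which parameter values are in effect at each step.

```python
class Signal:
    def __init__(self, kind, params=None, data=None):
        self.kind = kind              # "start", "data" or "end"
        self.params = params or {}    # group -> {keyword: value}
        self.data = data


class LineObject:
    object_type = "worker"            # parameter group this object listens to

    def __init__(self, **params):
        self.params = dict(params)
        self._stream_saved = []       # snapshots taken at Start signals
        self.seen = []                # (data, params in effect), for inspection

    def _apply_from(self, signal):
        # Take our group out of the signal and merge it into our parameters.
        self.params.update(signal.params.pop(self.object_type, {}))

    def process(self, signal):
        if signal.kind == "start":
            # Valid for the whole stream: restored at the matching End.
            self._stream_saved.append(dict(self.params))
            self._apply_from(signal)
        elif signal.kind == "data":
            # Valid for this signal's data only.
            saved = dict(self.params)
            self._apply_from(signal)
            self.seen.append((signal.data, dict(self.params)))
            self.params = saved
        elif signal.kind == "end":
            # End signals set nothing; they only restore.
            if self._stream_saved:
                self.params = self._stream_saved.pop()


worker = LineObject(limit=10)
start = Signal("start", {"worker": {"limit": 5}})
worker.process(start)
worker.process(Signal("data", {"worker": {"limit": 1}}, data="A"))
worker.process(Signal("data", data="B"))
worker.process(Signal("end"))
```

In this run element "A" is processed with limit 1, element "B" with the stream-wide limit 5, and after the End signal the object is back at its original limit of 10.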

More info about the Line object classes can be found at the Assembly line classes page.

See also