Assembly line fetchers

Many feeders do not supply all the characteristics of a data object. For example, the Allpages list feeder supplies only a small part of the full page info, which does not even include the page text. For some tasks this is not enough.

This is where a group of line objects called fetchers comes in. They retrieve more info about a specific object. For example, the Fetcher_Page_Title object retrieves the page properties, typically including the page text, by the page title. You can then change this text, for instance with a Worker_EditPage object, and submit it to the wiki.

All fetcher files are located in subdirectories of the directory interfaces/line/fetchers.

All of them have the public property $backup_data_key. If it is set to a name (a string), the default data block of each processed signal is not simply overwritten by the fetched data; it is preserved under that name.
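The effect of $backup_data_key can be modelled in a few lines. The sketch below is written in Python rather than Apibot's PHP and shows only one reading of the description above. The signal is represented as a plain dict of named data blocks, and the names store_fetched and DEFAULT_KEY are hypothetical, not part of Apibot.

```python
# Hypothetical model of a signal: a dict of named data blocks.
# DEFAULT_KEY stands in for the signal's default data block.
DEFAULT_KEY = "default"


def store_fetched(signal, fetched, backup_data_key=None):
    """Store fetched data in the default block, optionally keeping the old block."""
    if backup_data_key is not None:
        # Preserve the previous default block under the given name.
        signal[backup_data_key] = signal.get(DEFAULT_KEY)
    # The fetched data becomes the new default block.
    signal[DEFAULT_KEY] = fetched
    return signal


signal = {DEFAULT_KEY: {"title": "Foo"}}
store_fetched(signal, {"title": "Foo", "text": "Page text"}, backup_data_key="listed")
```

In this model, leaving backup_data_key unset discards the old block, while setting it keeps the feeder's original element available to later line objects under the chosen name.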

Wiki fetchers

Located in the wiki subdirectory of the fetchers directory.

Fetcher_Wiki_Block

Implemented in the file block.php. Fetches a user, IP or IP range block (identified by block ID).

Has the following public properties:

Fetcher_Wiki_Filename

Implemented in the file filename.php. Fetches a File dataobject (identified by file dataobject or array, filename or file page title).

Supports all public properties described in the MediaWiki documentation for the imageinfo page property, provided the wiki you fetch from supports them. (The same applies to the possible values of the $prop property.)

In addition, supports the following public properties:

  • $fetch_body - if set to true, will also fetch the file body. Default: false.
  • $body_file - if set, the file body (if fetched) will not be stored as a File dataobject property, but will be written into a file with this pathname, and the File dataobject property $body_link will be set to that pathname. (Since this file is expected to be temporary, given that new ones will keep coming, the File dataobject property $body_link_is_tempfile will also be set to true.)
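The $body_file behaviour described above can be sketched as follows. This is a Python model, not Apibot code: the File dataobject is represented as a dict, attach_body is a hypothetical helper, and the property name "body" for the in-memory case is an assumption (the page does not name it).

```python
def attach_body(file_obj, body, body_file=None):
    """Attach a fetched file body either in memory or as a link to a file on disk."""
    if body_file is None:
        # No pathname given: keep the body on the dataobject itself.
        # ("body" is an assumed property name.)
        file_obj["body"] = body
    else:
        # Pathname given: write the body out and store only the link.
        with open(body_file, "wb") as handle:
            handle.write(body)
        file_obj["body_link"] = body_file
        file_obj["body_link_is_tempfile"] = True
    return file_obj
```

Writing the body to disk keeps large files out of memory, at the cost that each newly fetched file overwrites the previous one at the same pathname.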

Fetcher_Wiki_Filename_Body

Implemented in the file filename_body.php. Similar to Fetcher_Wiki_Filename, but fetches only the file body. Has only one public property:

  • $body_file - works like the property of the same name in Fetcher_Wiki_Filename.

Fetcher_Wiki_Page

Implemented in the file page.php. Fetches a Page dataobject by a title, pageid or revid (of a revision belonging to this page), in that order, supplied by the data element currently in the signal.

Has the following public properties:

  • $create_missing_pages - if false (default), will fail on attempting to fetch a non-existing page; if true, will create an empty page dataobject for it.
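The identifier order and the $create_missing_pages switch can be illustrated with a small Python model. Everything here is hypothetical: the data element and the page are dicts, lookup stands in for the actual wiki query, and PageNotFound stands in for whatever failure Apibot signals. The shape of the "empty page dataobject" is also an assumption.

```python
class PageNotFound(Exception):
    """Raised in this sketch when the page does not exist and may not be created."""


def pick_identifier(element):
    """Return the first identifier present, trying title, pageid, revid in that order."""
    for key in ("title", "pageid", "revid"):
        if element.get(key) is not None:
            return key, element[key]
    raise ValueError("data element has no title, pageid or revid")


def fetch_page(element, lookup, create_missing_pages=False):
    """Fetch a page through lookup(key, value); lookup returns None for a missing page."""
    key, value = pick_identifier(element)
    page = lookup(key, value)
    if page is None:
        if not create_missing_pages:
            raise PageNotFound(f"no page with {key}={value!r}")
        # Assumed shape of an empty page: only the identifier that was used.
        page = {key: value}
    return page
```

With this model, an element carrying both a title and a pageid is looked up by title, since the title is tried first.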

Fetcher_Wiki_Pageid

Implemented in the file pageid.php. Fetches a Page dataobject by a pageid, supplied by the data element currently in the signal.

Has the same public properties as Fetcher_Wiki_Page.

Fetcher_Wiki_Revid

Implemented in the file revid.php. Fetches a Page dataobject by a revid (of a revision belonging to this page), supplied by the data element currently in the signal.

Has the same public properties as Fetcher_Wiki_Page.

Fetcher_Wiki_Title

Implemented in the file title.php. Fetches a Page dataobject by a title, supplied by the data element currently in the signal.

Has the same public properties as Fetcher_Wiki_Page.

Fetcher_Wiki_User

Implemented in the file user.php. Fetches a Page dataobject by a username, supplied by the data element currently in the signal.

Has the following public properties:

Batch fetchers

These fetch not single wiki pageset elements (pages, files etc.) but entire batches. Using them speeds up the bot and reduces the traffic between it and the wiki (more so if compression is used; for certain tasks the bot may become tens of times faster).

A batch fetcher will typically store the data elements that arrive at it until a certain count is reached. When this happens or the feed ends, the entire batch of pages is fetched at once. Then the fetched elements are sent on one by one, as if they had never been grouped.

All batch fetchers have the following public properties:

  • $properties - the page properties to fetch, as per MediaWiki API requirements
  • $batch_size - the count that must be reached. If left unset, the max count for a pageset request will be used.
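The buffering behaviour described above (collect elements, fetch the whole batch when the count is reached or the feed ends, then pass the results on one by one) can be sketched as a Python generator. This is a model of the idea, not Apibot's implementation; batch_fetch and fetch_many are hypothetical names, and fetch_many stands in for a single pageset request to the wiki.

```python
def batch_fetch(elements, fetch_many, batch_size):
    """Group incoming elements into batches, fetch each batch at once, emit results singly."""
    batch = []
    for element in elements:
        batch.append(element)
        if len(batch) >= batch_size:
            # Count reached: one request for the whole batch.
            yield from fetch_many(batch)
            batch = []
    if batch:
        # Feed ended with a partial batch: fetch what is left.
        yield from fetch_many(batch)


requests_made = []


def fake_fetch_many(titles):
    """Stand-in for a wiki request; records the batch and returns one page per title."""
    requests_made.append(list(titles))
    return [{"title": title, "text": title.lower()} for title in titles]


pages = list(batch_fetch(["A", "B", "C", "D", "E"], fake_fetch_many, batch_size=2))
```

Here five titles with a batch size of two result in three requests instead of five, while the consumer still receives five separate pages in the original order.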

Fetcher_Wiki_Batch_Titles

Implemented in the file batch_titles.php. Fetches pages by titles.

Fetcher_Wiki_Batch_Files

Implemented in the file batch_files.php. Fetches pages with image / file properties, and possibly file body, by titles or file names.

In addition to the public properties for all batch fetchers, has also the same properties as Fetcher_Wiki_Filename.

Misc fetchers

Located in the misc subdirectory of the fetchers directory.

Fetcher_HTTP

Implemented in the file http.php. Fetches any data string returned by an HTTP request. Fails if the info is not fetched.

Has the following public properties:

  • $uri - the URI to fetch the data from.
  • $vars - an array with HTTP parameters (paramnames as keys, paramvalues as values).
  • $files - an array with HTTP files parameters (paramnames as keys, paramvalues as values).
  • $mustbeposted - if true, use the POST method; if false, use the GET method if possible.
  • $content_type - if set, use this string as the content type of the fetched data; if not set, check the "Content-Type" HTTP header of the data received.
  • $stop_on_fail - if true, stop processing this signal if the data is not fetched; if false, don't stop.
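How these properties might interact can be sketched with Python's standard urllib. This is a model under stated assumptions, not the behaviour of http.php: build_request and effective_content_type are hypothetical names, the rule that the presence of files rules out GET is an assumption about what "if possible" means, and the multipart encoding of the files themselves is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(uri, http_vars=None, files=None, mustbeposted=False):
    """Build (but do not send) a request, choosing POST or GET as described above."""
    params = urlencode(http_vars or {})
    if mustbeposted or files:
        # Assumption: file parameters cannot travel in a GET request.
        # (Multipart encoding of the files is omitted from this sketch.)
        return Request(uri, data=params.encode("utf-8"), method="POST")
    if params:
        separator = "&" if "?" in uri else "?"
        uri = uri + separator + params
    return Request(uri, method="GET")


def effective_content_type(headers, content_type=None):
    """Prefer the explicitly configured content type, else the response header."""
    if content_type:
        return content_type
    return headers.get("Content-Type")
```

For example, build_request("http://example.org/api", {"a": "1"}) yields a GET request for http://example.org/api?a=1, while passing mustbeposted=True moves the same parameters into the request body.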

Autofetchers

You will also find in interfaces/line/fetchers/_auto a group of tools called autofetchers. These are NOT line objects and cannot be strung into an assembly line; for this reason they are not described here.

Instead, they are used internally by some line objects (mostly workers) to automatically fetch needed data that is not present in the data element passed. Typically they work by creating, when needed, an encapsulated fetcher object (one that is not added to the line) and using it to fetch the required data. You might be interested in them if you write your own line objects (mostly workers).
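The "create an encapsulated fetcher when needed" pattern can be sketched in Python as follows. None of these names exist in Apibot: TitleFetcher and EditWorker are hypothetical stand-ins, the wiki is modelled as a dict, and the edit itself is a placeholder.

```python
class TitleFetcher:
    """Stand-in fetcher: looks up page data by title in a dict that models the wiki."""

    def __init__(self, store):
        self.store = store

    def fetch(self, title):
        return self.store[title]


class EditWorker:
    """Stand-in worker that fetches missing page text by itself before editing."""

    def __init__(self, store):
        self._store = store
        self._fetcher = None  # created only when first needed

    def _ensure_text(self, page):
        if "text" not in page:
            if self._fetcher is None:
                # Internal fetcher: owned by the worker, never added to the line.
                self._fetcher = TitleFetcher(self._store)
            page.update(self._fetcher.fetch(page["title"]))
        return page

    def process(self, page):
        page = self._ensure_text(page)
        page["text"] = page["text"] + " [checked]"  # placeholder edit
        return page
```

The worker only pays for the extra request when an element arrives without text, so a line that already contains an explicit fetcher is not slowed down.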