plasTeX — A Python Framework for Processing LaTeX Documents: Macro Objects

6.1.1 Macro Objects




class Macro(): The Macro class is the base class for all Python-based macros, although you will generally want to subclass Command or Environment in real-world use. Various attributes and methods affect how Python macros are parsed, constructed, and inserted into the resulting DOM. These are described below.

args

specifies the arguments to the LaTeX macro and their data types. The args attribute gives you a very simple, yet extremely powerful way of parsing LaTeX macro arguments and converting them into Python objects. Once parsed, each LaTeX macro argument is set in the attributes dictionary of the Python instance using the name given in the args string. For example, the following args string will direct plasTeX to parse two mandatory arguments, ‘id’ and ‘title’, and put them into the attributes dictionary.

args = 'id title'

You can also parse optional arguments, usually surrounded by square brackets ([ ]). However, in plasTeX, any argument specified in the args string that isn’t mandatory (i.e. not surrounded by braces in the LaTeX source) is automatically considered optional. This may not truly be the case, but it doesn’t make much difference. If such arguments truly are mandatory, then your LaTeX source file will always have them and plasTeX will simply always find them even though it considers them to be optional.

Optional arguments in the args string are surrounded by matching square brackets ([ ]), angle brackets (< >), or parentheses (( )). The name for the attribute is placed between the matching symbols as follows:

args = '[ toc ] title'
args = '( position ) object'
args = '< markup > ref'

You can have as many optional arguments as you wish. It is also possible to have optional arguments using braces ({ }), but this requires you to change TeX’s category codes and is not common.

Modifiers such as asterisks (*) are also allowed in the args string. You can also use the plus (+) and minus (-) signs as modifiers although these are not common. Using modifiers can affect the incrementing of counters (see the parse() method for more information).

In addition to specifying which arguments to parse, you can also specify what the data type should be. By default, all arguments are processed and stored as document fragments. However, some arguments may be simpler than that. They may contain an integer, a string, an ID, etc. Others may be collections like a list or dictionary. There are even more esoteric types for mostly internal use that allow you to get unexpanded tokens, TeX dimensions, and the like. Regardless, all of these directives are specified in the same way, using the typecast operator: ‘:’. To cast an argument, simply place a colon (:) and the name of the argument type immediately after the name of the argument. The following example casts the ‘filename’ argument to a string.

args = 'filename:str'

Parsing compound arguments such as lists and dictionaries is very similar.

args = 'filenames:list'

By default, compound arguments are assumed to be comma separated. If you are using a different separator, it is specified in parentheses after the type.

args = 'filenames:list(;)'

Again, each element in the list is, by default, a document fragment. However, you can also give the data type of the elements with another typecast.

args = 'filenames:list(;):str'

Parsing dictionaries is a bit more restrictive. plasTeX assumes that dictionary arguments are always key-value pairs, that the key is always a string, and that the separator between the key and value is an equals sign (=). Other than that, they operate in the same manner.
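The effect of the list and dictionary typecasts can be pictured with a small string-based sketch. This is only an illustration under simplifying assumptions: plasTeX works on TeX tokens rather than plain strings, and the helper names parse_list and parse_dict are hypothetical, not part of plasTeX.

```python
def parse_list(text, sep=',', cast=str):
    """Split text on sep and cast each non-empty, stripped element."""
    return [cast(item.strip()) for item in text.split(sep) if item.strip()]


def parse_dict(text, sep=',', cast=str):
    """Split text into key=value pairs; keys stay strings, values are cast."""
    result = {}
    for pair in text.split(sep):
        if not pair.strip():
            continue
        key, _, value = pair.partition('=')
        result[key.strip()] = cast(value.strip())
    return result


# Roughly what 'filenames:list(;):str' asks for.
filenames = parse_list('intro.tex; body.tex', sep=';')

# A dictionary argument: string keys, '=' between key and value.
options = parse_dict('width=3, height=4', cast=int)
```

The comma default and the optional separator mirror the description above; everything else about real token handling is left out.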

A full list of the supported data types, along with more examples, is given in section 4.
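To make the pieces of an args string easier to see, here is a toy decomposition of the syntax described in this section. It is not plasTeX's parser. It assumes that delimiters and names are separated by whitespace, as in the examples above, and the function name describe_args is hypothetical.

```python
import re

# Closing delimiter for each optional-argument opener.
OPTIONAL_PAIRS = {'[': ']', '(': ')', '<': '>'}

# name, then optionally :type, (separator), and :subtype
ARG = re.compile(
    r'^(?P<name>\w+)'
    r'(?::(?P<type>\w+)(?:\((?P<sep>[^)]*)\))?(?::(?P<subtype>\w+))?)?$'
)


def describe_args(spec):
    """Return one small dict per argument or modifier in an args string."""
    result = []
    closing = None  # expected closer while inside an optional argument
    for token in spec.split():
        if token in ('*', '+', '-'):
            result.append({'modifier': token})
        elif token in OPTIONAL_PAIRS:
            closing = OPTIONAL_PAIRS[token]
        elif token == closing:
            closing = None
        else:
            match = ARG.match(token)
            if match is None:
                raise ValueError('unrecognised token: %r' % token)
            info = match.groupdict()
            info['optional'] = closing is not None
            result.append(info)
    return result


toc_and_title = describe_args('[ toc ] title')
filenames = describe_args('filenames:list(;):str')[0]
```

A separator of None in the result stands for the default (comma) separator.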



argSource: the source for the LaTeX arguments to this macro. This is a read-only attribute.



arguments: gives the arguments in the args attribute in object form (i.e. Argument objects). Note: This is a read-only attribute and is generally for internal use only.



blockType: indicates whether the macro node should be considered a block-level element. If true, this node will be put into its own paragraph node (which also has blockType set to True) to make it easier to generate output that requires block-level elements to exist outside of paragraphs.



counter: specifies the name of the counter to associate with this macro. Each time an instance of this macro is created, this counter is incremented. The incrementing of this counter, of course, resets any “child” counters just like in LaTeX. By default and by LaTeX convention, if the macro’s first argument is an asterisk (i.e. *), the counter is not incremented.
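The counter behaviour described here (increment on creation, reset of child counters, no increment for starred forms) can be modelled in a few lines. The ToyCounters class and create_instance function below are hypothetical illustrations, not plasTeX's counter implementation.

```python
class ToyCounters:
    """Toy counter table; an illustration only."""

    def __init__(self):
        self.values = {}
        self.children = {}  # counter name -> names of counters it resets

    def new(self, name, resetby=None):
        self.values[name] = 0
        self.children.setdefault(name, [])
        if resetby is not None:
            self.children.setdefault(resetby, []).append(name)

    def step(self, name):
        self.values[name] += 1
        self._reset_children(name)

    def _reset_children(self, name):
        for child in self.children[name]:
            self.values[child] = 0
            self._reset_children(child)


def create_instance(counters, counter, starred=False):
    """Model macro creation: step the counter unless the macro is starred."""
    if not starred:
        counters.step(counter)
    return counters.values[counter]


counters = ToyCounters()
counters.new('section')
counters.new('subsection', resetby='section')

create_instance(counters, 'section')                # first \section
create_instance(counters, 'subsection')             # first \subsection
create_instance(counters, 'subsection')             # second \subsection
create_instance(counters, 'section', starred=True)  # \section*: not counted
create_instance(counters, 'section')                # resets subsection
```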



id: specifies a unique ID for the object. If the object has an associated label (i.e. \label), that is its ID. You can also set the ID manually. Otherwise, an ID will be generated based on the result of Python’s id() function.



idref: a dictionary containing all of the objects referenced by “idref” type arguments. Each idref attribute is stored under the name of the argument in the idref dictionary.



level: specifies the hierarchical level of the node in the DOM. For most macros, this will be set to Node.COMMAND_LEVEL or Node.ENVIRONMENT_LEVEL by the Command and Environment macros, respectively. However, there are other levels that invoke special processing. In particular, sectioning commands such as \section and \subsection have levels set to Node.SECTION_LEVEL and Node.SUBSECTION_LEVEL. These levels assist in the building of an appropriate DOM. Unless you are creating a sectioning command or a command that should act like a paragraph, you should leave the value of this attribute alone. See section 6.3 for more information.

macroName

specifies the name of the LaTeX macro that this class corresponds to. By default, the Python class name is the name that is used, but there are some legal LaTeX macro names that are not legal Python class names. In those cases, you would use macroName to specify the correct name. Below is an example.

class _illegalname(Command):
    macroName = '@illegalname'

Note: This is a class attribute, not an instance attribute.

macroMode

specifies what the current parsing mode is for this macro. Macro classes are instantiated for every invocation including each \begin and \end. This attribute is set to Macro.MODE_NONE for normal commands, Macro.MODE_BEGIN for the beginning of an environment, and Macro.MODE_END for the end of an environment.

These attributes are used in the invoke() method to determine the scope of macros used within the environment. They are also used in printing the source of the macro in the source attribute. Unless you really know what you are doing, this should be treated as a read-only attribute.



mathMode: boolean that indicates that the macro is in TeX’s “math mode.” This is a read-only attribute.



nodeName: the name of the node in the DOM. This will either be the name given in macroName, if defined, or the name of the class itself. Note: This is a read-only attribute.
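The relationship between macroName and nodeName can be sketched with a read-only property. The ToyMacro base class is a hypothetical stand-in used so that the snippet runs on its own; it is not how plasTeX defines Macro.

```python
class ToyMacro:
    """Stand-in base class for illustration only."""

    macroName = None  # class attribute; subclasses may override it

    @property
    def nodeName(self):
        # Prefer an explicit macroName, otherwise fall back to the class name.
        return self.macroName or type(self).__name__


class section(ToyMacro):
    pass


class _illegalname(ToyMacro):
    macroName = '@illegalname'


names = [section().nodeName, _illegalname().nodeName]
```

Because nodeName is a property without a setter, assigning to it raises AttributeError, which matches the read-only behaviour described above.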



ref: specifies the value to return when this macro is referenced (i.e. \ref). This is set automatically when the counter associated with the macro is incremented.



source: specifies the LaTeX source that was parsed to create the object. This is most useful in the renderer if you need to generate an image of a document node. You can simply retrieve the LaTeX source from this attribute, create a LaTeX document including the source, then convert the DVI file to the appropriate image type.

style

specifies style overrides, in CSS format, that should be applied to the output. This object is a dictionary, so style property names are given as the keys and property values are given as the values.

inst.style['color'] = 'red'
inst.style['background-color'] = 'blue'

Note: Not all renderers are going to support CSS styles.
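As an illustration of how a renderer might consume such a dictionary, the sketch below serializes it into the text of an inline CSS declaration list. The function name css_text is hypothetical and is not part of plasTeX; a plain dict stands in for the style object.

```python
def css_text(style):
    """Join a style dictionary into 'name: value; name: value' form."""
    return '; '.join('%s: %s' % (name, value) for name, value in style.items())


style = {}
style['color'] = 'red'
style['background-color'] = 'blue'

inline = css_text(style)
```

The result could then be placed in something like an HTML style attribute by a renderer that supports CSS.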



tagName: same as nodeName



title: specifies the title of the current object. If the attributes dictionary contains a title, that object is returned. An AttributeError is raised if there is no ‘title’ key in that dictionary. A title can also be set manually by setting this attribute.






digest(tokens): absorb the tokens from the given output stream that belong to the current object. In most commands, this does nothing. However, LaTeX environments have a \begin and an \end that surround the content that belongs to them. In this case, the environments need to absorb those tokens and construct them into the appropriate document object model (see the Environment class for more information).






digestUntil(tokens, endclass): utility method to help macros like lists and tables digest their contents. In lists and tables, the items, rows, and cells are not delimited by \begin and \end tokens. They are simply delimited by the occurrence of another item, row, or cell. This method allows you to absorb tokens until a particular class is reached.
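The idea of absorbing tokens until an instance of a given class appears can be sketched as follows. This is a toy model: the Item class, the use of a deque as the token stream, and the function names are all assumptions made for illustration, not plasTeX's implementation.

```python
from collections import deque


class Item:
    """Stands in for a list item, table row, or cell."""

    def __init__(self):
        self.children = []


def digest_until(tokens, endclass):
    """Absorb tokens from the front of the stream until endclass is seen.

    The terminating token is left in the stream for the caller.
    """
    absorbed = []
    while tokens and not isinstance(tokens[0], endclass):
        absorbed.append(tokens.popleft())
    return absorbed


def group_items(tokens):
    """Give each Item the tokens that follow it, up to the next Item."""
    items = []
    while tokens:
        item = tokens.popleft()  # assumed to be an Item
        item.children = digest_until(tokens, Item)
        items.append(item)
    return items


stream = deque([Item(), 'a', 'b', Item(), 'c'])
items = group_items(stream)
```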






expand(): the expand method is a thin wrapper around the invoke method. It makes sure that all tokens are expanded and, unlike invoke, never returns None.

invoke()

invokes the macro. Invoking the macro, in the general case, includes creating a new context, parsing the options of the macro, and removing the context. LaTeX environments are slightly different. If macroMode is set to Macro.MODE_BEGIN, the new context is kept on the stack. If macroMode is set to Macro.MODE_END, no arguments are parsed; the context is simply popped. For most macros, the default implementation will work fine.

The return value for this method is generally None (an empty return statement or simply no return statement). In this case, the current object is simply put into the resultant output stream. However, you can also return a list of tokens. In this case, the returned tokens will be put into the output stream in place of the current object. You can even return an empty list to indicate that you don’t want anything to be inserted into the output stream.
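The three return conventions can be illustrated with a toy expansion loop. The classes Keep, Replace, and Drop and the function expand_stream are hypothetical; they only model how a caller might treat the return value and are not plasTeX code.

```python
class Keep:
    def invoke(self):
        return None        # the object itself goes into the output


class Replace:
    def invoke(self):
        return ['x', 'y']  # these tokens replace the object


class Drop:
    def invoke(self):
        return []          # nothing is inserted


def expand_stream(objects):
    """Build the output stream from each object's invoke() result."""
    output = []
    for obj in objects:
        result = obj.invoke()
        if result is None:
            output.append(obj)
        else:
            output.extend(result)
    return output


keep = Keep()
output = expand_stream([keep, Replace(), Drop()])
```

Note that None and an empty list mean different things here: None keeps the object, while an empty list removes it.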






locals(): retrieves all of the LaTeX macros that belong to the scope of the current Python-based macro.

paragraphs(force=True)

group content into paragraphs. Paragraphs are grouped once all other content has been digested. The paragraph grouping routine works like TeX’s, in that environments are included inside paragraphs. This is unlike HTML’s model, where lists and tables are not included inside paragraphs. The force argument allows you to decide whether or not paragraphs should be forced. By default, all content of the node is grouped into paragraphs whether or not the content originally contained a paragraph node. However, with force set to False, a node will only be grouped into paragraphs if the original content contained at least one paragraph node.

Even though the paragraphs() method follows TeX’s model, it is still possible to generate valid HTML content. Any node with the blockType attribute set to True is considered to be a block-level node. This means that it will be contained in its own paragraph node. This paragraph node will also have the blockType attribute set to True so that in the renderer the paragraph can be inserted or ignored based on this attribute.
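The effect of the force argument can be shown with a much-simplified model in which a sentinel object stands in for a paragraph-break node. The names PAR and group_paragraphs are hypothetical, and the sketch leaves out blockType handling entirely.

```python
PAR = object()  # stands in for a paragraph-break node


def group_paragraphs(content, force=True):
    """Group a flat list of nodes into paragraphs (lists of nodes)."""
    has_par = any(node is PAR for node in content)
    if not force and not has_par:
        # No paragraph node in the original content: leave it ungrouped.
        return list(content)
    paragraphs, current = [], []
    for node in content:
        if node is PAR:
            if current:
                paragraphs.append(current)
            current = []
        else:
            current.append(node)
    if current:
        paragraphs.append(current)
    return paragraphs


forced = group_paragraphs(['a', 'b'])                  # always grouped
unforced = group_paragraphs(['a', 'b'], force=False)   # left as-is
split = group_paragraphs(['a', PAR, 'b'], force=False)
```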

parse(tex)

parses the arguments defined in the args attribute from the given token stream. This method also calls several hooks as described in the table below.

Method Name	Description
`preParse()`	called at the beginning of the argument parsing process
`preArgument()`	called before parsing each argument
`postArgument()`	called after parsing each argument
`postParse()`	called at the end of the argument parsing process

The methods are called to assist in labeling and counting. For example, by default, the counter associated with a macro is automatically incremented when the macro is parsed. However, if the first argument is a modifier (i.e. *, +, -), the counter will not be incremented. This is handled in the preArgument() and postArgument() methods.

Each time an argument is parsed, the result is put into the attributes dictionary. The key in the dictionary is, of course, the name given to that argument in the args string. Modifiers such as *, +, and - are stored under the special key ‘*modifier*’.

The return value for this method is simply a reference to the attributes dictionary.

Note: If parse() is called on an instance with macroMode set to Macro.MODE_END, no parsing takes place.
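The order in which the hooks run can be seen in a toy version of the parsing loop. Several things here are assumptions for the sake of a runnable sketch: ToyMacro is not plasTeX's Macro, args is a tuple rather than an args string, tex is a plain list of already-split argument values, and an absent modifier is simply not stored.

```python
class ToyMacro:
    args = ('*', 'title')  # one modifier, then one mandatory argument

    def __init__(self):
        self.attributes = {}
        self.calls = []  # records the order in which hooks run

    def preParse(self, tex):
        self.calls.append('preParse')

    def preArgument(self, arg, tex):
        self.calls.append('preArgument:' + arg)

    def postArgument(self, arg, tex):
        self.calls.append('postArgument:' + arg)

    def postParse(self, tex):
        self.calls.append('postParse')

    def parse(self, tex):
        self.preParse(tex)
        for arg in self.args:
            self.preArgument(arg, tex)
            if arg in ('*', '+', '-'):
                # A modifier is consumed only if it is actually present.
                if tex and tex[0] == arg:
                    self.attributes['*modifier*'] = tex.pop(0)
            else:
                self.attributes[arg] = tex.pop(0)
            self.postArgument(arg, tex)
        self.postParse(tex)
        return self.attributes


starred = ToyMacro()
starred_attrs = starred.parse(['*', 'Intro'])
plain_attrs = ToyMacro().parse(['Intro'])
```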

postArgument(arg, tex)

called after parsing each argument. This is generally where label and counter mechanisms are handled.

arg is the Argument instance that holds all argument meta-data including the argument’s name, source, and options.

tex is the TeX instance containing the current context.






postParse(tex): do any operations required immediately after parsing the arguments. This generally includes setting up the value that will be returned when referencing the object.

preArgument(arg, tex)

called before parsing each argument. This is generally where label and counter mechanisms are handled.

arg is the Argument instance that holds all argument meta-data including the argument’s name, source, and options.

tex is the TeX instance containing the current context.






preParse(tex): do any operations required immediately before parsing the arguments.






refstepcounter(tex): set the object as the current labellable object and increment its counter. When an object is set as the current labellable object, the next \label command will point to that object.






stepcounter(tex): step the counter associated with the macro.
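The difference between stepcounter and refstepcounter can be sketched with a toy context object. ToyContext, ToyEquation, and the label function are hypothetical stand-ins; in particular, the tex parameter here is the toy context, not a real TeX instance.

```python
class ToyContext:
    """Holds counter values, the current labellable object, and labels."""

    def __init__(self):
        self.counters = {}
        self.currentlabel = None
        self.labels = {}


class ToyEquation:
    counter = 'equation'

    def __init__(self):
        self.ref = None

    def stepcounter(self, tex):
        # Only increments the counter.
        tex.counters[self.counter] = tex.counters.get(self.counter, 0) + 1

    def refstepcounter(self, tex):
        # Increments the counter and becomes the target of the next label.
        self.stepcounter(tex)
        self.ref = str(tex.counters[self.counter])
        tex.currentlabel = self


def label(tex, name):
    """Model \\label: point the name at the current labellable object."""
    tex.labels[name] = tex.currentlabel


tex = ToyContext()
first, second = ToyEquation(), ToyEquation()
first.refstepcounter(tex)
second.refstepcounter(tex)
label(tex, 'eq:second')
```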