Both LaTeX commands and environments can be implemented as Python classes. plasTeX includes a base class for each one: Command for commands and Environment for environments. For the most part, these two classes behave in the same way. Both are responsible for parsing their arguments, organizing their child nodes, incrementing counters, etc., much like their LaTeX counterparts. The Python macro class feature set is based on common LaTeX conventions, so if the LaTeX macro you are implementing in Python follows those conventions, your job will be very easy. If you are doing unconventional operations, you will probably still succeed; you just might have to do a little more work.
The three most important parts of the Python macro API are: 1) the args attribute, 2) the invoke method, and 3) the digest method. When writing your own macros, these are used the most by far.
The args attribute is a string attribute on the class that indicates what the arguments to the macro are. In addition to indicating the number of arguments, whether they are mandatory or optional, and what characters surround each argument in LaTeX, the args string also gives a name to each argument and can indicate the data type of its content (e.g. int, float, list, dictionary, string, etc.). The name given to each argument determines the key that the argument is stored under in the attributes dictionary of the class instance. Below is a simple example of a macro class.
    from plasTeX import Command, Environment

    class framebox(Command):
        r""" \framebox[width][pos]{text} """
        # Two optional arguments in square brackets, then one mandatory argument
        args = '[ width ] [ pos ] text'
In the args string of the \framebox macro, three arguments are defined. The first two are optional and the third one is mandatory. Once each argument is parsed, it is put into the attributes dictionary under the name given in the args string. For example, the attributes dictionary of an instance of \framebox will have the keys “width”, “pos”, and “text” once it is parsed, and these can be accessed in the usual Python way.
    self.attributes['width']
    self.attributes['pos']
    self.attributes['text']
In plasTeX, a mandatory argument is one with no grouping characters around it in the args string; any argument that isn’t mandatory is optional[1]. This includes arguments surrounded by parentheses (( )), square brackets ([ ]), and angle brackets (< >). This also lets you combine multiple versions of a command into one macro. For example, the \framebox command also has a form that looks like: \framebox(x_dimen,y_dimen)[pos]{text}. This leads to the Python macro class in the following code sample that encompasses both forms.
    from plasTeX import Command, Environment

    class framebox(Command):
        r""" \framebox[width][pos]{text} or \framebox(x_dimen,y_dimen)[pos]{text} """
        # The parenthesized argument is optional, so both forms are accepted
        args = '( dimens ) [ width ] [ pos ] text'
The only thing to keep in mind is that in the second form, the pos attribute is going to end up under the width key in the attributes dictionary since it is the first argument in square brackets, but this can be fixed up in the invoke method if needed. Also, if an optional argument is not present on the macro, the value of that argument in the attributes dictionary is set to None.
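The fix-up mentioned above can be sketched as a plain function on a dictionary, so that it runs without plasTeX installed. The function name fix_framebox_attributes is hypothetical; in a real macro the same logic would sit in the invoke method and operate on self.attributes. It assumes that a missing optional argument is stored as None, as described above.

```python
def fix_framebox_attributes(attributes):
    """Return a copy of the attributes with 'pos' moved out of the 'width' slot.

    Illustrative helper only; it is not part of plasTeX.
    """
    fixed = dict(attributes)
    if fixed.get('dimens') is not None and fixed.get('pos') is None:
        # Second form: the only bracketed argument was really 'pos'
        fixed['pos'] = fixed.get('width')
        fixed['width'] = None
    return fixed


# \framebox(10pt,20pt)[c]{text} as it would land in the dictionary
second_form = {'dimens': '10pt,20pt', 'width': 'c', 'pos': None, 'text': 'text'}
print(fix_framebox_attributes(second_form))
```

The first form, where dimens is None, passes through unchanged.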
As mentioned earlier, it is also possible to convert arguments to data types other than the default (a document fragment). A list of the available types is shown in the table below.
Name | Purpose
str | expands all macros then sets the value of the argument in the attributes dictionary to the string content of the argument
chr | same as ‘str’
char | same as ‘str’
cs | sets the attribute to an unexpanded control sequence
label | expands all macros, converts the result to a string, then sets the current label to the object that is in the currentlabel attribute of the document context. Generally, an object is put into the currentlabel attribute if it incremented a counter when it was invoked. The value stored in the attributes dictionary is the string value of the argument.
id | same as ‘label’
idref | expands all macros, converts the result to a string, retrieves the object that was labeled by that value, then adds the labeled object to the idref dictionary under the name of the argument. This type of argument is used in commands like \ref that must reference other objects. The nice thing about ‘idref’ is that it gives you a reference to the object itself, which you can then use to retrieve any type of information from it such as the reference value, title, etc. The value stored in the attributes dictionary is the string value of the argument.
ref | same as ‘idref’
nox | just parses the argument, but doesn’t expand the macros
list | converts the argument to a Python list. By default, the list item separator is a comma (,). You can change the item separator in the args string by appending a set of parentheses surrounding the separator character immediately after ‘list’. For example, to specify a semicolon-separated list for an argument called “foo” you would use the args string: “foo:list(;)”. It is also possible to cast the type of each item by appending another colon and the data type from this table that you want each item to be. However, you are limited to one data type for every item in the list.
dict | converts the argument to a Python dictionary. This is commonly used by arguments set up using LaTeX’s ‘keyval’ package. By default, key/value pairs are separated by commas, although this character can be changed in the same way as the delimiter in the ‘list’ type. You can also cast each value of the dictionary using the same method as the ‘list’ type. In all cases, keys are converted to strings.
dimen | reads a dimension and returns an instance of dimen
dimension | same as ‘dimen’
length | same as ‘dimen’
number | reads an integer and returns a Python integer
count | same as ‘number’
int | same as ‘number’
float | reads a decimal value and returns a Python float
double | same as ‘float’
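To make the ‘list’ and ‘dict’ rows more concrete, the following sketch mimics the two conversions on plain strings. The helper names cast_list and cast_dict are hypothetical and not part of plasTeX, which works on TeX tokens rather than strings; the sketch only approximates what args strings such as “foo:list(;):int” or “opts:dict:int” are described as producing.

```python
def cast_list(text, sep=',', item_type=str):
    """Split text on sep and cast every item to the same type."""
    return [item_type(item.strip()) for item in text.split(sep) if item.strip()]


def cast_dict(text, sep=',', value_type=str):
    """Split key=value pairs on sep; keys stay strings, values are cast."""
    result = {}
    for pair in text.split(sep):
        if not pair.strip():
            continue
        key, _, value = pair.partition('=')
        result[key.strip()] = value_type(value.strip())
    return result


print(cast_list('1; 2; 3', sep=';', item_type=int))       # [1, 2, 3]
print(cast_dict('width=3, height=4', value_type=int))     # {'width': 3, 'height': 4}
```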
There are also several argument types used for more low-level routines. These don’t parse the typical LaTeX arguments; they are used for the somewhat more free-form TeX arguments.
Name | Purpose
Dimen | reads a TeX dimension and returns an instance of dimen
Length | same as ‘Dimen’
Dimension | same as ‘Dimen’
MuDimen | reads a TeX mu-dimension and returns an instance of mudimen
MuLength | same as ‘MuDimen’
Glue | reads a TeX glue parameter and returns an instance of glue
Skip | same as ‘Glue’
Number | reads a TeX integer parameter and returns a Python integer
Int | same as ‘Number’
Integer | same as ‘Number’
Token | reads an unexpanded token
Tok | same as ‘Token’
XToken | reads an expanded token
XTok | same as ‘XToken’
Args | reads tokens up to the first begin group (i.e. {)
To use one of the data types, simply append a colon (:) and the data type name to the attribute name in the args string. Going back to the \framebox example, the argument in parentheses would be better represented as a list of dimensions. The width parameter is also a dimension, and the pos parameter is a string.
    from plasTeX import Command, Environment

    class framebox(Command):
        r""" \framebox[width][pos]{text} or \framebox(x_dimen,y_dimen)[pos]{text} """
        # dimens is a list of dimensions, width a dimension, pos a string
        args = '( dimens:list:dimen ) [ width:dimen ] [ pos:chr ] text'
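One way to read an args string is as a sequence of argument specifications, each with a name, an optional type, and optional grouping characters. The sketch below is a toy reader written for illustration only. The function describe_args is hypothetical, it is not how plasTeX parses the string internally, and it assumes the pieces are separated by whitespace as in the examples above.

```python
# Opening grouping characters and their matching closers
GROUPS = {'[': ']', '(': ')', '<': '>'}


def describe_args(args):
    """Split an args string into one record per argument (illustrative)."""
    specs = []
    tokens = args.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in GROUPS:
            # Optional argument written as: open name[:type] close
            spec, close = tokens[i + 1], tokens[i + 2]
            if close != GROUPS[tok]:
                raise ValueError('unbalanced %r in args string' % tok)
            delimiters = tok + close
            i += 3
        else:
            # Mandatory argument: no grouping characters
            spec, delimiters = tok, None
            i += 1
        name, _, dtype = spec.partition(':')
        specs.append({
            'name': name,
            'type': dtype or None,
            'optional': delimiters is not None,
            'delimiters': delimiters,
        })
    return specs


for spec in describe_args('( dimens:list:dimen ) [ width:dimen ] [ pos:chr ] text'):
    print(spec)
```

Each record shows the key the argument would be stored under, the requested type (None meaning the default document fragment), and whether the argument is optional.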
The invoke method is responsible for creating a new document context, parsing the macro arguments, and incrementing counters. In most cases, the default implementation will work just fine, but you may want to do some extra processing of the macro arguments or counters before letting the parsing of the document proceed. There are actually several methods in the API that are called within the scope of the invoke method: preParse, preArgument, postArgument, and postParse.
The order of execution is quite simple. Before any arguments have been parsed, the preParse method is called. The preArgument and postArgument methods are called before and after each argument, respectively. Then, after all arguments have been parsed, the postParse method is called. The default implementations of these methods handle the stepping of counters and setting the current labeled item in the document. By default, macros that have been “starred” (i.e. have a ‘*’ before the arguments) do not increment the counter. You can override this behavior in one of these methods if you prefer.
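The call order described above can be pictured with a small stand-in class that only records which hook ran. This is a simplified model and not plasTeX code: the class HookOrderDemo is hypothetical, and the hook signatures are reduced to the bare minimum, whereas the real methods also receive the TeX processor and other information.

```python
class HookOrderDemo:
    """Toy stand-in for a macro; records the order in which hooks run."""

    def __init__(self, arg_names):
        self.arg_names = arg_names
        self.calls = []

    def preParse(self):
        self.calls.append('preParse')

    def preArgument(self, name):
        self.calls.append('preArgument(%s)' % name)

    def postArgument(self, name):
        self.calls.append('postArgument(%s)' % name)

    def postParse(self):
        self.calls.append('postParse')

    def invoke(self):
        self.preParse()
        for name in self.arg_names:
            self.preArgument(name)
            # ... the argument itself would be parsed here ...
            self.postArgument(name)
        self.postParse()
        return self.calls


for call in HookOrderDemo(['width', 'text']).invoke():
    print(call)
```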
The most common reason for overriding the invoke method is to post-process the arguments in the attributes dictionary, or add information to the instance. For example, the \color command in LaTeX’s color package could convert the LaTeX color to the correct CSS format and add it to the CSS style object.
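    from plasTeX import Command, Environment

    def latex2htmlcolor(arg):
        """ Convert a LaTeX color specification to a CSS color """
        if ',' in arg:
            # RGB triple with each component in the range 0 to 1
            red, green, blue = [float(x) for x in arg.split(',')]
            red = min(int(red * 255), 255)
            green = min(int(green * 255), 255)
            blue = min(int(blue * 255), 255)
        else:
            try:
                # Single gray value in the range 0 to 1
                red = green = blue = min(int(float(arg) * 255), 255)
            except ValueError:
                # Named color: pass it through unchanged
                return arg.strip()
        return '#%.2X%.2X%.2X' % (red, green, blue)

    class color(Environment):
        args = 'color:str'

        def invoke(self, tex):
            # Parse the arguments, then translate the color for CSS
            Environment.invoke(self, tex)
            self.style['color'] = latex2htmlcolor(self.attributes['color'])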
While simple tasks like attribute post-processing are the most common use of the invoke method, you can also do very advanced things, such as changing category codes or iterating over the tokens in the TeX processor directly, as the verbatim environment does.
One other feature of the invoke method that may be of interest is the return value. Most invoke method implementations do not return anything (or return None). In this case, the macro instance itself is sent to the output stream. However, you can also return a list of tokens. If a list of tokens is returned, instead of the macro instance, those tokens are inserted into the output stream. This is useful if you don’t want the macro instance to be part of the output stream or document. In this case, you can simply return an empty list.
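The effect of the return value can be modeled with a few toy classes. The sketch below is not plasTeX code: the classes and the run function are hypothetical, and invoke takes no arguments here for brevity. It only illustrates the rule that None sends the instance to the output stream while a list sends its contents instead.

```python
class Visible:
    """Toy macro whose invoke returns None, so the instance is emitted."""
    def invoke(self):
        return None


class Invisible:
    """Toy macro that returns an empty list, so nothing is emitted."""
    def invoke(self):
        return []


class Replaced:
    """Toy macro that substitutes its own tokens."""
    def invoke(self):
        return ['replacement', 'tokens']


def run(macros):
    """Build an output stream from the return values of invoke."""
    output = []
    for macro in macros:
        result = macro.invoke()
        if result is None:
            output.append(macro)      # the macro instance itself
        else:
            output.extend(result)     # the returned tokens, possibly none
    return output


stream = run([Visible(), Invisible(), Replaced()])
print(len(stream))  # 3
```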
The digest method is responsible for converting the output stream into the final document structure. For commands, this generally doesn’t mean anything since they just consist of arguments which have already been parsed. Environments, on the other hand, have a beginning and an ending which surround tokens that belong to that environment. In most cases, the tokens between the \begin and \end need to be absorbed into the childNodes list.
The default implementation of the digest method should work for most macros, but there are instances where you may want to do some extra processing on the document structure. For example, the \caption command within figures and tables uses the digest method to populate the enclosing figure/table’s caption attribute.
    from plasTeX import Command, Environment

    class Caption(Command):
        args = '[ toc ] self'

        def digest(self, tokens):
            res = Command.digest(self, tokens)

            # Look for the figure environment that we belong to
            node = self.parentNode
            while node is not None and not isinstance(node, figure):
                node = node.parentNode

            # If the figure was found, populate the caption attribute
            if isinstance(node, figure):
                node.caption = self

            return res

    class figure(Environment):
        args = '[ loc:str ]'
        caption = None

        class caption_(Caption):
            macroName = 'caption'
            counter = 'figure'
More advanced uses of the digest method might be to construct more complex document structures. For example, tabular and array structures in a document get converted from a simple list of tokens to complex structures with lots of style information added (see section 3.3.3). One simple example of a digest that does something extra is shown below. It looks for the first node with the name “item” then bails out.
    from plasTeX import Command, Environment

    class toitem(Command):
        def digest(self, tokens):
            """ Throw away everything up to the first 'item' token """
            for tok in tokens:
                if tok.nodeName == 'item':
                    # Put the item back into the stream
                    tokens.push(tok)
                    break
One of the more advanced uses of the digest method is on the sectioning commands: \section, \subsection, etc. The digest method on sections absorbs tokens based on the level attribute, which indicates the hierarchical level of the node. When digested, each section absorbs all tokens until it reaches a section whose level is equal to or higher than its own in the hierarchy (that is, whose level number is less than or equal to its own). This creates the overall document structure as discussed in section 3.
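The level-based absorption can be sketched with a toy Section class working on a plain Python list. This is a simplified model under stated assumptions, not the plasTeX implementation: the class, the build_tree helper, and the use of a list in place of the token stream are all hypothetical. Lower level numbers are higher in the hierarchy, so a section stops absorbing when it meets a section whose level number is less than or equal to its own.

```python
class Section:
    """Toy sectioning node with a hierarchical level."""

    def __init__(self, title, level):
        self.title = title
        self.level = level
        self.childNodes = []

    def digest(self, tokens):
        """Absorb following nodes until an equal or higher section appears."""
        while tokens:
            nxt = tokens[0]
            if isinstance(nxt, Section) and nxt.level <= self.level:
                break                      # leave it for an enclosing section
            tokens.pop(0)
            if isinstance(nxt, Section):
                nxt.digest(tokens)         # deeper section absorbs its own content
            self.childNodes.append(nxt)


def build_tree(stream):
    """Digest a flat stream into a list of top-level nodes."""
    tokens = list(stream)
    top = []
    while tokens:
        node = tokens.pop(0)
        if isinstance(node, Section):
            node.digest(tokens)
        top.append(node)
    return top


flat = [Section('A', 1), 'text a', Section('A.1', 2), 'text a1',
        Section('B', 1), 'text b']
for section in build_tree(flat):
    print(section.title, len(section.childNodes))
```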
There are many other attributes and methods on macros that can be used to affect their behavior. For a full listing, see the API documentation in section 6.1. Below are descriptions of some of the more commonly used attributes and methods.
The level attribute is an integer that indicates the hierarchical level of the node in the output document structure. The values of this attribute are taken from LaTeX: \part is -1, \chapter is 0, \section is 1, \subsection is 2, etc. To create your own sectioning commands, you can either subclass one of the existing sectioning macros, or simply set the level attribute of your macro class to the appropriate number.
The macroName attribute is used when you are creating a LaTeX macro whose name is not a legal Python class name. For example, the macro \@ifundefined has a ‘@’ in the name which isn’t legal in a Python class name. In this case, you could define the macro as shown below.
    from plasTeX import Command

    class ifundefined_(Command):
        macroName = '@ifundefined'
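The naming rule can be summarized as "use macroName if it is set, otherwise use the class name". The sketch below shows that rule on toy classes; the function macro_name is hypothetical and is only meant to illustrate the lookup, not to reproduce how plasTeX registers macros.

```python
def macro_name(cls):
    """Return the LaTeX macro name a class would stand for (illustrative)."""
    return getattr(cls, 'macroName', None) or cls.__name__


class section:
    """Class name is already a legal macro name."""


class ifundefined_:
    """Class name cannot contain '@', so macroName carries the real name."""
    macroName = '@ifundefined'


print(macro_name(section))        # section
print(macro_name(ifundefined_))   # @ifundefined
```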
The counter attribute associates a counter with the macro class. It is simply a string that contains the name of the counter. Each time that an instance of the macro class is invoked, the counter is incremented (unless the macro has a ‘*’ argument).
The ref attribute contains the value normally returned by the \ref command.
The title attribute retrieves the “title” attribute from the attributes dictionary. This attribute is also overridable.
The fullTitle attribute is the same as the title attribute, but also includes the counter value at the beginning.
The tocEntry attribute retrieves the “toc” attribute from the attributes dictionary. This attribute is also overridable.
The fullTocEntry attribute is the same as the tocEntry attribute, but also includes the counter value at the beginning.
The style attribute is a CSS style object. Essentially, this is just a dictionary where the key is the CSS property name and the value is the CSS property value. It has an attribute called inline which contains an inline version of the CSS properties for use in the style= attribute of HTML elements.
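A dictionary with an inline view can be sketched in a few lines. The Style class below is a hypothetical stand-in written for illustration; the real plasTeX style object may format the inline string differently, and returning None for an empty style is an assumption of this sketch.

```python
class Style(dict):
    """Toy CSS style object: property names map to property values."""

    @property
    def inline(self):
        """Return the properties as a string for an HTML style= attribute."""
        if not self:
            return None
        return '; '.join('%s:%s' % (name, value) for name, value in self.items())


style = Style()
style['color'] = '#FF0000'
style['font-weight'] = 'bold'
print(style.inline)  # color:#FF0000; font-weight:bold
```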
The id attribute contains a unique ID for the object. If the object was labeled by a \label command, the ID for the object will be that label; otherwise, an ID is generated.
The source attribute contains the LaTeX source representation of the node and all of its contents.
The currentSection attribute contains the section that the node belongs to.
The expand method is a thin wrapper around the invoke method. It simply invokes the macro and returns the result of expanding all of the tokens. Unlike invoke, you will always get the expanded node (or nodes); you will not get a None return value.
The paragraphs method does the final processing of paragraphs in a node’s child nodes. It makes sure that all content is wrapped within paragraph nodes. This method is generally called from the digest method.
Footnotes