API Reference
DO.New
The DO.New method creates a new SuperDoer instance for task execution. It provides an interface for configuring and instantiating a task executor with a given model, name, and purpose. Two versions are available: a synchronous version (New) and an asynchronous version (A_new).
New
Signature:
DO.New(model: Model, **kwargs) -> SuperDoer
# Async version
DO.A_new(model: Model, **kwargs) -> SuperDoer
Creates a new SuperDoer instance for task execution. The synchronous New method wraps the asynchronous A_new method for ease of use in synchronous contexts.
Arguments:
- model (Model): The model instance to use for task execution.
- name (str): A required name for the doer, passed as a keyword argument.
- purpose (str): A required description of the doer's purpose, passed as a keyword argument.
- Additional optional configuration can be provided via **kwargs.
Usage Example:
doer = DO.New(model, name="task_executor", purpose="execute various tasks")
# Async version
doer = await DO.A_new(model, name="task_executor", purpose="execute various tasks")
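New is described as a synchronous wrapper around A_new. A minimal sketch of that pattern follows. It assumes the common approach of a synchronous function that runs the async factory to completion with asyncio.run. SketchDoer, a_new and new are hypothetical stand-ins, not the library's implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchDoer:
    """Hypothetical stand-in for SuperDoer, not the library's class."""
    model: Any
    name: str
    purpose: str
    config: dict[str, Any] = field(default_factory=dict)


async def a_new(model: Any, **kwargs: Any) -> SketchDoer:
    # name and purpose are required keyword arguments (KeyError if missing);
    # any remaining keyword arguments are kept as optional configuration.
    name = kwargs.pop("name")
    purpose = kwargs.pop("purpose")
    await asyncio.sleep(0)  # placeholder for real asynchronous setup work
    return SketchDoer(model, name, purpose, kwargs)


def new(model: Any, **kwargs: Any) -> SketchDoer:
    # Synchronous wrapper: drive the async factory to completion on a new event loop.
    # asyncio.run() raises RuntimeError inside a running loop, so async code
    # should await a_new() directly instead.
    return asyncio.run(a_new(model, **kwargs))


doer = new("model-x", name="task_executor", purpose="execute various tasks")
```

The comment in new marks the limit of this pattern. Inside an already-running event loop, awaiting the async factory directly avoids starting a nested loop.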
SuperDoer
The SuperDoer is a task executor. Its methods let you add provisions (realm), set output constraints (envision), and execute tasks (enact). Together they support complex workflows for task automation and decision-making.
realm
Signature:
realm(provisions: List[Provision]) -> SuperDoer
The realm method extends a SuperDoer with additional resources or capabilities (provisions), such as a browser instance or mcp.run tasks. It returns a new SuperDoer instance that includes the specified provisions, which the doer can then use during task execution.
Usage Example:
doer_with_provisions = doer.realm([browser_instance])
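Because realm returns a new instance, it can be modeled as a copy-on-write update. This minimal sketch assumes that the original doer is left unchanged and that provisions accumulate in an immutable tuple. SketchDoer is a hypothetical stand-in, not the library's class.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class SketchDoer:
    """Hypothetical stand-in for SuperDoer; frozen so instances are never mutated."""
    name: str
    provisions: tuple[Any, ...] = ()

    def realm(self, provisions: list[Any]) -> "SketchDoer":
        # Build a new instance whose provisions are the old ones plus the new ones.
        return replace(self, provisions=self.provisions + tuple(provisions))


base = SketchDoer("task_executor")
extended = base.realm(["browser"]).realm(["mcp_task"])
```

Under this design, one base doer can be specialized in several directions without the variants affecting each other.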
envision
Signature:
envision(constraints: dict | type[BaseModel], verify: Optional[Callable[[Any], Any]] = None) -> SuperDoer
The envision method sets output constraints that the final answer must meet. Constraints can be specified either as a plain Python dictionary or as a Pydantic model for schema-based validation. An optional verify function can also be provided to perform further custom checks on the final result. If the output fails to meet the constraints or the verification function's criteria, a ValidationError is raised.
This ensures that the results produced by the enact method strictly follow the specified format or criteria.
Usage Example:
from typing import Optional

from pydantic import BaseModel, Field


class TeamMember(BaseModel):
    """A team member"""
    name: str = Field(description="The name of the person")
    bio: Optional[str] = Field(
        ..., description="A short bio of the person, if known"
    )
    is_founder: bool = Field(
        description="Whether the person is a founder of the company"
    )


class Team(BaseModel):
    """The team"""
    members: list[TeamMember] = Field(description="The team members")


# This verify returns True for a non-empty team and an error message otherwise
doer_with_constraints = doer.envision(
    constraints=Team,
    verify=lambda team: True if len(team.members) > 0 else "Empty team",
)
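The interplay between constraints and verify can be sketched with the standard library alone. The sketch makes two assumptions:
- The dictionary form of constraints maps field names to expected Python types.
- As in the example above, verify returns True on success and an error message otherwise.

check_output and this ValidationError class are hypothetical. They illustrate a two-stage check (schema first, then custom verification), not the library's actual validation code.

```python
from typing import Any, Callable, Optional


class ValidationError(Exception):
    """Hypothetical error type for this sketch (not pydantic's ValidationError)."""


def check_output(
    result: dict[str, Any],
    constraints: dict[str, type],
    verify: Optional[Callable[[Any], Any]] = None,
) -> dict[str, Any]:
    # Stage 1, schema check: every constrained key must exist with the expected type.
    for key, expected in constraints.items():
        if key not in result:
            raise ValidationError(f"missing field: {key}")
        if not isinstance(result[key], expected):
            raise ValidationError(f"{key} should be {expected.__name__}")
    # Stage 2, custom check: anything other than True is treated as a failure message.
    if verify is not None:
        outcome = verify(result)
        if outcome is not True:
            raise ValidationError(str(outcome))
    return result


non_empty = lambda team: True if len(team["members"]) > 0 else "Empty team"
```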
enact
Signature:
enact(task: str, params: Optional[dict[str, Any]] = None) -> Any
The enact method is an asynchronous operation that executes the specified task using the configured model, the output constraints set with envision, and the provisions added with realm. It validates and processes the result against the defined constraints and returns the final output of the task.
Usage Example:
result = await doer.enact("execute task description")
The SuperDoer supports method chaining, so you can call realm and envision in succession before executing a task with enact. This design enables flexible, modular workflows for complex automation scenarios.
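The chaining described above works when each configuration method returns a doer and enact is a coroutine. The following is a minimal sketch under these assumptions:
- A stub async function stands in for the model.
- envision is simplified to take only a verify callable.
- Validation failures raise ValueError rather than the library's ValidationError.

SketchDoer and fake_model are hypothetical.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Any, Awaitable, Callable, Optional

# Hypothetical model signature: async (task, provisions) -> raw result.
ModelFn = Callable[[str, tuple], Awaitable[Any]]


@dataclass(frozen=True)
class SketchDoer:
    model: ModelFn
    provisions: tuple = ()
    verify: Optional[Callable[[Any], Any]] = None

    def realm(self, provisions: list) -> "SketchDoer":
        return replace(self, provisions=self.provisions + tuple(provisions))

    def envision(self, verify: Callable[[Any], Any]) -> "SketchDoer":
        return replace(self, verify=verify)

    async def enact(self, task: str) -> Any:
        # Run the model with the configured provisions, then validate the result.
        result = await self.model(task, self.provisions)
        if self.verify is not None:
            outcome = self.verify(result)
            if outcome is not True:
                raise ValueError(outcome)
        return result


async def fake_model(task: str, provisions: tuple) -> dict:
    # Stand-in for a real model call: echoes the task and the available tools.
    return {"task": task, "tools": list(provisions)}


async def main() -> Any:
    doer = SketchDoer(fake_model)
    return await (
        doer.realm(["browser"])
        .envision(lambda r: True if r["tools"] else "no tools")
        .enact("summarize the page")
    )


print(asyncio.run(main()))  # prints {'task': 'summarize the page', 'tools': ['browser']}
```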
DO.Browse
The WebBrowser interface manages browser sessions and page history, enabling autonomous navigation of and interaction with web pages. Its methods cover a range of automated browser tasks, including navigation, form interaction, cookie and storage management, image and text extraction, and knowledge graph extraction.
Initialization
Signature:
DO.Browse(**kwargs) -> WebBrowser
# Async version
DO.A_browse(**kwargs) -> WebBrowser
Creates a new browser instance for web automation. The synchronous Browse method wraps the asynchronous A_browse method for ease of use in synchronous contexts.
Arguments:
- headless (bool, optional): Whether to run the browser without a visible window. Defaults to True.
- chrome_path (str, optional): Path to the Chrome executable.
- user_data_dir (str, optional): Chrome user profile directory.
- channel (str, optional): Browser channel to use. Defaults to "chromium".
- screen (dict, optional): Screen dimensions, for example {"width": 1920, "height": 1080}.
- bypass_csp (bool, optional): Whether to bypass the page's Content Security Policy. Defaults to False.
- Additional configuration (for example, args) can be provided via **kwargs.
Usage Example:
# Basic initialization
browser = DO.Browse()
# Initialization with custom configuration
browser = DO.Browse(
headless=False,
channel="chrome",
screen={"width": 1440, "height": 900}
)
# Async initialization
browser = await DO.A_browse(headless=False)
# Using with Chrome profile
browser = DO.Browse(
chrome_path="/path/to/chrome",
user_data_dir="/path/to/profile",
channel="chrome",
headless=False,
args=["--profile-directory=Profile 2"]
)
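The documented defaults (headless=True, channel="chromium", bypass_csp=False) suggest a simple override scheme. This sketch assumes that explicitly passed keyword arguments win over the defaults and that extra keys such as args pass through untouched. resolve_browse_options is a hypothetical helper, not part of DO.Browse.

```python
from typing import Any

# Defaults as documented for DO.Browse; the merge logic itself is a hypothetical sketch.
BROWSE_DEFAULTS: dict[str, Any] = {
    "headless": True,
    "channel": "chromium",
    "bypass_csp": False,
}


def resolve_browse_options(**kwargs: Any) -> dict[str, Any]:
    # Later entries win, so explicit keyword arguments override the defaults;
    # a new dict is built, leaving BROWSE_DEFAULTS unchanged.
    options = {**BROWSE_DEFAULTS, **kwargs}
    screen = options.get("screen")
    if screen is not None and not all(k in screen for k in ("width", "height")):
        raise ValueError("screen must contain 'width' and 'height'")
    return options
```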
goto
Signature:
goto(url: string): Promise<void>
Navigates to the specified URL in a new page.
Arguments:
- url (string): The URL to navigate to.
Usage Example:
await browser.goto("https://example.com");
annotation
Signature:
annotation(enabled?: boolean): Promise<void>
Enables or disables visual annotation of elements on the page. When annotation is enabled, elements may be highlighted to assist with debugging and analysis.
Arguments:
- enabled (boolean, default: true): Whether to enable or disable annotation.
Usage Example:
await browser.annotation(); // Enable annotations (enabled defaults to true)
await browser.annotation(false); // Disable annotations
cookies
Signature:
cookies(cookies?: Record<string, string>): Promise<Record<string, string>>
Gets or sets cookies for the current browser context using Playwright’s cookie mechanisms.
Arguments:
- cookies (optional, Record<string, string>): If provided, sets the cookies and returns the updated state. If omitted, returns the current cookies.
Usage Example:
// Retrieve current cookies
const currentCookies = await browser.cookies();
// Set new cookies
const updatedCookies = await browser.cookies({
session: "abc123",
user_id: "12345"
});
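The cookies method exposes a flat name-to-value mapping on top of Playwright's cookie mechanisms. Playwright itself works with lists of cookie dicts. This sketch shows how such a mapping could be converted in both directions. The helper names are hypothetical, and the library's own conversion may differ, for example in which url or domain it assigns to each cookie.

```python
from typing import Any


def to_playwright_cookies(cookies: dict[str, str], url: str) -> list[dict[str, str]]:
    # Playwright's BrowserContext.add_cookies takes a list of cookie dicts; each
    # needs a name and value plus either a url or a domain/path pair.
    return [{"name": name, "value": value, "url": url} for name, value in cookies.items()]


def from_playwright_cookies(cookies: list[dict[str, Any]]) -> dict[str, str]:
    # BrowserContext.cookies() returns richer dicts (domain, path, expires, ...);
    # keep only name -> value. Later duplicates overwrite earlier ones.
    return {c["name"]: c["value"] for c in cookies}
```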
storage
Signature:
storage(storageState?: {
localStorage: Record<string, string>;
sessionStorage: Record<string, string>;
}): Promise<{
localStorage: Record<string, string>;
sessionStorage: Record<string, string>;
}>
Gets or sets the storage state (both localStorage and sessionStorage) for the current page.
Arguments:
- storageState (optional): An object with the following structure:
  { localStorage: { key: "value", ... }, sessionStorage: { key: "value", ... } }
  If omitted, the current storage state is returned.
Usage Example:
// Getting current storage state
const state = await browser.storage();
// Setting new storage state
await browser.storage({
localStorage: { user: "jane" },
sessionStorage: { token: "xyz789" }
});
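One way to apply a storage state of the shape above is to generate a page script that writes each entry with Storage.setItem and then run it in the page, for example through evaluate. This sketch assumes that approach. build_storage_script is a hypothetical helper, and the library may apply storage differently, for instance by replacing existing keys rather than only adding them.

```python
import json


def build_storage_script(state: dict[str, dict[str, str]]) -> str:
    # Serialize the state as JSON so keys and values are safely quoted in JavaScript.
    payload = json.dumps({
        "localStorage": state.get("localStorage", {}),
        "sessionStorage": state.get("sessionStorage", {}),
    })
    # The generated script is an immediately invoked function that writes
    # every entry into the matching storage area with setItem.
    return (
        f"(() => {{ const s = {payload};\n"
        "  for (const [k, v] of Object.entries(s.localStorage)) localStorage.setItem(k, v);\n"
        "  for (const [k, v] of Object.entries(s.sessionStorage)) sessionStorage.setItem(k, v);\n"
        "})()"
    )
```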
click
Signature:
click(elementId: number): Promise<void>
Clicks the element with the given ID. The pointer is moved to the element heuristically before the mouse click is performed.
Arguments:
- elementId (number): The ID of the element to click.
Usage Example:
await browser.click(42);
type
Signature:
type(elementId: number, text: string): Promise<void>
Types the provided text into the specified element.
Arguments:
- elementId (number): The ID of an input element.
- text (string): The text to type.
Usage Example:
await browser.type(33, "Hello, world!");
image
Signature:
image(elementId?: number, bbox?: [number, number, number, number], viewport?: boolean): Promise<Buffer>
Captures a screenshot in PNG format. If elementId is provided, captures that element; otherwise, captures the entire page.
Arguments:
- elementId (optional, number): The ID of the element to capture. If omitted, the entire page is captured.
- bbox (optional, tuple): A tuple [x1, y1, x2, y2] specifying a crop area.
- viewport (optional, boolean): If true and elementId is omitted, captures only the viewport.
Returns: PNG image content as a Buffer.
Usage Example:
// Capture the entire page
const pageImage = await browser.image();
// Capture a specific element
const elementImage = await browser.image(15);
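The bbox argument uses corner coordinates [x1, y1, x2, y2]. Playwright's page.screenshot describes a crop as a clip with x, y, width and height instead. This sketch converts between the two, assuming bbox is given in page pixels with x1 < x2 and y1 < y2. bbox_to_clip is a hypothetical helper, not part of the documented API.

```python
def bbox_to_clip(bbox: tuple[float, float, float, float]) -> dict[str, float]:
    # Corner form [x1, y1, x2, y2] -> origin plus size, as used by a screenshot clip.
    x1, y1, x2, y2 = bbox
    if x2 <= x1 or y2 <= y1:
        raise ValueError("bbox must satisfy x1 < x2 and y1 < y2")
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}
```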
text
Signature:
text(elementId?: number): Promise<string>
Retrieves text content, with interactive elements marked in the format [id@type#subtype].
Arguments:
- elementId (optional, number): The ID of the element. If omitted, the full page text is returned.
Returns: A string representing the text content.
Usage Example:
const pageText = await browser.text();
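The [id@type#subtype] markers can be pulled back out of the returned text with a regular expression. This sketch assumes the following:
- IDs are integers.
- Types contain no "#" or "]".
- The #subtype part may be absent.

parse_markers and Marker are hypothetical helpers, not part of the documented API.

```python
import re
from typing import NamedTuple, Optional

# [12@input#email] -> id 12, type "input", subtype "email"; the subtype is optional.
MARKER = re.compile(r"\[(\d+)@([^\]#]+)(?:#([^\]]+))?\]")


class Marker(NamedTuple):
    element_id: int
    element_type: str
    subtype: Optional[str]


def parse_markers(text: str) -> list[Marker]:
    # Scan the text left to right and collect every marker in order of appearance.
    return [Marker(int(m.group(1)), m.group(2), m.group(3)) for m in MARKER.finditer(text)]
```

The element_id values recovered this way are what methods such as click and type expect.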
elements
Signature:
elements(bbox?: [number, number, number, number]): Promise<Record<number, ElementMetadata>>
Retrieves metadata for all elements on the page, optionally filtered by a bounding box.
Arguments:
- bbox (optional, tuple): A crop filter specified as [x1, y1, x2, y2].
Returns: An object mapping element IDs to their metadata.
Usage Example:
const elements = await browser.elements();
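Filtering by a bounding box can be illustrated with a containment test. The structure of ElementMetadata is not specified here. This sketch therefore assumes a hypothetical metadata dict with a "bbox" key in the same [x1, y1, x2, y2] form, and treats "within the filter" as full containment. filter_by_bbox is hypothetical, and the library might use overlap instead.

```python
from typing import Any

# Hypothetical metadata shape: each element records its own box as [x1, y1, x2, y2].
ElementMetadata = dict[str, Any]


def filter_by_bbox(
    elements: dict[int, ElementMetadata],
    bbox: tuple[float, float, float, float],
) -> dict[int, ElementMetadata]:
    # Keep only elements whose box lies entirely inside the filter area.
    fx1, fy1, fx2, fy2 = bbox
    kept = {}
    for element_id, meta in elements.items():
        x1, y1, x2, y2 = meta["bbox"]
        if fx1 <= x1 and fy1 <= y1 and x2 <= fx2 and y2 <= fy2:
            kept[element_id] = meta
    return kept
```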
evaluate
Signature:
evaluate(script: string): Promise<any>
Evaluates the provided JavaScript expression in the context of the current page.
Arguments:
- script (string): The JavaScript code to run.
Returns: The result of the evaluated expression.
Usage Example:
const title = await browser.evaluate("document.title");
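Because evaluate takes the script as a single string, any value coming from Python has to be embedded in that string. This sketch assumes you build such expressions yourself. exists_expression is a hypothetical helper that quotes the selector with json.dumps rather than by manual string concatenation, so quotes inside the selector cannot break the script.

```python
import json


def exists_expression(selector: str) -> str:
    # json.dumps yields a double-quoted string literal with quotes and
    # backslashes escaped, which is also a valid JavaScript string literal.
    return f"document.querySelector({json.dumps(selector)}) !== null"
```

The resulting string would then be passed to evaluate, for example await browser.evaluate(exists_expression("#login")).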
close
Signature:
close(): Promise<void>
Closes the current page, ending the browser session for that page.
Usage Example:
await browser.close();
state
Signature:
state(): Promise<string>
Retrieves the current state of the page, including interaction history, an overview of page elements, and top entities extracted from the page.
Returns: A string representing the current state of the page.
Usage Example:
const currentState = await browser.state();
console.log(currentState);
analyze
Signature:
analyze(): Promise<string>
Analyzes the current page by performing knowledge graph extraction and entity recognition. The state() method can later use this enriched analysis to provide deeper insights.
Usage Example:
const analysis = await browser.analyze();
console.log(analysis);