
Getting Started with Web Ingestion in Docket

Written by Sleeba Paul
Updated over 3 weeks ago

Web Ingestion makes it easy to bring your website content directly into Docket so your team and your agents can work with the most up-to-date information.

With Web Ingestion, you can ingest entire domains, specific URLs, or site maps — and keep them automatically in sync.

Ways to Add Content

1. Ingest an Entire Domain

If you want to bring in all the content from a website, you can add a full domain (e.g., google.com).


Docket will crawl the site, discover URLs, and ingest the pages into your workspace.

⚠️ Note: Crawling an entire domain may also bring in unwanted URLs. To control what gets ingested, we recommend using site maps.
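Under the hood, crawling a domain comes down to fetching pages and following links that stay on the same domain. The sketch below is a minimal, purely illustrative Python example of that discovery step (using the requests and BeautifulSoup libraries); it is not Docket's implementation, and the starting URL is just a placeholder.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def discover_urls(start_url: str, limit: int = 50) -> set[str]:
    """Breadth-first discovery of same-domain URLs, starting from start_url."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])

    while queue and len(seen) < limit:
        page = queue.popleft()
        try:
            resp = requests.get(page, timeout=10)
        except requests.RequestException:
            continue  # unreachable pages are the kind of thing that ends up in "Needs Attention"
        soup = BeautifulSoup(resp.text, "html.parser")
        for link in soup.find_all("a", href=True):
            url = urljoin(page, link["href"]).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(url).netloc == domain and url not in seen:
                seen.add(url)
                queue.append(url)
    return seen


# Example: discover up to 50 URLs starting from a placeholder domain.
print(discover_urls("https://example.com"))
```

As the note above says, this kind of open-ended discovery can pull in pages you never intended to ingest, which is why site maps are the recommended route.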

2. Use Site Maps (Recommended)

Most websites have one or more site maps that list relevant URLs. When you add a domain, Docket will automatically detect available site maps.


You can:

  • Select all site maps, or

  • Choose specific ones to ingest.

This ensures only the right pages are ingested, improving answer quality and avoiding unnecessary content.
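A site map is simply an XML file (commonly served at /sitemap.xml) that lists the URLs a site wants indexed, which is why it gives a much cleaner scope than a full crawl. The sketch below shows how such a file can be read in Python; the URL is a placeholder, and this illustrates the general format only, not how Docket detects or ingests site maps.

```python
import xml.etree.ElementTree as ET

import requests

# Standard sitemap namespace used by <urlset> and <sitemapindex> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_url: str) -> list[str]:
    """Return the <loc> entries listed in a standard sitemap.xml file."""
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


# Example: list the pages declared in a placeholder site map.
for url in sitemap_urls("https://example.com/sitemap.xml"):
    print(url)
```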

3. Add Individual URLs

Need a quick start without waiting for a full crawl?


You can paste one or more URLs directly and begin using them right away. This is especially useful for onboarding or testing with just a few pages.

⚠️ Note: Docket supports ingestion of standard web pages via individual URLs. To ingest links from Google Drive, OneDrive, or SharePoint, use our dedicated integrations instead.
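If you are collecting URLs to paste in, a quick sanity check can save a failed ingestion. The sketch below is a hypothetical helper (not part of Docket) that flags links belonging to the Google Drive, OneDrive, or SharePoint integrations and confirms that everything else serves a standard HTML page; the host list is simplified for illustration.

```python
from urllib.parse import urlparse

import requests

# Simplified, illustrative list of hosts whose links belong in dedicated integrations.
INTEGRATION_HOSTS = ("drive.google.com", "onedrive.live.com")


def classify_url(url: str) -> str:
    """Rough pre-check before pasting a URL into Web Ingestion."""
    host = urlparse(url).netloc.lower()
    if host.endswith(INTEGRATION_HOSTS) or ".sharepoint.com" in host:
        return "use the dedicated integration"
    resp = requests.head(url, timeout=10, allow_redirects=True)
    if resp.ok and "text/html" in resp.headers.get("Content-Type", ""):
        return "ok to ingest as a web page"
    return "likely to need attention"


print(classify_url("https://example.com/docs/getting-started"))
```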

Automatic and Manual Re-Syncing

  • Automatic Re-Sync: By default, ingested content re-syncs every 30 days. You can adjust this to as frequently as once per week (see the sketch after this list).

  • Manual Re-Sync: If content on your site changes and you don’t want to wait, you can trigger a re-sync at any time from the record’s menu.
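For a rough sense of the automatic cadence described above: with the default 30-day interval (adjustable down to weekly), the next re-sync date is simple date arithmetic. This is only an illustration of the schedule, not Docket's scheduler.

```python
from datetime import datetime, timedelta


def next_resync(last_sync: datetime, interval_days: int = 30) -> datetime:
    """Next automatic re-sync, with the interval clamped to at least weekly."""
    interval_days = max(interval_days, 7)  # weekly is the most frequent option
    return last_sync + timedelta(days=interval_days)


last = datetime(2024, 6, 1)
print(next_resync(last))                   # default: 30 days later
print(next_resync(last, interval_days=7))  # weekly re-sync
```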

Visibility into Ingestion

You’ll have full visibility into what’s happening during ingestion:

  • Discovered URLs: Total number of URLs found.

  • Completed: Pages successfully ingested.

  • Needs Attention: Pages not ingested, with reasons (e.g., duplicate, 404 error).

  • Error Categories: See exactly why a URL failed.

You can expand records to see detailed information, including:

  • When the job started

  • Next scheduled sync

  • Crawl duration

  • Lists of URLs with their statuses

This transparency makes it easy to monitor progress and troubleshoot issues.
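To picture what that visibility covers, here is a hypothetical sketch of an ingestion-job summary as plain data. The field names are ours, chosen to mirror the details listed above; they are not Docket's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IngestionJobSummary:
    """Hypothetical shape of the per-job details described above."""
    started_at: datetime                 # when the job started
    next_sync: datetime                  # next scheduled sync
    crawl_duration_seconds: float        # crawl duration
    discovered_urls: int                 # total URLs found
    completed: int                       # pages successfully ingested
    needs_attention: dict[str, list[str]] = field(default_factory=dict)  # error category -> URLs


job = IngestionJobSummary(
    started_at=datetime(2024, 6, 1, 9, 0),
    next_sync=datetime(2024, 7, 1, 9, 0),
    crawl_duration_seconds=215.0,
    discovered_urls=120,
    completed=117,
    needs_attention={"duplicate": ["https://example.com/a"], "404": ["https://example.com/old"]},
)
print(f"{job.completed}/{job.discovered_urls} pages ingested")
```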

Example Use Cases

  • Crawl and ingest your entire knowledge base domain.

  • Select a single site map to bring in just your documentation pages.

  • Add a few URLs for a fast onboarding experience.

With these features, you can ensure only the most relevant and accurate content is powering your agents in Docket.
