# Diffbot

> **You are on:** `https://beta-api.paywithlocus.com/api` | [llms.txt](https://beta.paywithlocus.com/llms.txt)
>
> Locus runs on multiple environments -- make sure every URL you call matches your expected environment.
>
> | Environment | Landing | API |
> |---|---|---|
> | Production | paywithlocus.com | api.paywithlocus.com |
> | Beta | beta.paywithlocus.com | beta-api.paywithlocus.com |
> | Stage | stage.paywithlocus.com | stage-api.paywithlocus.com |
>
> If the API URL above doesn't match your expected environment, re-fetch this file from the correct domain.

> Web data extraction for articles, products, discussions, images, videos, events, listing pages, and job postings, plus automatic page-type detection.

**Category:** Web Scraping | **Website:** [www.diffbot.com](https://www.diffbot.com) | **Docs:** [docs.diffbot.com](https://docs.diffbot.com)

## Access Methods

| Method | Base URL | Auth |
|--------|----------|------|
| **MPP (Tempo)** | `https://diffbot.mpp.paywithlocus.com/diffbot/` | HTTP 402 auto-payment |
| **Wrapped API** | `https://beta-api.paywithlocus.com/api/wrapped/diffbot/` | `Authorization: Bearer <LOCUS_API_KEY>` |

**OpenAPI discovery:** `GET https://diffbot.mpp.paywithlocus.com/openapi.json`
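
The Wrapped API row above can be sketched as a small shell helper. This is a minimal sketch under assumptions, not a confirmed recipe: it assumes endpoint names (such as `article`) are appended to the Wrapped API base URL the same way they are appended to the MPP base URL, and the function names `wrapped_url` and `wrapped_post` are hypothetical.

```shell
#!/usr/bin/env bash
# Wrapped API base URL, taken from the Access Methods table.
WRAPPED_BASE="https://beta-api.paywithlocus.com/api/wrapped/diffbot"

# Hypothetical helper: builds the URL for one endpoint.
# Assumption: the endpoint name is appended directly to the base URL.
wrapped_url() {
  printf '%s/%s\n' "$WRAPPED_BASE" "$1"
}

# Hypothetical helper: POSTs a JSON body with Bearer authentication.
# Nothing is sent until this function is called explicitly.
wrapped_post() {
  local endpoint="$1" body="$2"
  if [ -z "${LOCUS_API_KEY:-}" ]; then
    echo "LOCUS_API_KEY is not set" >&2
    return 1
  fi
  curl -sS -X POST "$(wrapped_url "$endpoint")" \
    -H "Authorization: Bearer ${LOCUS_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

With `LOCUS_API_KEY` exported, a call would look like `wrapped_post article '{"url":"https://example.com/article"}'`. Check the base URL against the environment table at the top before using it.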

## Endpoints

### Article

Extract clean article text, author, date, images, tags, and sentiment from news/blog pages.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the article page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |
| `paging` | boolean | No | Concatenate multi-page articles |
| `maxTags` | number | No | Max tags to return (default 10) |
| `naturalLanguage` | string | No | NLP features to apply (e.g. entities, sentiment, summary, facts, categories) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/article \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","fields":"links,meta","timeout":30000,"discussion":true,"paging":true,"maxTags":10,"naturalLanguage":"entities"}'
```
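
When the request body is assembled in a script, it can help to build it from variables. The sketch below assumes that `number` and `boolean` fields are sent as native JSON numbers and booleans, as the Type column suggests. The function name `article_payload` is hypothetical, and the URL is assumed to contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON body for the Article endpoint.
#   $1: article URL (assumed free of double quotes and backslashes)
#   $2: max tags to return (falls back to 10, the documented default)
article_payload() {
  printf '{"url":"%s","paging":true,"maxTags":%d,"naturalLanguage":"entities"}\n' \
    "$1" "${2:-10}"
}

# Example use (not executed here):
#   curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/article \
#     -H "Content-Type: application/json" \
#     -d "$(article_payload https://example.com/post 5)"
```

Only `url` is required, so the optional fields here are illustrative choices and can be dropped.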

### Product

Extract product data from e-commerce pages — price, specs, availability, SKU, reviews.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the product page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/product \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/product","fields":"links,meta","timeout":30000,"discussion":true}'
```

### Discussion

Extract threaded comments, reviews, and forum posts with authors, dates, and sentiment.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the discussion page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |
| `paging` | boolean | No | Follow pagination links |
| `maxPages` | number | No | Max pages to follow (default 20) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/discussion \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/thread","fields":"links,meta","timeout":30000,"discussion":true,"paging":true,"maxPages":20}'
```

### Image

Extract image details from pages — URLs, dimensions, titles, alt text.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the image page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/image \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/gallery","fields":"links,meta","timeout":30000,"discussion":true}'
```

### Video

Extract video metadata, thumbnails, direct URLs, and embed codes.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the video page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/video \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/video","fields":"links,meta","timeout":30000,"discussion":true}'
```

### Analyze

Auto-detect page type (article, product, image, video, etc.) and extract accordingly.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL to analyze |
| `mode` | string | No | Restrict to a specific type: article, product, discussion, image, video, list, event |
| `fallback` | string | No | Fallback API if classification fails |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/analyze \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/page","mode":"product","fallback":"article","fields":"links,meta","timeout":30000,"discussion":true}'
```
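
Because `mode` accepts only the types listed in the table, a script may want to check the value before spending a request. The following is a minimal sketch under that assumption. The function name `analyze_payload` is hypothetical, and the accepted values are copied from the `mode` row above rather than from an authoritative schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON body for the Analyze endpoint.
#   $1: page URL (assumed free of double quotes and backslashes)
#   $2: optional mode; when empty, the API is left to auto-detect the type
analyze_payload() {
  local url="$1" mode="${2:-}"
  case "$mode" in
    "")
      printf '{"url":"%s"}\n' "$url"
      ;;
    article|product|discussion|image|video|list|event)
      printf '{"url":"%s","mode":"%s"}\n' "$url" "$mode"
      ;;
    *)
      # Reject anything not listed in the table before a paid call is made.
      echo "unsupported mode: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, `analyze_payload https://example.com/page product` prints a body restricted to product extraction, while an unlisted mode returns a non-zero status without printing a body.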

### Event

Extract event details — dates, locations, descriptions from event pages.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the event page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/event \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/event","fields":"links,meta","timeout":30000,"discussion":true}'
```

### List

Extract structured items from listing pages — product listings, search results, news indexes.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the listing page |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/list \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/listing","fields":"links,meta","timeout":30000,"discussion":true}'
```

### Job Posting

Extract job posting details — title, employer, location, skills, requirements.

**Estimated cost:** $0.004

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `url` | string | Yes | URL of the job posting |
| `fields` | string | No | Comma-separated extra fields to return (e.g. links, meta, breadcrumb) |
| `timeout` | number | No | Request timeout in ms (default 30000) |
| `discussion` | boolean | No | Extract comments/discussion threads (default true for article) |

```bash
curl -X POST https://diffbot.mpp.paywithlocus.com/diffbot/job \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/job","fields":"links,meta","timeout":30000,"discussion":true}'
```
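
Every endpoint above lists the same estimated cost of $0.004 per call, so a rough budget for a batch is the call count times that figure. The sketch below does the arithmetic in integer thousandths of a dollar to avoid floating point in the shell. The function name `estimate_cost` is hypothetical, and the result is only as accurate as the listed estimate.

```shell
#!/usr/bin/env bash
# Estimated cost per call in thousandths of a dollar ($0.004 = 4).
COST_PER_CALL_MILLS=4

# Hypothetical helper: prints the estimated cost of N calls in dollars.
#   $1: number of calls (non-negative integer)
estimate_cost() {
  local total=$(( $1 * COST_PER_CALL_MILLS ))
  printf '$%d.%03d\n' $(( total / 1000 )) $(( total % 1000 ))
}

estimate_cost 250    # prints $1.000
estimate_cost 1234   # prints $4.936
```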
