Web Scrape Block

Overview

The Web Scrape block extracts content from webpages. Use it to gather information from websites, pull article content, extract product details, or collect data for AI analysis. It can return content as Markdown or raw HTML, or use AI to extract structured JSON data based on your prompt.

Configuration

Website URL

Enter the URL of the webpage to scrape. This field supports placeholders, so you can scrape URLs produced by earlier steps. Examples:

Static URL: https://example.com/blog/article-title
From search results: {{step_1.output.organic[0].link}}
From loop: {{current.url}}

The scraper handles JavaScript-rendered pages, so dynamic content loads correctly.
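Placeholder paths combine dot-separated keys with [n] for list positions. The Python sketch below shows how a path such as step_1.output.organic[0].link walks nested step output. It is an illustration only: the resolve_path helper and the shape of context are hypothetical, and the platform's own templating engine may handle edge cases such as missing keys differently.

```python
import re
from typing import Any

# Hypothetical sketch, NOT the platform's templating engine: split a path into
# dictionary keys ("step_1", "output", ...) and list indexes ("[0]").
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

def resolve_path(context: dict[str, Any], path: str) -> Any:
    value: Any = context
    for key, index in TOKEN.findall(path):
        if index:
            value = value[int(index)]  # list element, e.g. organic[0]
        else:
            value = value[key]  # dictionary key, e.g. output
    return value

# Assumed shape of earlier step outputs, for illustration only.
context = {
    "step_1": {"output": {"organic": [
        {"link": "https://example.com/blog/article-title"},
        {"link": "https://example.com/other"},
    ]}},
    "current": {"url": "https://example.com/from-loop"},
}

print(resolve_path(context, "step_1.output.organic[0].link"))  # → https://example.com/blog/article-title
print(resolve_path(context, "current.url"))  # → https://example.com/from-loop
```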

Result Format

Choose how the scraped content is returned.

Format	Description	Best For
AI JSON Format	AI extracts structured data based on your prompt	Product details, article metadata, specific data points
Markdown Format	Clean, formatted text content	Content analysis, LLM processing, readability
HTML Format	Raw HTML markup	Preserving structure, custom parsing

AI JSON Format

When you select AI JSON Format, the scraper uses AI to extract specific data from the page based on your prompt.

Prompt

Tell the AI what information to extract from the page. Be specific about the data structure you need.

Example prompts:

For a product page:

Extract the product name, price, description, and list of features.

For a blog article:

Extract the article title, author name, publication date, and main content summary.

For a company page:

Extract the company name, founding year, number of employees, and headquarters location.

JSON Output Example

For a product page with the prompt “Extract product name, price, and features”:

{
  "product_name": "Wireless Bluetooth Headphones",
  "price": "$79.99",
  "features": [
    "40-hour battery life",
    "Active noise cancellation",
    "Foldable design",
    "Built-in microphone"
  ]
}

Accessing JSON data:

{{step_n.output.product_name}}
{{step_n.output.features[0]}}
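Because the AI picks field names based on your prompt, a downstream step can break if a field is missing or named differently. If you process the output outside the platform, a minimal Python check like the one below can flag gaps. It assumes the output is available as a JSON string; missing_fields and the expected field names are hypothetical and should match your own prompt.

```python
import json

# Hypothetical field names; adapt them to the fields your prompt asks for.
EXPECTED_FIELDS = ["product_name", "price", "features"]

def missing_fields(raw_output: str, expected: list[str]) -> list[str]:
    """Return requested fields that are absent or empty in the AI JSON output."""
    data = json.loads(raw_output)
    return [
        field
        for field in expected
        if field not in data or data[field] in (None, "", [])
    ]

sample = """
{
  "product_name": "Wireless Bluetooth Headphones",
  "price": "$79.99",
  "features": ["40-hour battery life", "Active noise cancellation"]
}
"""

print(missing_fields(sample, EXPECTED_FIELDS))  # → []
print(missing_fields('{"product_name": "Headphones"}', EXPECTED_FIELDS))  # → ['price', 'features']
```

A non-empty result usually means the prompt should name those fields more explicitly (see Incomplete JSON under Troubleshooting).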

Markdown Format

Returns the page content as clean, readable markdown text. Navigation, ads, and boilerplate are removed.

Markdown Output Example

# How to Improve Your SEO in 2024

By John Smith | January 15, 2024

Search engine optimization continues to evolve. Here are the key strategies
for improving your rankings this year.

## 1. Focus on User Experience

Google's algorithm increasingly prioritizes pages that provide excellent
user experiences...

## 2. Create Quality Content

Content remains king. Focus on creating comprehensive, valuable content
that answers user questions...

Accessing markdown:

{{step_n.output}}

HTML Format

Returns the raw HTML content of the page. Useful when you need to preserve exact structure or perform custom parsing.

HTML Output Example

<article>
  <h1>How to Improve Your SEO in 2024</h1>
  <div class="author">By John Smith</div>
  <div class="content">
    <p>Search engine optimization continues to evolve...</p>
  </div>
</article>
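As an example of custom parsing, the Python sketch below pulls the title and author out of the HTML output shown above using only the standard library's html.parser. The ArticleParser class is hypothetical and keys off the h1 tag and the author class from this example; real pages will need rules that match their own markup.

```python
from html.parser import HTMLParser

class ArticleParser(HTMLParser):
    """Hypothetical parser for the example markup: <h1> title, <div class="author">."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.author = ""
        self._current = ""  # field the next text chunk belongs to ("" = none)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1":
            self._current = "title"
        elif tag == "div" and "author" in classes:
            self._current = "author"

    def handle_endtag(self, tag):
        if tag in ("h1", "div"):
            self._current = ""

    def handle_data(self, data):
        if self._current == "title":
            self.title += data.strip()
        elif self._current == "author":
            self.author += data.strip()

html = """
<article>
  <h1>How to Improve Your SEO in 2024</h1>
  <div class="author">By John Smith</div>
  <div class="content">
    <p>Search engine optimization continues to evolve...</p>
  </div>
</article>
"""

parser = ArticleParser()
parser.feed(html)
parser.close()
print(parser.title)   # → How to Improve Your SEO in 2024
print(parser.author)  # → By John Smith
```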

Content Options

Only Main Content

When enabled, the scraper excludes navigation menus, footers, sidebars, and other peripheral content, and returns only the primary content area.

On: Cleaner output focused on main content
Off: Full page content including navigation and sidebars

Use this when you want article text without site-wide elements.
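To illustrate the idea, here is a rough Python approximation that keeps page text but skips anything inside nav, aside, and footer elements. This is an assumption-based sketch, not how the block decides what counts as main content; its actual detection is not documented here, and many sites do not use these semantic tags.

```python
from html.parser import HTMLParser

# Hypothetical approximation: treat these tags as peripheral content.
PERIPHERAL_TAGS = {"nav", "aside", "footer"}

class MainTextParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # number of open peripheral elements around the current text
        self.chunks = []  # text kept from the main content

    def handle_starttag(self, tag, attrs):
        if tag in PERIPHERAL_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in PERIPHERAL_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth == 0:  # keep non-blank text outside peripheral tags
            self.chunks.append(text)

page = """
<nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
<article><h1>Article Title</h1><p>Article content here...</p></article>
<aside>Related posts</aside>
<footer>&copy; Site Name</footer>
"""

parser = MainTextParser()
parser.feed(page)
parser.close()
print(parser.chunks)  # → ['Article Title', 'Article content here...']
```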

Include Metadata

When enabled, the output includes page metadata alongside the content. Metadata fields included:

title - Page title
description - Meta description
ogTitle - Open Graph title
ogDescription - Open Graph description
language - Page language
favicon - Favicon URL
sourceURL - Original URL
twitter:title - Twitter card title
twitter:description - Twitter card description
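Most of these fields correspond to standard tags in a page's head. The Python sketch below shows one plausible mapping using html.parser: the title tag, the html lang attribute, the meta description, Open Graph (property="og:...") and Twitter card (name="twitter:...") tags, and a link rel="icon" element. The mapping and the MetadataParser class are assumptions for illustration; sourceURL is simply the URL you scraped, so it is not parsed here.

```python
from html.parser import HTMLParser

# Hypothetical mapping from <meta> name/property values to the field names
# listed above, based on common HTML, Open Graph, and Twitter card conventions.
META_KEYS = {
    "description": "description",
    "og:title": "ogTitle",
    "og:description": "ogDescription",
    "twitter:title": "twitter:title",
    "twitter:description": "twitter:description",
}

class MetadataParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {name: value or "" for name, value in attrs}
        if tag == "html" and a.get("lang"):
            self.metadata["language"] = a["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use property="...", most others use name="...".
            key = a.get("property") or a.get("name")
            if key in META_KEYS:
                self.metadata[META_KEYS[key]] = a.get("content", "")
        elif tag == "link" and "icon" in a.get("rel", "").split():
            self.metadata["favicon"] = a.get("href", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data

page = """<html lang="en"><head>
<title>Article Title | Site Name</title>
<meta name="description" content="A brief description of the article">
<meta property="og:title" content="Article Title">
<link rel="icon" href="/favicon.ico">
</head><body>...</body></html>"""

parser = MetadataParser()
parser.feed(page)
parser.close()
print(parser.metadata["title"])    # → Article Title | Site Name
print(parser.metadata["ogTitle"])  # → Article Title
```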

Output with Metadata

When metadata is included, the output changes from plain content to an object that holds the content and a metadata object. For example, Markdown format with metadata returns:

{
  "markdown": "# Article Title\n\nArticle content here...",
  "metadata": {
    "title": "Article Title | Site Name",
    "description": "A brief description of the article",
    "ogTitle": "Article Title",
    "language": "en",
    "sourceURL": "https://example.com/article"
  }
}

Accessing content with metadata:

{{step_n.output.markdown}}
{{step_n.output.metadata.title}}
{{step_n.output.metadata.description}}
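If you consume the output in your own code, note that its shape depends on the Include Metadata setting. The hypothetical Python helper below accepts either shape and returns the content and metadata separately. It assumes the content key is markdown, as in the example above; for other formats the key name is an assumption, so it is a parameter.

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical helper: normalize Web Scrape output into (content, metadata).
def split_output(
    output: Union[str, Dict[str, Any]], content_key: str = "markdown"
) -> Tuple[str, Dict[str, Any]]:
    if isinstance(output, dict):
        # Metadata enabled: the content and metadata live under separate keys.
        return output.get(content_key, ""), output.get("metadata", {})
    # Metadata disabled: the whole output is the content, with no metadata.
    return output, {}

with_meta = {
    "markdown": "# Article Title\n\nArticle content here...",
    "metadata": {
        "title": "Article Title | Site Name",
        "language": "en",
        "sourceURL": "https://example.com/article",
    },
}

content, meta = split_output(with_meta)
print(meta["title"])  # → Article Title | Site Name

content, meta = split_output("# Article Title\n\nArticle content here...")
print(meta)  # → {}
```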

Best Practices

Use “Only Main Content” for cleaner article extraction
Choose Markdown format when feeding content to LLM blocks
Use AI JSON format when you need specific structured data
Include metadata when you need page titles or descriptions
Combine with Google Search to scrape top-ranking pages
Test scraping on a single URL before running bulk operations

Common Use Cases

Use Case	Configuration Tips
Content research	Markdown format + Only Main Content for clean articles
Competitor analysis	AI JSON to extract specific data points
Price monitoring	AI JSON with prompt for price and product details
Lead generation	AI JSON to extract contact information
SEO analysis	Include Metadata to get title tags and descriptions
Content aggregation	Loop through URLs, scrape each in Markdown

Example Workflow: Competitor Content Analysis

Analyze content from top-ranking pages:

Google Search Block: Search for target keyword
Loop Block: Iterate through top 5 organic results
Web Scrape Block:
- URL: {{current.link}}
- Format: Markdown
- Only Main Content: On
LLM Block: Analyze content themes and structure
Google Sheets Block: Store analysis results

Example Workflow: Product Data Extraction

Extract product details from e-commerce pages:

Google Sheets Block: Read list of product URLs
Loop Block: Process each URL
Web Scrape Block:
- URL: {{current.product_url}}
- Format: AI JSON
- Prompt: “Extract product name, price, rating, number of reviews, and availability status”
Google Sheets Block: Append extracted data

Troubleshooting

Issue	Cause	Solution
Empty output	Page blocks scraping	Try a different URL or check the site's robots.txt
Missing content	JavaScript not rendered	JavaScript content should render automatically; contact support if the issue persists
Timeout	Page too slow to load	Reduce concurrent scrapes or try again later
Prompt required error	AI JSON format selected without a prompt	Add an extraction prompt
Incomplete JSON	Prompt too vague	Be more specific about which fields to extract
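For the Empty output case, you can check whether a site's robots.txt disallows the path you are scraping. The minimal Python sketch below uses the standard library's urllib.robotparser and parses rules from a string so it runs offline; the rules and URLs are made up. Against a live site you would call set_url() with the site's robots.txt URL and then read() instead.

```python
from urllib.robotparser import RobotFileParser

# Example rules, made up for illustration: block every crawler from /private/.
robots_txt = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
# For a live site: rp.set_url("https://example.com/robots.txt"); rp.read()
rp.parse(robots_txt.splitlines())
# Record a load time; can_fetch() answers False for rules it considers unread.
rp.modified()

for url in (
    "https://example.com/blog/article-title",
    "https://example.com/private/report",
):
    verdict = "allowed" if rp.can_fetch("*", url) else "disallowed"
    print(url, verdict)
```

An allowed result does not guarantee a successful scrape, since sites can also block requests in other ways.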

What’s Next

Now that you understand the Web Scrape block:

Learn about the Google Search Block to find URLs to scrape
See the Loop Block to scrape multiple pages
Explore the LLM Block to analyze scraped content
Check Liquid Templating for data transformation
