Overview
The Web Scrape block extracts content from webpages. Use it to gather information from websites, pull article content, extract product details, or collect data for AI analysis. The block supports multiple output formats including AI-powered structured data extraction.
Configuration
Website URL
Enter the URL of the webpage to scrape. This field supports placeholders to scrape dynamic URLs from previous steps. Examples:
- Static URL: https://example.com/blog/article-title
- From search results: {{step_1.output.organic[0].link}}
- From loop: {{current.url}}
The scraper handles JavaScript-rendered pages, so dynamic content loads correctly.
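To make the placeholder syntax above concrete, here is a minimal Python sketch of how a path such as `step_1.output.organic[0].link` could be resolved against the output of earlier steps. The `lookup` and `render` helpers and the shape of `context` are assumptions for illustration only; they are not the platform's actual templating engine.

```python
import re

# Hypothetical resolver: one token is either a key ("organic") or an index ("[0]").
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def lookup(path, context):
    """Walk a nested dict/list structure following a dotted path with indexes."""
    value = context
    for key, index in TOKEN.findall(path):
        value = value[int(index)] if index else value[key]
    return value


def render(template, context):
    """Replace every {{path}} in the template with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(m.group(1), context)), template)


# Assumed shape of earlier step output and of the current loop item.
context = {
    "step_1": {"output": {"organic": [{"link": "https://example.com/blog/article-title"}]}},
    "current": {"url": "https://example.com/pricing"},
}

print(render("{{step_1.output.organic[0].link}}", context))
print(render("{{current.url}}", context))
```

The sketch shows why the path must match the upstream output exactly: a wrong key or an out-of-range index fails during lookup rather than producing a URL.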
Result Format
Choose how the scraped content is returned.

| Format | Description | Best For |
|---|---|---|
| AI JSON Format | AI extracts structured data based on your prompt | Product details, article metadata, specific data points |
| Markdown Format | Clean, formatted text content | Content analysis, LLM processing, readability |
| HTML Format | Raw HTML markup | Preserving structure, custom parsing |
AI JSON Format
When you select AI JSON Format, the scraper uses AI to extract specific data from the page based on your prompt.
Prompt
Tell the AI what information to extract from the page. Be specific about the data structure you need.
Example prompts:
For a product page:
JSON Output Example
For a product page with the prompt “Extract product name, price, and features”:
Markdown Format
Returns the page content as clean, readable markdown text. Navigation, ads, and boilerplate are removed.
Markdown Output Example
HTML Format
Returns the raw HTML content of the page. Useful when you need to preserve exact structure or perform custom parsing.
HTML Output Example
Content Options
Only Main Content
When enabled, the scraper excludes navigation menus, footers, sidebars, and other peripheral content. Returns only the primary content area.
- On: Cleaner output focused on main content
- Off: Full page content including navigation and sidebars
Include Metadata
When enabled, the output includes page metadata alongside the content. Metadata fields included:
- title: Page title
- description: Meta description
- ogTitle: Open Graph title
- ogDescription: Open Graph description
- language: Page language
- favicon: Favicon URL
- sourceURL: Original URL
- twitter:title: Twitter card title
- twitter:description: Twitter card description
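As an illustration of working with these fields downstream, the sketch below assumes the metadata arrives as a flat mapping keyed by the field names listed above. That shape, the sample values, and the `best_title` helper are all assumptions, not confirmed output of the block. Note that keys such as `twitter:title` contain a colon, so they need key-based access rather than attribute-style access.

```python
# Hypothetical metadata mapping: keys follow the field list above, values are made up.
metadata = {
    "title": "Example Article",
    "description": "A short summary of the article.",
    "ogTitle": "Example Article | Example Site",
    "language": "en",
    "sourceURL": "https://example.com/blog/article-title",
    "twitter:title": "Example Article",
}


def best_title(meta):
    """Return the first non-empty title: Open Graph, then Twitter card, then page title."""
    for key in ("ogTitle", "twitter:title", "title"):
        value = meta.get(key)
        if value:
            return value
    return None


print(best_title(metadata))
```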
Output with Metadata
When metadata is included, the output structure changes:
Markdown with metadata:
Best Practices
- Use “Only Main Content” for cleaner article extraction
- Choose Markdown format when feeding content to LLM blocks
- Use AI JSON format when you need specific structured data
- Include metadata when you need page titles or descriptions
- Combine with Google Search to scrape top-ranking pages
- Test scraping on a single URL before running bulk operations
Common Use Cases
| Use Case | Configuration Tips |
|---|---|
| Content research | Markdown format + Only Main Content for clean articles |
| Competitor analysis | AI JSON to extract specific data points |
| Price monitoring | AI JSON with prompt for price and product details |
| Lead generation | AI JSON to extract contact information |
| SEO analysis | Include Metadata to get title tags and descriptions |
| Content aggregation | Loop through URLs, scrape each in Markdown |
Example Workflow: Competitor Content Analysis
Analyze content from top-ranking pages:
- Google Search Block: Search for target keyword
- Loop Block: Iterate through top 5 organic results
- Web Scrape Block:
  - URL: {{current.link}}
  - Format: Markdown
  - Only Main Content: On
- LLM Block: Analyze content themes and structure
- Google Sheets Block: Store analysis results
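The data flow of this workflow can be sketched in Python as a plain function that chains the steps. The `search`, `scrape`, and `analyze` callables stand in for the Google Search, Web Scrape, and LLM blocks; their names and signatures are hypothetical and chosen only so the sketch runs on its own.

```python
def analyze_competitors(keyword, search, scrape, analyze, top_n=5):
    """Chain search -> loop -> scrape -> analysis and return one row per page."""
    rows = []
    for result in search(keyword)[:top_n]:  # Loop Block over the top results
        markdown = scrape(result["link"], fmt="markdown", only_main_content=True)
        rows.append({"url": result["link"], "analysis": analyze(markdown)})
    return rows  # rows that a spreadsheet step could store


# Stand-in callables so the sketch runs without any network access.
def fake_search(keyword):
    return [{"link": f"https://example.com/{keyword}/{i}"} for i in range(8)]


def fake_scrape(url, fmt, only_main_content):
    return f"# Page at {url}"


def fake_analyze(markdown):
    return f"{len(markdown)} characters of markdown"


rows = analyze_competitors("crm", fake_search, fake_scrape, fake_analyze)
print(len(rows), rows[0]["url"])
```

Each scraped page is analyzed inside the loop, so one slow or blocked page affects only its own row.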
Example Workflow: Product Data Extraction
Extract product details from e-commerce pages:
- Google Sheets Block: Read list of product URLs
- Loop Block: Process each URL
- Web Scrape Block:
  - URL: {{current.product_url}}
  - Format: AI JSON
  - Prompt: “Extract product name, price, rating, number of reviews, and availability status”
- Google Sheets Block: Append extracted data
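Before appending to a sheet, it can help to flatten each AI JSON result into a fixed column order so that a missing field does not shift the row. The sketch below assumes the extraction returns a JSON object with keys `name`, `price`, `rating`, `review_count`, and `availability`; those key names are hypothetical, since the actual keys depend on your prompt and on what the AI returns.

```python
import json

# Assumed column order for the sheet; the first column is the source URL.
COLUMNS = ["product_url", "name", "price", "rating", "review_count", "availability"]


def to_row(product_url, raw_json):
    """Turn one AI JSON result into a list of cell values in COLUMNS order."""
    data = json.loads(raw_json)
    row = {"product_url": product_url}
    for column in COLUMNS[1:]:
        row[column] = data.get(column, "")  # blank cell when a field is missing
    return [row[column] for column in COLUMNS]


# Made-up extraction result with some fields missing.
print(to_row("https://example.com/p/1", '{"name": "Widget", "price": "19.99"}'))
```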
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Empty output | Page blocks scraping | Try a different URL or check the site's robots.txt |
| Missing content | JavaScript not rendered | JavaScript-rendered content should normally load; retry, and contact support if the problem persists |
| Timeout | Page too slow | Reduce concurrent scrapes, try again later |
| Prompt required error | Using JSON format without prompt | Add extraction prompt for AI JSON format |
| Incomplete JSON | Vague prompt | Be more specific about data to extract |
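If you call a scraper from your own code, the "try again later" advice for timeouts is often implemented as a retry with exponential backoff. The sketch below is a generic pattern, not a feature of the Web Scrape block. The `scrape` callable, the use of `TimeoutError`, and the delay values are assumptions.

```python
import time


def scrape_with_retry(scrape, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call scrape(url), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return scrape(url)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # 2s, then 4s with the defaults


# Stand-in scraper that times out twice, then succeeds.
calls = {"count": 0}


def flaky_scrape(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page too slow")
    return f"content of {url}"


delays = []  # record the delays instead of actually sleeping
result = scrape_with_retry(flaky_scrape, "https://example.com", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.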
What’s Next
Now that you understand the Web Scrape block:
- Learn about Google Search Block to find URLs to scrape
- See Loop Block to scrape multiple pages
- Explore LLM Block to analyze scraped content
- Check Liquid Templating for data transformation