Changelog
Last updated: December 23rd, 2023
Allow selecting multiple files and improved file naming
December 23rd, 2023
- Allow selecting multiple files
- Improved file naming when download
Advanced JSON Schema & Pydantic Integration
December 18th, 2023
- Added advanced json_schema capability based on Pydantic models
- Enabled complex data structure scraping with nested objects and arrays
- Support for AI-powered extraction of structured data (e.g., product listings)
- New capability to define and extract multiple related objects (e.g., Product with image, price, description)
Advanced URL Filtering & Performance Updates
December 15th, 2023
- Added exclusion_pattern support for URL filtering (e.g., 'https://foo\.com/posts/[^/]+' to exclude posts pages)
- Implemented excluded_links feature for specific URL exclusions
- Fixed support for scraping from various domains (e.g., google.dev, not just google.com)
- Improved handling of URLs with/without 'www' prefix
Infrastructure & Performance Optimization
December 10th, 2023
- Migrated server infrastructure for better cost efficiency and scalability
- Optimized server to handle more simultaneous requests
- Significantly reduced operational costs from $100/week to a more sustainable level
- Enhanced job status monitoring and real-time updates
Core Functionality Improvements
December 5th, 2023
- Added job_id filtering capability
- Implemented multiple download format options (markdown, AI, HTML)
- Fixed limit and depth logic bugs affecting scraping performance
- Improved job_id immediate retrieval from API
UI/UX and Documentation Updates
December 1st, 2023
- Enhanced documentation for scrape_id and job_id relationship
- Added status enumeration documentation
- Improved playground loading status indicators
- Fixed long link wrapping issues in markdown
- Enhanced login logic and user authentication