Web Scraper / Data Acquisition Engineer (#12855)

Ready to Train a Mind That’s Not (Yet) Human?

Imagine being called in for a secret mission. A confidential client has entrusted us with something ambitious, strange, and thrilling: building a conversational AI that listens like a human, learns like a scholar, and speaks with purpose. This isn’t a product. It’s a presence. And it’s your chance to help shape it—right at the point where it all begins.

We’re assembling a select team of brilliant minds to work on a newly commissioned, stealth-mode project. The brief? Discreet. The data? One of a kind. The potential? Enormous. We can’t reveal the client’s name (yet), but trust us—when the curtains open, you’ll want to say you were there from the start.

If you’re excited by the unknown, passionate about intelligence (artificial and otherwise), and eager to leave your mark on something world-class and whisper-quiet (for now), then read on.

This isn’t just a job. It’s an origin story.

 

Role: Web Scraper / Data Acquisition Engineer

The Data Hunter Who Brings Us Gold

This AI won’t learn from thin air. It will learn from the best data we can find—and that data isn’t handed over. It has to be discovered, extracted, and refined.

That’s where you come in.

As our Web Scraper / Data Acquisition Engineer, you’ll be the digital archaeologist of this mission—responsible for uncovering, crawling, and structuring valuable data from court rulings, legal databases, public knowledge repositories, and other deep corners of the web.

You’re not here to copy and paste.
You’re here to build data pipelines that surface the rare and relevant, reliably.

 

What You’ll Do

– Scrape and structure large volumes of legal rulings, case law, and public datasets from websites and online databases
– Build robust, scalable scraping frameworks with retry logic, rate-limit handling, and anti-bot navigation
– Extract structured content from messy HTML, PDFs, or nested formats (e.g. court verdicts, multi-level documents)
– Ensure scraped content is deduplicated, normalized, and privacy-compliant (especially in GDPR-regulated domains)
– Collaborate with ML engineers to align extracted content with labeling, training, and evaluation pipelines
– Keep a finger on the pulse of open-access legal and regulatory content in Europe and beyond

 

Who You Are

– Highly proficient in Python (Scrapy, BeautifulSoup, Requests, Selenium, etc.)
– Familiar with web crawling best practices, content parsing, and ethical/legal boundaries
– You know how to work with sitemaps, HTTP headers, and dynamically loaded content (JavaScript rendering, etc.)
– You’ve worked with structured data formats like JSON, XML, and CSV—and made sense of unstructured chaos
– Bonus: experience scraping legal, financial, academic, or multilingual content
– Bonus: you’ve beaten at least one CAPTCHA in your life and lived to tell the tale

 

Our Culture & Core Beliefs

You’re not here to grab data.
You’re here to curate the future.
Because what the system learns—what it knows—depends on your craft.

We believe in:
– Signal over noise
– Structure over spaghetti
– Ethics over shortcuts
– Automation over repetition
– Craftsmanship over code churn

 

Our Selection Process

You’ll be evaluated on creativity, precision, and stealth-mode technical power:
– Alignment & values interview
– Live scraping challenge (real-world messy site)
– Data transformation test (cleaning + structuring)
– Final call with the founders: vision, scale, and alignment

We’re not hiring extractors.
We’re hiring data tacticians.

 

Location & Compensation

– Based in Almere, The Netherlands (in-office with some flexibility)
– You’ll work side-by-side with ML engineers and legal experts in a high-trust mission zone
– Compensation: strong and above average, because the quality of our AI depends on what you dig up

 

If you can mine gold from digital stone, make sense of chaos, and feed a machine that will think like a lawyer—apply now. The mind we’re building will be trained on your discoveries.

 

Position Code #12855

Apply for this Position

Please complete the form below. Include a link to your LinkedIn profile and attach your resume/CV (in DOC, DOCX, or PDF format).