Google Scholar Scraper 2026: 10 Agent Skills That Replace Your $39/year Tool Stack

Introduction

Author: Daniel · Draft v1 · skillsmp + ClawHub edition · 2026-04-28
Sources: 1 BA recipe + 9 skillsmp.com skills (verified via REST API 2026-04-28)

🚨 You're paying $39/year for a Google Scholar tool to dodge captchas. You're paying $99/year for citation management. You're stuck with Mendeley's broken sync.

Detail

🏆 1. BrowserAct Scholar Recipe (BA stealth-extract)

👉 https://browseract.com/

What it is: A 30-second recipe, not a packaged skill. BrowserAct's stealth-extract CLI handles Google Scholar's bot detection — including the "Please show you're not a robot" captcha that breaks every other Scholar scraper.

Why it's #1: Scholar's bot detection is the actual hard problem. Once stealth is solved, the rest (parse, dedupe, export) is trivial. Most of the OSS scrapers in this list need you to bring your own proxies + captcha solver. The BA recipe does both transparently.

The recipe:

browser-act stealth-extract \
  "https://scholar.google.com/scholar?q=transformer+attention+mechanism" \
  --fields "title,authors,year,citations,journal,pdf_url" \
  --output papers.json

The same recipe works on scholar.google.com/scholar?cluster=... to list every indexed version of a paper, and on ?cites= for forward citations (the papers that cite it).
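Once papers.json exists, the "dedupe" step is short. The sketch below assumes the file is a JSON array of objects keyed by the names passed to --fields (title, citations, and so on); that output shape is an assumption, not something documented here, and the helper names are hypothetical.

```python
import json
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe_papers(papers: list[dict]) -> list[dict]:
    """Keep one record per normalized title, preferring the highest citation count."""
    best: dict[str, dict] = {}
    for paper in papers:
        key = normalize_title(paper.get("title") or "")
        if not key:
            continue  # skip records with no usable title
        current = best.get(key)
        if current is None or (paper.get("citations") or 0) > (current.get("citations") or 0):
            best[key] = paper
    # Most-cited first; missing citation counts are treated as 0.
    return sorted(best.values(), key=lambda p: p.get("citations") or 0, reverse=True)


def load_papers(path: str) -> list[dict]:
    """Read the scraper output (assumed to be a JSON array of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Typical use would be `dedupe_papers(load_papers("papers.json"))`. Matching on a normalized title is deliberately crude; if your records carry DOIs, keying on those is safer.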

2. arxiv (skillsmp · wanshuiyin · ★9,693)

👉 https://skillsmp.com/skills/wanshuiyin-auto-claude-code-research-in-sleep-skills-arxiv-skill-md

What it does: arXiv search, paper download, abstract pull, citation graph traversal — all callable as agent functions.

Why it's on the list: 9,693 stars puts it in the top 0.001% of skillsmp. Author wanshuiyin built it as part of an "auto-research-in-sleep" workflow — the skill is battle-tested by users running overnight literature pulls.

Install:

npx skills add wanshuiyin/auto-claude-code/arxiv

For most STEM fields, arXiv covers 80% of what you'd otherwise pull from Scholar — and it's API-native, no scraping required.
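To show what "API-native" means in practice, here is a minimal sketch that queries arXiv's public Atom endpoint directly with the Python standard library. It is not the code of the skill above; the function names are hypothetical, and the query uses the simple all: field prefix.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(terms: str, max_results: int = 20) -> str:
    """Build an arXiv API URL that returns the newest submissions matching the terms."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml) -> list[dict]:
    """Extract id, title, publication date, and authors from an Atom feed (str or bytes)."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Titles are often wrapped across lines; collapse the whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "published": entry.findtext(f"{ATOM}published", default="").strip(),
            "authors": [
                author.findtext(f"{ATOM}name", default="").strip()
                for author in entry.findall(f"{ATOM}author")
            ],
        })
    return papers


def search_arxiv(terms: str, max_results: int = 20) -> list[dict]:
    """Fetch and parse one page of results (performs a network request)."""
    with urllib.request.urlopen(build_query_url(terms, max_results), timeout=30) as resp:
        return parse_feed(resp.read())
```

Calling `search_arxiv("transformer attention mechanism")` returns a list of dicts. Keep request rates modest if you loop over many queries.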

3. scholar-kit (skillsmp · lottshin · ★12)

👉 https://skillsmp.com/skills/lottshin-scholar-kit-skill-md

What it does: All-in-one Scholar workflow — search, parse, extract metadata, format citations.

Why it's on the list: It's the only skillsmp skill explicitly named "scholar-kit", and its author treats it as the canonical Scholar entry point. It has far fewer stars than #2, but it targets Scholar directly rather than arXiv.

Install:

npx skills add lottshin/scholar-kit

4. xs-arxiv (skillsmp · karaage0703 · ★30)

👉 https://skillsmp.com/skills/karaage0703-ai-assistant-workspace-skills-arxiv-skill-md

What it does: Lightweight arXiv lookup. Less feature-heavy than #2, faster to invoke for one-off queries.

Why it's on the list: When you don't need a citation graph, just "give me the latest 20 papers on X," xs-arxiv is the right tool. Author karaage0703's broader workspace has 30+ research skills — quality bar is high.

Install:

npx skills add karaage0703/ai-assistant-workspace/arxiv

5. scholar-vault-gap-scout (skillsmp · MaxSpur · ★1)

👉 https://skillsmp.com/skills/maxspur-scholar-vault-tools-vault-agent-skills-scholar-vault-gap-scout-skill-md

What it does: Reads your existing literature corpus, identifies research gaps — topics underexplored in your field given recent citation patterns.

Why it's on the list: This is the "why am I doing this PhD" skill. You feed it your Zotero library, and it tells you which sub-questions have fewer than 10 citations and high opportunity. MaxSpur shipped a 6-skill vault series covering compile-paper / orient / labs-prompts / read-pdf; install the whole vault if it clicks.

Install:

npx skills add maxspur/scholar-vault-tools/scholar-vault-gap-scout

6. arxiv-search (skillsmp · fmschulz · ★2)

👉 https://skillsmp.com/skills/fmschulz-omics-skills-skills-arxiv-search-skill-md

What it does: arXiv targeted search with bio/omics-aware filters built in.

Why it's on the list: If your field is computational biology / bioinformatics, fmschulz's omics-skills repo has 15+ sister skills wired for the domain — arxiv-search is the gateway.

Install:

npx skills add fmschulz/omics-skills/arxiv-search

7. arxiv-to-html (skillsmp · NTT123 · ★2)

👉 https://skillsmp.com/skills/ntt123-auto-arxiv-to-html-claude-skills-arxiv-to-html-skill-md

What it does: Convert arXiv PDFs into clean reading HTML. Math equations preserved (MathJax), figures inlined, references hyperlinked.

Why it's on the list: "Read 20 papers this week" is impossible if every paper is a fight with a PDF reader. This skill turns the PDF-reading problem into an HTML-reading problem.

Install:

npx skills add ntt123/auto-arxiv-to-html/arxiv-to-html

8. scholar-evaluation (skillsmp · MarieLynneBlock · ★2)

👉 https://skillsmp.com/skills/marielynneblock-arcanum-artifex-skills-scientific-scholar-evaluation-skill-md

What it does: Structured evaluation of a paper — methodology critique, evidence quality scoring, identifying claims-vs.-evidence mismatches.

Why it's on the list: For lit reviews + meta-analyses, you need consistent rubrics across 50+ papers. Doing it by hand kills a week. This skill turns it into a 30-min review.

Install:

npx skills add marielynneblock/arcanum-artifex/scholar-evaluation

9. scholar-vault-compile-paper (skillsmp · MaxSpur · ★1)

👉 https://skillsmp.com/skills/maxspur-scholar-vault-tools-vault-agent-skills-scholar-vault-compile-paper-skill-md

What it does: Drafts a paper outline + literature integration plan from your collected sources. Sister skill to #5.

Why it's on the list: The bridge between "I have 50 papers" and "I have a draft." If you're at the synthesis stage, this is your skill.

Install:

npx skills add maxspur/scholar-vault-tools/scholar-vault-compile-paper

10. arxiv-monitor (skillsmp · julio211916)

👉 https://skillsmp.com/skills/julio211916-tlanticad-studio-v0-1-alpha-skills-ju-skills-arxiv-monitor-skill-md

What it does: Daily arXiv watchlist. Define your topics + authors, the skill emails you new uploads each morning.

Why it's on the list: You need this for a thesis-level project. Manually checking arXiv = lost time. Free Google Scholar Alerts work but bury you in noise; this skill filters by your specific subfield.
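The core of any watchlist is a filter over the day's new entries. The sketch below illustrates that idea only; it is not arxiv-monitor's code, the function names are hypothetical, and each paper is assumed to be a dict with id, title, summary, and authors keys.

```python
def matches_watchlist(paper: dict, topics: list[str], authors: list[str]) -> bool:
    """Return True if any topic appears in the title/summary or any watched author is listed."""
    text = f"{paper.get('title', '')} {paper.get('summary', '')}".lower()
    topic_hit = any(topic.lower() in text for topic in topics)
    paper_authors = {name.lower() for name in paper.get("authors", [])}
    author_hit = any(name.lower() in paper_authors for name in authors)
    return topic_hit or author_hit


def filter_new(papers: list[dict], topics: list[str], authors: list[str],
               seen_ids: set[str]) -> list[dict]:
    """Keep papers that match the watchlist and were not reported on an earlier run."""
    fresh = []
    for paper in papers:
        if paper["id"] in seen_ids:
            continue  # already reported
        if matches_watchlist(paper, topics, authors):
            fresh.append(paper)
    return fresh
```

After each run, the caller would add the returned ids to seen_ids and persist that set (for example in a small JSON file) so tomorrow's digest only contains new uploads.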

Install:

npx skills add julio211916/tlanticad-studio/arxiv-monitor

⚠️ Reality check

You don't need:

❌ A $39/year scholar-helper SaaS — these 10 skills cost $0 to install
❌ A $99/year reference manager subscription — Zotero is free, and skill #9 + a Zotero CLI is enough
❌ A "Pro" plan on any of the AI literature tools — your university library already pays for the underlying databases

You need:

✅ One stealth recipe for Google Scholar (skill #1 — the only path that survives Scholar's captcha)
✅ One arXiv skill (skill #2 — arxiv from wanshuiyin — battle-tested at 9.7K stars)
✅ One synthesis skill (skill #5 or #9 from MaxSpur's vault — for the "what am I writing?" stage)
✅ One monitor skill (skill #10 — so you don't fall behind during writing weeks)
✅ A Claude / Codex agent to glue them together

Annual cost: ~$0 in install fees. ~$10–30/year in BA pay-per-call for Scholar runs.
Replaces: $200+/year scholar-helper / reference-manager / paper-reader stack.

Final thought

The grad students who finish their thesis on schedule in 2026 aren't the ones with the most expensive reference manager.

They're the ones who:

Picked 3 skills covering "find / read / synthesize"
Wired them into one Claude agent
Spent the saved time on actually writing — not fighting their tools

Most students won't do this. They'll keep paying for a Mendeley sub.

That's exactly why this works for the ones who do.

👉 Search 1.4M open-source skills on skillsmp: https://skillsmp.com/
👉 Try BrowserAct stealth-extract: https://browseract.com/

Agent-ready scraping

Two Skills, One Repeatable Browser Workflow

Start with live browser execution when the agent needs to understand a page. Move to Skill Forge when the same scraper should run again without re-exploring the site.

Step 1: Run once with browser-act

Give Codex, Claude Code, Cursor, Windsurf, or another agent a real browser for rendered pages, clicks, scrolling, screenshots, DOM extraction, and network inspection.

Step 2: Package with Skill Forge

Explore the site once, verify the extraction path, then generate a callable Skill package that other agents can reuse for batch jobs or scheduled workflows.

Discover: Agent opens the target site and learns the working path.

Verify: Fields, pagination, limits, and failure cases are tested.

Reuse: The flow becomes a Skill that future agents can call.

Frequently Asked Questions

Why can't I just use the "free" Google Scholar API?

Because Google doesn't publish one. Every "Google Scholar API" you see is either a third-party scraper (rate-limited, breaks weekly) or paid (SerpAPI's Scholar engine, ~$50/mo). The BA recipe (skill #1) gives you stealth-grade access at pay-per-call rates.

Does my university already have access to Web of Science / Scopus?

Probably yes. Check with your library: institutional access often covers Web of Science, Scopus, ACM Digital Library, and IEEE Xplore. If it does, use those for citation graphs, and use skill #1 and skill #2 from this list for what they tend to miss (open-access papers and grey literature).

Are these skills compatible with Claude Code, Codex, Cursor?

Yes. Every skillsmp entry follows the open SKILL.md format. Install via npx skills add <owner>/<repo>/<skill>; the file lands in ~/.claude/skills/, and your agent picks it up on next launch.
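A quick way to confirm an install landed is to list the folders that contain a SKILL.md file. This is a minimal sketch assuming the one-folder-per-skill layout under ~/.claude/skills/ described above; the function name is hypothetical, and other agents may read skills from a different directory.

```python
from pathlib import Path


def list_installed_skills(root: Path) -> list[str]:
    """Return the names of sub-folders of root that contain a SKILL.md file."""
    if not root.is_dir():
        return []
    return sorted(skill_file.parent.name for skill_file in root.glob("*/SKILL.md"))


# Example: print(list_installed_skills(Path.home() / ".claude" / "skills"))
```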

How do I avoid Scholar's captcha?

Use BA stealth-extract (skill #1). Custom proxies + custom user-agent rotation handle ~80% of cases; BA's stealth handles the remaining 20% (the residential-IP cases). Most OSS Scholar scrapers (Scholarly, scholarly-py, etc.) only handle the first 80%.

What about Semantic Scholar / OpenAlex / CrossRef as alternatives?

Use them where you can. Semantic Scholar's API is free and well-documented; OpenAlex covers ~250M papers; CrossRef has 130M+ DOIs. Use Scholar only when these miss your target, most often for grey literature, theses, and pre-print versions that the other sources don't index.
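For a sense of how little code these alternatives need, here is a hedged sketch that builds keyword-search URLs for the three public APIs and fetches the JSON response. The helper names and the User-Agent string are placeholders, and the requested Semantic Scholar fields are just one reasonable choice; check each service's documentation for rate limits and for whether you need an API key.

```python
import json
import urllib.parse
import urllib.request


def openalex_url(query: str, per_page: int = 5) -> str:
    """Keyword search over OpenAlex works."""
    params = {"search": query, "per-page": per_page}
    return "https://api.openalex.org/works?" + urllib.parse.urlencode(params)


def crossref_url(query: str, rows: int = 5) -> str:
    """Keyword search over Crossref works."""
    params = {"query": query, "rows": rows}
    return "https://api.crossref.org/works?" + urllib.parse.urlencode(params)


def semantic_scholar_url(query: str, limit: int = 5) -> str:
    """Keyword search over the Semantic Scholar Graph API."""
    params = {"query": query, "limit": limit, "fields": "title,year,citationCount"}
    return "https://api.semanticscholar.org/graph/v1/paper/search?" + urllib.parse.urlencode(params)


def fetch_json(url: str, timeout: int = 30) -> dict:
    """GET a URL and decode the JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": "lit-search-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(openalex_url("grey literature evaluation"))` returns a dict whose results key holds the matching works.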