Open-source multi-platform topic intelligence

Trace rising topics from signal to cross-platform evidence.

MindSpider identifies emerging public conversations, expands them into structured crawl queues, and captures platform-level discussion data analysts can inspect, store, and reuse.

Open-source architecture, Playwright-based crawling, MySQL-backed datasets
13
upstream discovery sources

Daily feeds across social, technical, and community surfaces.

7
deep-crawl platforms

Platform-specific passes for posts, comments, and engagement evidence.

2-stage
analysis pipeline

Broad topic extraction first, platform-level sentiment crawling second.

System Shape

Discovery to crawl, without the manual gap

README-backed

Discovery Sources

Daily signal intake

Weibo, Zhihu, Bilibili, Toutiao, GitHub, CoolApk, and adjacent feeds seed the topic graph before deeper crawling begins.

Agent Layer

AI topic extraction

Model-assisted summarization produces topic names, summaries, and keyword lists from noisy daily sources.

Crawl Queue

Keyword fan-out

The extracted topics become crawl tasks for each platform adapter, keeping downstream work tied to explicit evidence.
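The fan-out described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `CrawlTask` class, the `fan_out` function, the lowercase platform identifiers, and the `pending` status value are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product

# Platform names follow the deep-crawl list on this page; the lowercase
# identifiers are hypothetical.
PLATFORMS = ["xiaohongshu", "douyin", "kuaishou", "bilibili", "weibo", "tieba", "zhihu"]


@dataclass(frozen=True)
class CrawlTask:
    """One unit of crawl work, tied back to the topic that produced it."""
    topic: str
    keyword: str
    platform: str
    status: str = "pending"  # hypothetical status value


def fan_out(topics: dict[str, list[str]], platforms: list[str] = PLATFORMS) -> list[CrawlTask]:
    """Expand {topic: [keywords]} into one task per unique (keyword, platform) pair."""
    tasks: list[CrawlTask] = []
    seen: set[tuple[str, str]] = set()
    for topic, keywords in topics.items():
        for keyword, platform in product(keywords, platforms):
            if (keyword, platform) in seen:
                continue  # a keyword shared by two topics is crawled only once
            seen.add((keyword, platform))
            tasks.append(CrawlTask(topic, keyword, platform))
    return tasks


topics = {
    "EV price cuts": ["EV price", "battery"],
    "chip export rules": ["battery", "GPU"],
}
tasks = fan_out(topics)
print(len(tasks))  # 3 unique keywords x 7 platforms = 21
```

Each task keeps a reference to its originating topic, which is one simple way to keep downstream rows traceable to the signal that triggered them.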

Platform Pass

Deep sentiment crawling

Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu are crawled with browser automation to capture comments, reactions, and discussion context.

Output

Tables + reports

Data lands in explicit tables like `daily_topics`, `topic_news_relation`, and `crawling_tasks`.

Discovery sources

The broad pass is designed to recognize momentum before you choose a crawl target.

Weibo, Zhihu, Bilibili, Toutiao, GitHub, CoolApk

Deep-crawl targets

The second pass goes deeper on the platforms where sentiment, discussion, and feedback actually live.

Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, Zhihu

Data outputs

What comes out is designed for operators, not just demos.

`daily_news`, `daily_topics`, `topic_news_relation`, `crawling_tasks`, and platform content tables

How It Works

A crawler pipeline shaped like an analyst workflow.

01

Discover rising topics

MindSpider pulls daily hot signals from news and community sources, then uses AI extraction to turn raw headlines into reusable topic clusters.

02

Fan out into platform crawls

Those topic clusters become structured keyword queues for deep crawls across Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Tieba, and Zhihu.

03

Persist evidence for analysis

Tasks, content, and relationships are written into MySQL-ready tables so you can review trajectories, compare platforms, and build downstream reports.

Architecture

Three lanes, one intent: make topic movement inspectable.

Daily signal intake

Broad Topic Extraction

The first lane watches public trend surfaces, normalizes source data, and asks the model layer to produce topics worth pursuing.

  • Daily news and hot-list collection
  • AI-generated topic summaries
  • Keyword lists written to durable storage
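As a rough sketch of the last step in this lane, the snippet below turns a model reply into topic records with a name, a summary, and a keyword list. The JSON reply shape, the `Topic` class, and the `parse_topics` function are assumptions for illustration; the page does not specify the format the model layer actually returns.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    summary: str
    keywords: list[str] = field(default_factory=list)


def parse_topics(raw: str) -> list[Topic]:
    """Parse a model reply assumed to be a JSON array of topic objects.

    Malformed replies yield an empty list instead of raising, so one bad
    response does not stop the daily run.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    topics: list[Topic] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = str(item.get("name", "")).strip()
        if not name:
            continue  # a topic without a name cannot be queued
        raw_keywords = item.get("keywords")
        if not isinstance(raw_keywords, list):
            raw_keywords = []
        cleaned = (k.strip() for k in raw_keywords if isinstance(k, str) and k.strip())
        keywords = list(dict.fromkeys(cleaned))  # de-duplicate, keep order
        topics.append(Topic(name, str(item.get("summary", "")).strip(), keywords))
    return topics


reply = '[{"name": "EV price cuts", "summary": "Makers cut prices.", "keywords": ["EV", " EV ", "battery"]}, {"summary": "no name"}]'
print(parse_topics(reply))
```

Validating the reply before anything is written to storage keeps noisy model output from turning into empty or duplicate crawl keywords.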

Platform-specific evidence collection

Deep Sentiment Crawling

The second lane takes the extracted keywords and turns them into structured crawl tasks for each target platform.

  • Per-platform crawler adapters
  • Login-aware browser sessions
  • Comment, post, and interaction capture
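One way to picture per-platform adapters running under `asyncio` is the sketch below. It is illustrative only: `fake_adapter` stands in for a real adapter that would drive a logged-in browser session, and `run_tasks`, the result dictionary shape, and the concurrency limit are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# An adapter takes a keyword and returns captured rows. The signature is
# an assumption made for this sketch.
Adapter = Callable[[str], Awaitable[list[dict]]]


async def fake_adapter(keyword: str) -> list[dict]:
    """Stand-in for a real platform adapter; no browser is launched here."""
    await asyncio.sleep(0)
    return [{"keyword": keyword, "text": f"post about {keyword}"}]


async def run_tasks(
    tasks: list[tuple[str, str]],
    adapters: dict[str, Adapter],
    max_concurrency: int = 3,
) -> list[dict]:
    """Run (platform, keyword) tasks with a cap on concurrent crawls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(platform: str, keyword: str) -> dict:
        async with semaphore:
            try:
                rows = await adapters[platform](keyword)
                return {"platform": platform, "keyword": keyword, "status": "done", "rows": rows}
            except Exception as exc:  # one failing platform must not stop the rest
                return {"platform": platform, "keyword": keyword, "status": "failed", "error": repr(exc)}

    # gather returns results in the same order as the submitted tasks
    return await asyncio.gather(*(run_one(p, k) for p, k in tasks))


results = asyncio.run(
    run_tasks([("weibo", "EV price"), ("unknown", "EV price")], {"weibo": fake_adapter})
)
for result in results:
    print(result["platform"], result["status"])
```

Recording a status per task rather than raising on the first error fits the page's emphasis on task progress tracking: a failed platform pass can be retried later without repeating the ones that succeeded.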

Tables, tasks, and replayability

Structured Output Layer

Instead of dumping text into blobs, MindSpider stores topic relations, crawl progress, and platform outputs in explicit database structures.

  • MySQL-oriented persistence
  • Task progress and status tracking
  • Reusable datasets for reports and follow-on agents
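To make the idea of explicit tables and task tracking concrete, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. Only the table names `daily_topics` and `crawling_tasks` come from this page; every column name, the status values, and the helper functions are assumptions, so consult the repository's schema for the real definitions.

```python
import sqlite3

# Hypothetical columns; only the two table names appear on this page.
SCHEMA = """
CREATE TABLE daily_topics (
    topic_id     INTEGER PRIMARY KEY,
    topic_name   TEXT NOT NULL,
    extract_date TEXT NOT NULL
);
CREATE TABLE crawling_tasks (
    task_id  INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES daily_topics(topic_id),
    platform TEXT NOT NULL,
    keyword  TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'pending'
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def progress_by_platform(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Count tasks per (platform, status) so crawl progress can be inspected."""
    return conn.execute(
        "SELECT platform, status, COUNT(*) FROM crawling_tasks "
        "GROUP BY platform, status ORDER BY platform, status"
    ).fetchall()


conn = open_db()
conn.execute(
    "INSERT INTO daily_topics (topic_name, extract_date) VALUES (?, ?)",
    ("EV price cuts", "2025-01-01"),
)
conn.executemany(
    "INSERT INTO crawling_tasks (topic_id, platform, keyword) VALUES (1, ?, ?)",
    [("weibo", "EV price"), ("weibo", "battery"), ("zhihu", "EV price")],
)
conn.execute(
    "UPDATE crawling_tasks SET status = 'done' WHERE platform = 'weibo' AND keyword = 'battery'"
)
print(progress_by_platform(conn))
```

Because each task row points at a topic row, a report can join crawl progress back to the topic that triggered it instead of working from loose export files.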

Open-Source Status

MindSpider remains the project's name and identity, and its latest code is now maintained inside BettaFish.

The original MindSpider repository still documents the pipeline clearly. The maintainers now position the latest code inside BettaFish, so this site keeps the original project story and the current upstream path in the same frame.

  • Use this site as the product-facing front door for the project story.
  • Use the GitHub repository and README for the current setup path.
  • Treat the /start route as a repository-first evaluation handoff, not a hosted signup flow.

Upstream repositories

Keep both links visible so operators can read the original README and follow the newer monorepo path without guessing.

  • Original MindSpider repository
  • BettaFish upstream module host

Feature Surface

Product language for a system that still respects the code.

AI topic extraction

Convert noisy daily news and hot lists into themes, summaries, and keyword sets that agents can keep working with.

Playwright-first crawling

Browser automation is built into the deep crawl layer, which makes dynamic pages and login-gated flows practical to crawl.

Platform-aware storage

Outputs are mapped into structured tables for notes, videos, threads, tasks, and topic relationships instead of loose export files.

Keyword queue control

The system manages topic-to-keyword fan-out so follow-up crawls stay tied to the signals that triggered them.

Open-source inspectability

Everything is visible in code: pipeline stages, database schema, platform adapters, and the operational assumptions around them.

Built for analyst handoff

The output is meant to be reviewed, queried, and reused by humans or later agents instead of dying as a one-shot scrape.

Frequently Asked Questions

Is MindSpider a hosted SaaS product today?

No. This site presents MindSpider as an open-source project and product identity. The /start route is a setup handoff page that points you to the public repositories and README.

What does the two-stage pipeline actually mean?

Stage one identifies promising topics from daily feeds. Stage two takes the resulting keywords and runs deeper platform crawls to gather sentiment-bearing evidence.

Which technologies shape the implementation?

The README centers Python, Playwright, MySQL, asyncio, and a DeepSeek-compatible analysis layer for topic extraction and downstream interpretation.

Can I self-host it?

Yes. The project is presented as an inspectable open-source system, so the primary path today is repository-driven setup rather than a closed hosted dashboard.

Why mention BettaFish here?

Because the upstream README now states that the latest MindSpider code is maintained as a submodule inside BettaFish. Linking both keeps users from following outdated setup instructions.

Setup Path

Review the current setup path, then inspect the system at source level.

Today that means a repository-first handoff: the setup page, the public README, and the upstream repositories. The site is meant to clarify the system before you decide to run it.