Skillful-Alhazen: AI-Powered Knowledge Curation
“The duty of the man who investigates the writings of scientists, if learning the truth is his goal, is to make himself an enemy of all that he reads, and, applying his mind to the core and margins of its content, attack it from every side.”
— Ibn al-Haytham (Alhazen), c. 965-1040 CE
In his book Where Good Ideas Come From, Steven Johnson wrote about the commonplace book - a form of personal notebook in which thinkers noted down passages, observations, and ideas organized under thematic headings. Johnson described his own use of this practice in his work as an author, and then took it one stage further: he worked with Google to create NotebookLM, an AI agent that analyzes users’ collections of online resources to provide explanations, analysis, and understanding over their content.
Inspired by that idea, I coded Skillful-Alhazen, an AI research assistant powered by an underlying ‘ontological notebook’. Rather than relying only on AI’s raw ability with natural language to define the semantics of content, the system uses a powerful new knowledge graph technology (TypeDB) that is ideally suited to supporting AI agents generally (blog post).
The name honors the eleventh-century scholar Ibn al-Haytham (Latinized as ‘Alhazen’), who discovered how the eye works and wrote the masterful Book of Optics between roughly 1011 and 1021 CE. The work was so foundational that it shaped the understanding of light and vision for six centuries. He was arguably the first true scientist, deriving his worldview principally from experimentation and observation. He also articulated a philosophy of critical reading: the duty of anyone who investigates the writings of scientists, he argued, is to make themselves an enemy of all they read, to attack it from every side, suspect every conclusion, and distrust even their own judgment.
That adversarial stance toward information provides a useful framing for AI-enabled online research. When we are inundated with information, we should interrogate what we see, hear, and read with caution and a careful mindset. Finding truth is challenging; the goal of this project is to use AI to make it easier.
So, why do we call the Alhazen tool ‘skillful’?
Here, we combine Claude’s innovative idea of agent skills (a compartmentalized agentic capability packaged in the most flexible, lightweight way possible) with an ontologically-powered knowledge graph to represent the universe of discourse needed for that capability.
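Concretely, an agent skill is just a small directory of files. A plausible layout for a Skillful-Alhazen skill might look like the following; the file names and structure are illustrative assumptions rather than the repository’s actual layout, with only the SKILL.md file and its YAML frontmatter following Anthropic’s documented skill format:

```
skills/job-hunting/
├── SKILL.md        # instructions for Claude + YAML frontmatter (name, description)
├── schema.tql      # hypothetical TypeDB schema for the skill's domain
└── scripts/        # helper scripts Claude can run against the knowledge graph
    └── ingest.py
```

The lightweight packaging is the point: everything the agent needs to exercise the capability travels together in one folder.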
This is a powerful modeling distinction: if each skill represents a well-defined domain in which everything we want to say must be defined explicitly, then we face a trade-off between semantic complexity and tractability. The more complex the underlying reasoning, the harder it is to encode and query.
There are advantages to concrete, closed-world reasoning: it is deterministic, it is inherently explainable, and it scales well. As a commercial-grade database, TypeDB is state of the art for this functionality, with enough logical richness to be genuinely useful.
Modern frontier models (and their coding agents) are powerful enough to translate everyday human instructions into code that executes effectively over these databases. Claude can build and edit schemas; write code to execute queries; compose ad-hoc queries on the fly; catch and correct its own mistakes; and format data appropriately. Most impressively, because TypeDB’s declarative schema language is robust to changes without corrupting the database, Claude can redesign the TypeDB knowledge graph on the fly.
If each skill represents a specific domain, then any domain’s conceptual model can be generated by the agent itself. We can instrument Claude to track its failures and performance gaps as it uses a skill, creating an agentic loop in which it iteratively improves the skill’s implementation over time.
As someone who has worked in scientific knowledge engineering for decades, I find that the combination of LLM + logic unlocks a real superpower. The intrinsic challenge of developing a rich, accurate, well-designed knowledge representation for any given domain becomes tractable.
Curation Design Patterns - The Importance of Provenance
The project grew out of my earlier work helping researchers make sense of scientific literature at scale. The key to that work is the ability to track the provenance of information. In Alhazen, research work follows this process:
- Discover - where you find information that is relevant for your domain
- Ingest - where you make a copy of that information in the context of your study
- Sensemaking - where you read the content and interpret it according to some underlying conceptual model
- Analysis - where you synthesize the information you compiled from each source into a whole for some purpose
- Report - where you generate output for your user community.
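As a minimal sketch of how provenance threads through those five stages, consider the following Python fragment. All class and function names here are illustrative assumptions, not Alhazen’s actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Discover -> Ingest -> Sensemaking -> Analysis ->
# Report pipeline, showing how each object keeps a link back to its source.

@dataclass
class Artifact:
    """Raw content copied into the study from a discovered source (Ingest)."""
    source_url: str
    text: str

@dataclass
class Fragment:
    """A piece extracted from an artifact during sensemaking."""
    artifact: Artifact  # provenance link back to the raw capture
    content: str

@dataclass
class Note:
    """An analytical conclusion, with provenance to its supporting fragments."""
    claim: str
    evidence: list  # list of Fragment

def report(notes):
    """Report stage: render conclusions together with their provenance trail."""
    lines = []
    for n in notes:
        sources = sorted({f.artifact.source_url for f in n.evidence})
        lines.append(f"{n.claim}  [sources: {', '.join(sources)}]")
    return "\n".join(lines)

# Usage: a tiny end-to-end run of the pipeline.
art = Artifact("https://example.org/posting", "Requires 5+ years of Python.")
frag = Fragment(art, "5+ years of Python")
note = Note("Candidate meets the Python requirement", [frag])
print(report([note]))
```

Because every Note points to Fragments, and every Fragment points to its Artifact, the final report can always answer the question “where did this claim come from?”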
This relatively simple design pattern is applicable to a wide range of curation tasks and is supported in our case by the following high-level object types (with examples from two domains, Job Hunting and Skill Building):
| Alhazen Type | Your Domain | Example (Job Hunt) | Example (Skill Building) |
|---|---|---|---|
| Collection | Groupings of entities of interest | "High priority roles" | "Biological models" |
| Thing | Primary items you track | Company, Position | Agent Skill |
| Relations | Relationships between things | Links between positions and conversations | Links between schema and design gaps |
| Artifact | Raw captured content | Job Description | Schema File, Script |
| Fragment | Extracted pieces | Requirement | Function |
| Note | Claude’s analysis | Fit Analysis | Skill design gap |
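To make the table concrete, here is a minimal sketch of how these types might be declared in TypeQL, using TypeDB 2.x-style syntax. All type and role names are illustrative assumptions rather than the actual Alhazen schema (note that `thing` itself is avoided because it is a reserved root type in TypeQL 2.x):

```
define

collection sub entity, owns name, plays membership:group;
thing-of-interest sub entity, owns name, plays membership:member;
artifact sub entity, owns url, plays extraction:source;
fragment sub entity, owns content, plays extraction:extract, plays annotation:subject;
note sub entity, owns content, plays annotation:comment;

name sub attribute, value string;
url sub attribute, value string;
content sub attribute, value string;

membership sub relation, relates group, relates member;
extraction sub relation, relates source, relates extract;
annotation sub relation, relates subject, relates comment;
```

The `extraction` and `annotation` relations are what carry provenance: a note is never free-floating, but always annotates fragments that were extracted from a concrete artifact.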
Here, we explicitly define classes of entities that model the information artifacts pertaining to the things under study. Papers may describe scientific phenomena; online postings describe job opportunities. Tracking the provenance of your research’s end product about a ‘thing’ requires recording how that product was derived from existing sources.
Worked Examples
We describe two examples of the system’s use:
The first is job hunting. Not the most glamorous, but a genuinely painful information problem: too many postings, too little signal, and the cognitive burden of tracking everything across spreadsheets and browser tabs while also trying to prepare for interviews and maintain morale. The system ingests job postings from URLs you share with it, extracts the requirements, maps them against your profile, identifies gaps, and tracks where you are in each process. It enables Claude to act as your coach - directly, even extensively if you’d like to build out its capabilities. This YouTube video walks through the system performing a mock job search for Albert Einstein in the Bay Area in March 2026.
The second example is the meta-skill of domain-modeling / skill-building. This is the key to the entire approach since it provides a framework for developing prompts, TypeDB schema, scripts, dashboards, and documentation for Claude skills based on source documents. This is developed as a curation task based on design specification artifacts (either supplied by a user or inferred by Claude from descriptions). The objects that are generated by this process are the source files of a skill. All the information that informed the design choices in creating that skill are recorded in the TypeDB knowledge graph. If a design gap or error is noted as the skill is being used, a note can be recorded, which then can be retrieved and used as context to modify any of the skill’s components. This virtuous cycle could be used to automatically improve Alhazen Notebook skills’ performance over time.
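The gap-recording loop can be sketched as follows. This is a simplified, file-based illustration; the function names and the JSONL store are hypothetical, and in Alhazen itself these notes live in the TypeDB knowledge graph:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical store for design-gap notes; Alhazen keeps these in TypeDB.
NOTES = Path("skill_gap_notes.jsonl")

def record_gap(skill: str, component: str, description: str) -> None:
    """Record a design gap or error noticed while a skill is in use."""
    note = {
        "skill": skill,
        "component": component,  # e.g. "schema.tql", "ingest script"
        "description": description,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with NOTES.open("a") as f:
        f.write(json.dumps(note) + "\n")

def gaps_for(skill: str) -> list:
    """Retrieve recorded gaps to feed back as context for a skill revision."""
    if not NOTES.exists():
        return []
    notes = [json.loads(line) for line in NOTES.read_text().splitlines()]
    return [n for n in notes if n["skill"] == skill]

# Usage: record a gap during use, then pull it back before editing the skill.
record_gap("job-hunting", "schema.tql",
           "No way to model remote-vs-onsite requirements")
for gap in gaps_for("job-hunting"):
    print(f"[{gap['component']}] {gap['description']}")
```

The essential design choice is that the gap note names the specific skill component it concerns, so a later revision pass can retrieve exactly the context needed to modify that component.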
Ibn al-Haytham produced his greatest work under constraint. There’s something in that worth sitting with. The conditions that force you to be methodical, to slow down, to build rather than browse—they tend to produce something more durable than the frantic pace of open-ended exploration.
That’s what I’m trying to build here: a system that is methodical by design. An enemy of what it reads - in the best sense possible - with the truth as its goal.
Repository: github.com/sciknow-io/skillful-alhazen · Forked from chanzuckerberg/alhazen