AI-Assisted Research Project
The Semantic Compiler
What if language has formal semantic structure — and we can compute it?
An exploration at the intersection of philosophy, linguistics, and artificial intelligence — treating natural language as multiplexed semantic signals requiring formal compilation.
Against the Distributional Consensus
Modern NLP assumes meaning is captured by statistical patterns. But what if language encodes something deeper — a formal semantic structure that can be computed?
The Core Thesis
Natural language is not noise to be statistically approximated — it's a structured semantic signal that can be formally compiled.
1. Natural language is not arbitrary noise; it encodes formal semantic structure
2. All meanings decompose into a finite set of universal semantic primitives (~200-400 atoms)
3. Word sense disambiguation is not classification; it is structure satisfaction
4. The 'Language of Thought' hypothesis can be computationally instantiated
A computational instantiation of the Language of Thought.
Conventional Wisdom
The dominant paradigm in NLP since 2013 has been purely distributional — meaning is what embeddings capture:
- Language is too messy for formal treatment
- Meaning is purely statistical (distributional semantics)
- Neural networks 'just work' — we don't need explicit structure
- Semantic primitives are a philosophical curiosity, not a computational tool
- Word senses are arbitrary dictionary entries
But LLMs still struggle on compositional generalization tests, where familiar parts must be combined in unfamiliar ways.
The Symbolic Systems Perspective
This project emerges from a background in philosophy and symbolic systems — the interdisciplinary study of how minds, machines, and languages represent and process information.
Philosophy
Fodor's Language of Thought, Wierzbicka's semantic primes, compositionality in meaning.
Linguistics
Cognitive linguistics, conceptual metaphor theory, selectional restrictions.
Computer Science
Neuro-symbolic integration, knowledge graphs, energy-based models.
The question driving this work: Can we bridge the gap between formal semantics and neural learning? Can we build systems that understand meaning — not just patterns?
Four Foundational Insights
Each insight builds on decades of research in linguistics and cognitive science, now made computational.
Semantic Atoms Are Real
Anna Wierzbicka's Natural Semantic Metalanguage (NSM) identified 65 semantic primes validated across 120+ languages. We extend this to ~400 computational atoms.
Meaning Is Compositional
Complex meanings emerge from the systematic combination of atomic primitives — just as complex programs emerge from simple operations.
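The analogy to programs can be made concrete with a small expression tree. The following is a minimal sketch, not the project's actual representation: the `Atom` and `Combine` types, the `atoms_used` helper, and the decomposition of "enter" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Atom:
    """A single semantic primitive, e.g. PATH or CAUSATION."""
    name: str


@dataclass(frozen=True)
class Combine:
    """An operator atom applied to sub-meanings, like a function call."""
    operator: Atom
    arguments: Tuple["Meaning", ...]


Meaning = Union[Atom, Combine]


def atoms_used(meaning: Meaning) -> FrozenSet[str]:
    """Collect the name of every atom that appears anywhere in a meaning."""
    if isinstance(meaning, Atom):
        return frozenset({meaning.name})
    names = {meaning.operator.name}
    for argument in meaning.arguments:
        names |= atoms_used(argument)
    return frozenset(names)


# Hypothetical decomposition: "enter" as a CHANGE along a PATH into CONTAINMENT.
ENTER = Combine(Atom("CHANGE"), (Atom("PATH"), Atom("CONTAINMENT")))

# A more complex meaning reuses the simpler one: "cause to enter".
CAUSE_TO_ENTER = Combine(Atom("CAUSATION"), (ENTER,))

print(sorted(atoms_used(CAUSE_TO_ENTER)))
# ['CAUSATION', 'CHANGE', 'CONTAINMENT', 'PATH']
```

Because `CAUSE_TO_ENTER` is built from `ENTER` rather than redefined from scratch, the larger meaning inherits its parts systematically, which is the property the compositionality claim relies on.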
WSD as Energy Minimization
Word sense disambiguation is reframed as finding the lowest-energy configuration of semantic atoms across a sentence — not classification.
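A toy version of this idea, under loudly stated assumptions: the sense inventory, the atom sets attached to each sense, and the energy function (negative atom overlap between every pair of senses) are all invented for illustration, and the exhaustive search stands in for whatever optimizer the real system uses.

```python
from itertools import product

# Hypothetical sense inventory: word -> sense id -> set of atoms.
SENSES = {
    "deposit": {
        "deposit.money": {"POSSESSION", "CHANGE"},
        "deposit.sediment": {"PATH", "CHANGE"},
    },
    "money": {
        "money.currency": {"POSSESSION"},
    },
    "bank": {
        "bank.financial": {"POSSESSION", "CONTAINMENT"},
        "bank.river": {"PROXIMITY", "PATH"},
    },
}


def pair_energy(atoms_a, atoms_b):
    """Toy compatibility: each shared atom lowers the energy by one."""
    return -len(atoms_a & atoms_b)


def total_energy(words, senses):
    """Sum pairwise energies over one full assignment of senses to words."""
    energy = 0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            atoms_i = SENSES[words[i]][senses[i]]
            atoms_j = SENSES[words[j]][senses[j]]
            energy += pair_energy(atoms_i, atoms_j)
    return energy


def disambiguate(words):
    """Pick the joint sense assignment with the lowest total energy."""
    candidates = [sorted(SENSES[word]) for word in words]
    best = min(product(*candidates), key=lambda s: total_energy(words, s))
    return dict(zip(words, best))


print(disambiguate(["deposit", "money", "bank"]))
# {'deposit': 'deposit.money', 'money': 'money.currency', 'bank': 'bank.financial'}
```

The point of the sketch is that senses are chosen jointly: "bank" resolves to the financial sense because that configuration is globally cheapest, not because a per-word classifier voted for it. Exhaustive enumeration grows exponentially with sentence length, so it is only viable for toy inputs like this one.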
Neuro-Symbolic Fusion
Neither pure neural networks nor pure symbolic systems suffice. We integrate explicit semantic structure with learned representations.
400 Semantic Atoms
Derived from Wierzbicka's 65 semantic primes and Talmy's conceptual primitives, extended for computational coverage.
| Category | Example Atoms | Count |
|---|---|---|
| Spatial | CONTAINMENT, PATH, VERTICALITY, PROXIMITY | 35 |
| Force Dynamics | CAUSATION, RESISTANCE, ENABLEMENT, MOTION | 28 |
| Temporal | SEQUENCE, DURATION, CYCLE, CHANGE | 22 |
| Cognitive | ATTENTION, PERCEPTION, MEMORY, INTENTION | 31 |
| Evaluative | GOOD, BAD, IMPORTANCE, PREFERENCE | 18 |
| Quantitative | ONE, MANY, ALL, SOME, DEGREE | 24 |
| Relational | IDENTITY, SIMILARITY, POSSESSION, PART-WHOLE | 29 |
| Agentive | AGENT, PATIENT, EXPERIENCER, INSTRUMENT | 15 |
These atoms are not arbitrary labels: they are grounded in 50+ years of cross-linguistic semantic research. Wierzbicka's Natural Semantic Metalanguage has been tested across 120+ languages spanning typologically diverse language families.
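One way to hold such an inventory in code is a category enum plus a lookup table. The sketch below uses only the example atoms listed in the table above; the names `Category`, `EXAMPLE_ATOMS`, `ATOM_TO_CATEGORY`, and `unknown_atoms` are assumptions for illustration, not the project's actual data model.

```python
from enum import Enum


class Category(Enum):
    SPATIAL = "Spatial"
    FORCE_DYNAMICS = "Force Dynamics"
    TEMPORAL = "Temporal"
    COGNITIVE = "Cognitive"
    EVALUATIVE = "Evaluative"
    QUANTITATIVE = "Quantitative"
    RELATIONAL = "Relational"
    AGENTIVE = "Agentive"


# Only the example atoms from the table, not the full inventory.
EXAMPLE_ATOMS = {
    Category.SPATIAL: ("CONTAINMENT", "PATH", "VERTICALITY", "PROXIMITY"),
    Category.FORCE_DYNAMICS: ("CAUSATION", "RESISTANCE", "ENABLEMENT", "MOTION"),
    Category.TEMPORAL: ("SEQUENCE", "DURATION", "CYCLE", "CHANGE"),
    Category.COGNITIVE: ("ATTENTION", "PERCEPTION", "MEMORY", "INTENTION"),
    Category.EVALUATIVE: ("GOOD", "BAD", "IMPORTANCE", "PREFERENCE"),
    Category.QUANTITATIVE: ("ONE", "MANY", "ALL", "SOME", "DEGREE"),
    Category.RELATIONAL: ("IDENTITY", "SIMILARITY", "POSSESSION", "PART-WHOLE"),
    Category.AGENTIVE: ("AGENT", "PATIENT", "EXPERIENCER", "INSTRUMENT"),
}

# Reverse index: atom name -> category.
ATOM_TO_CATEGORY = {
    atom: category
    for category, atoms in EXAMPLE_ATOMS.items()
    for atom in atoms
}


def unknown_atoms(decomposition):
    """Return the atoms in a decomposition that are not in the inventory."""
    return sorted(a for a in decomposition if a not in ATOM_TO_CATEGORY)


print(ATOM_TO_CATEGORY["CAUSATION"].value)   # Force Dynamics
print(unknown_atoms(["PATH", "FLUIDITY"]))   # ['FLUIDITY']
```

A closed inventory like this lets every sense decomposition be validated against a fixed vocabulary, so a typo or an ad hoc label is caught instead of silently becoming a new "atom".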
8-Pass Semantic Compilation
Like a compiler transforming source code, the semantic compiler transforms natural language through progressive refinement.
1. Parse text into tokens with POS tags
2. Neural WSD with atom-aware scoring
3. Decompose senses into semantic primitives
4. Assign each token to its ontological type
5. Identify metaphorical mappings
6. Verify selectional restrictions
7. Build semantic dependency graph
8. Compute global semantic coherence
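The eight passes above can be sketched as an ordered list of functions that each enrich a shared state, much as compiler passes annotate an intermediate representation. This is a structural sketch only: the pass names, the `CompilationState` type, and the stub bodies are hypothetical, and the tokenizer is a plain whitespace split with no POS tagging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CompilationState:
    """Shared state that every pass reads from and writes to."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


Pass = Callable[[CompilationState], None]


def tokenize(state: CompilationState) -> None:
    # Placeholder for pass 1: whitespace split, no POS tags.
    state.annotations["tokens"] = state.text.split()


def make_stub(key: str) -> Pass:
    """Build a no-op pass that only reserves its annotation slot."""
    def stub(state: CompilationState) -> None:
        state.annotations[key] = None  # real logic would go here
    return stub


# Hypothetical names, in the order the passes are listed above.
PASSES = [
    ("tokenize", tokenize),
    ("disambiguate", make_stub("senses")),
    ("decompose", make_stub("atoms")),
    ("assign_types", make_stub("types")),
    ("map_metaphors", make_stub("metaphors")),
    ("check_restrictions", make_stub("violations")),
    ("build_graph", make_stub("graph")),
    ("score_coherence", make_stub("coherence")),
]


def compile_text(text: str) -> CompilationState:
    """Run every pass in order, recording which ones have finished."""
    state = CompilationState(text)
    for name, run in PASSES:
        run(state)
        state.completed.append(name)
    return state


state = compile_text("the bank approved the loan")
print(state.completed)
print(state.annotations["tokens"])
```

Keeping the passes as data rather than hard-coded calls makes it straightforward to reorder them, skip one during debugging, or inspect the state between any two stages.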
Current Status
This is a research project in active development. We believe in honest assessment of capabilities.
400
21
25K words
Integrated
In Progress
Planned
- What works: the infrastructure (sense inventory, knowledge graph, 8-pass pipeline, atom decomposition) is solid.
- What's in progress: neural WSD accuracy, which is currently below the most-frequent-sense (MFS) baseline; we are actively debugging the architecture and data.
- What's next: a formal intermediate representation, benchmark integration, and compositional evaluation.
Built With AI
This project is developed through deep collaboration with Claude (Anthropic's AI). Every architectural decision, bug fix, and insight emerges from extended dialogue between human intuition and AI capability.
What this collaboration teaches:
- AI can identify architectural bugs humans miss (for example, atoms that were never passed through the energy computation)
- Human vision guides direction; AI handles technical implementation
- The collaboration itself is a meta-experiment in human-AI research partnership
This page, the research summary, and the debugging of the semantic compiler were all produced through this collaboration. The boundary between “human work” and “AI work” is intentionally blurred.
Key References
- Wierzbicka, A. Semantics: Primes and Universals.
- Fodor, J.A. The Language of Thought.
- Goddard, C. Semantic Analysis (2nd ed.).
- Navigli, R. Word Sense Disambiguation: A Survey.
- Bevilacqua, M. et al. Recent Trends in Word Sense Disambiguation.
- Lakoff, G. & Johnson, M. Metaphors We Live By.
- Talmy, L. Toward a Cognitive Semantics.
- Marcus, G. The Next Decade in AI: Four Steps Towards Robust AI.
“I believe at the heart of all language and thought and scientific fact is a vast and correct structure of reality that, when laid bare, will render our powerful languages into a reasoning structure that can perfectly validate any theoretical suggestion against this core shadow world of facts.”