AWAKEN Research Project

AI-Assisted Research Project

The Semantic Compiler

What if language has formal semantic structure — and we can compute it?

An exploration at the intersection of philosophy, linguistics, and artificial intelligence — treating natural language as multiplexed semantic signals requiring formal compilation.

400 Semantic Atoms | 21 Ontology Types | Work in Progress
The Question

Against the Distributional Consensus

Modern NLP assumes meaning is captured by statistical patterns. But what if language encodes something deeper — a formal semantic structure that can be computed?

The Core Thesis

Natural language is not noise to be statistically approximated — it's a structured semantic signal that can be formally compiled.

  1. Natural language is not arbitrary noise; it encodes formal semantic structure
  2. All meanings decompose into a finite set of universal semantic primitives (~200-400 atoms)
  3. Word sense disambiguation is not classification; it is structure satisfaction
  4. The 'Language of Thought' hypothesis can be computationally instantiated

A computational instantiation of the Language of Thought.

Conventional Wisdom

The dominant paradigm in NLP since 2013 has been purely distributional — meaning is what embeddings capture:

  • Language is too messy for formal treatment
  • Meaning is purely statistical (distributional semantics)
  • Neural networks 'just work' — we don't need explicit structure
  • Semantic primitives are a philosophical curiosity, not a computational tool
  • Word senses are arbitrary dictionary entries

Yet large language models still perform poorly on many tests of compositional generalization.

Background

The Symbolic Systems Perspective

This project emerges from a background in philosophy and symbolic systems — the interdisciplinary study of how minds, machines, and languages represent and process information.

Philosophy

Fodor's Language of Thought, Wierzbicka's semantic primes, compositionality in meaning.

Linguistics

Cognitive linguistics, conceptual metaphor theory, selectional restrictions.

Computer Science

Neuro-symbolic integration, knowledge graphs, energy-based models.

The question driving this work: Can we bridge the gap between formal semantics and neural learning? Can we build systems that understand meaning — not just patterns?

The Approach

Four Foundational Insights

Each insight builds on decades of research in linguistics and cognitive science, now made computational.

Foundation

Semantic Atoms Are Real

Anna Wierzbicka's Natural Semantic Metalanguage (NSM) identified 65 semantic primes validated across 120+ languages. We extend this to ~400 computational atoms.

Architecture

Meaning Is Compositional

Complex meanings emerge from the systematic combination of atomic primitives — just as complex programs emerge from simple operations.
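A minimal sketch of what "systematic combination" could mean in code, assuming a hypothetical encoding in which each sense is a set of atoms and a phrase binds argument senses to named roles of a head sense. The decompositions of "put", "cup", and "box", the role label GOAL, and the helper `compose` are illustrative assumptions, not the project's inventory or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    """A word sense encoded as a set of semantic atoms (hypothetical encoding)."""
    lemma: str
    atoms: frozenset


@dataclass
class Frame:
    """A composed meaning: a head sense plus role-labelled argument senses."""
    head: Sense
    roles: dict = field(default_factory=dict)

    def atoms(self) -> frozenset:
        # The phrase-level atom set is the union of the head's and the arguments' atoms.
        result = set(self.head.atoms)
        for arg in self.roles.values():
            result |= arg.atoms
        return frozenset(result)


def compose(head: Sense, **roles: Sense) -> Frame:
    """Bind argument senses to named roles of a head sense (hypothetical helper)."""
    return Frame(head=head, roles=dict(roles))


# Hypothetical decompositions built from atoms in the vocabulary on this page.
put = Sense("put", frozenset({"CAUSATION", "MOTION", "PATH"}))
cup = Sense("cup", frozenset({"CONTAINMENT"}))
box = Sense("box", frozenset({"CONTAINMENT"}))

frame = compose(put, PATIENT=cup, GOAL=box)    # "put the cup in the box"
swapped = compose(put, PATIENT=box, GOAL=cup)  # "put the box in the cup"

print(sorted(frame.atoms()))             # ['CAUSATION', 'CONTAINMENT', 'MOTION', 'PATH']
print(frame.atoms() == swapped.atoms())  # True
print(frame.roles == swapped.roles)      # False
```

The last two lines carry the point. A flat bag of atoms cannot tell the two sentences apart, but the role structure can. That is why composition here has to be more than set union.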

Method

WSD as Energy Minimization

Word sense disambiguation is reframed as finding the lowest-energy configuration of semantic atoms across a sentence — not classification.
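To make the energy framing concrete, here is a minimal sketch under stated assumptions. The energy has two hypothetical terms. The first is a per-sense cost, standing in for something like a neural negative log-probability. The second is a pairwise term that lowers energy when the senses chosen for different words share atoms, measured by negated Jaccard overlap. The toy inventory, the costs, and the weight `lam` are invented for illustration. Exhaustive search is feasible only because the example is tiny; a realistic system would need approximate inference.

```python
from itertools import product

# Hypothetical toy inventory: word -> {sense_id: (unary_cost, atoms)}.
# Unary costs stand in for, e.g., negative log-probabilities from a neural model.
SENSES = {
    "bank": {
        "bank/finance": (0.9, frozenset({"POSSESSION", "CONTAINMENT"})),
        "bank/river": (1.0, frozenset({"PATH", "PROXIMITY", "VERTICALITY"})),
    },
    "river": {"river/water": (0.2, frozenset({"PATH", "MOTION"}))},
    "money": {"money/currency": (0.2, frozenset({"POSSESSION"}))},
}


def pairwise_energy(a: frozenset, b: frozenset) -> float:
    """Negative Jaccard overlap: senses sharing more atoms get lower energy."""
    union = a | b
    return -len(a & b) / len(union) if union else 0.0


def energy(assignment: tuple, words: list, lam: float = 1.0) -> float:
    """Total energy = sum of unary costs + lam * sum of pairwise atom energies."""
    costs, atom_sets = zip(*(SENSES[w][s] for w, s in zip(words, assignment)))
    total = sum(costs)
    for i in range(len(atom_sets)):
        for j in range(i + 1, len(atom_sets)):
            total += lam * pairwise_energy(atom_sets[i], atom_sets[j])
    return total


def disambiguate(words: list, lam: float = 1.0) -> tuple:
    """Search every sense combination and return the lowest-energy one."""
    candidates = product(*(SENSES[w] for w in words))
    return min(candidates, key=lambda a: energy(a, words, lam))


print(disambiguate(["river", "bank"]))         # ('river/water', 'bank/river')
print(disambiguate(["river", "bank"], lam=0))  # ('river/water', 'bank/finance')
print(disambiguate(["money", "bank"]))         # ('money/currency', 'bank/finance')
```

With the pairwise term switched off (`lam=0`), the unary costs alone choose the financial sense of "bank". The PATH atom shared with "river" flips that decision. In this sense the choice is a joint configuration over the sentence rather than an independent per-word label.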

Integration

Neuro-Symbolic Fusion

Neither pure neural networks nor pure symbolic systems suffice. We integrate explicit semantic structure with learned representations.
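One simple way to fuse the two signals is log-linear interpolation. This is a generic technique offered as an assumption, not the project's documented method. Each candidate sense scores its neural log-probability minus a weight `beta` times a symbolic penalty (here, the number of selectional restrictions it violates), and the scores are renormalized with a softmax. The probabilities, violation counts, and `beta` below are made up.

```python
import math


def fuse(neural_logprobs: dict, violations: dict, beta: float = 2.0) -> dict:
    """Log-linear fusion: neural log-probability minus beta per symbolic violation."""
    scores = {s: lp - beta * violations.get(s, 0) for s, lp in neural_logprobs.items()}
    top = max(scores.values())  # subtract the max before exponentiating for stability
    weights = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


# Hypothetical example for "bat" in "The bat flew out of the cave":
# the neural model slightly prefers the club sense, but a (hypothetical) type
# check reports that the club sense violates one selectional restriction of "flew".
neural = {"bat/club": math.log(0.6), "bat/animal": math.log(0.4)}
violations = {"bat/club": 1}

fused = fuse(neural, violations)
print(max(fused, key=fused.get))  # bat/animal
```

Setting `beta=0` recovers the neural distribution unchanged, and a very large `beta` approaches a hard symbolic filter. A single weight therefore spans the range between the two paradigms.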

The Vocabulary

400 Semantic Atoms

Derived from Wierzbicka's 65 semantic primes and Talmy's conceptual primitives, extended for computational coverage.

Selected categories, with atom counts and example atoms (these eight account for 202 of the 400 atoms):

  • Spatial (35): CONTAINMENT, PATH, VERTICALITY, PROXIMITY
  • Force Dynamics (28): CAUSATION, RESISTANCE, ENABLEMENT, MOTION
  • Temporal (22): SEQUENCE, DURATION, CYCLE, CHANGE
  • Cognitive (31): ATTENTION, PERCEPTION, MEMORY, INTENTION
  • Evaluative (18): GOOD, BAD, IMPORTANCE, PREFERENCE
  • Quantitative (24): ONE, MANY, ALL, SOME, DEGREE
  • Relational (29): IDENTITY, SIMILARITY, POSSESSION, PART-WHOLE
  • Agentive (15): AGENT, PATIENT, EXPERIENCER, INSTRUMENT

These atoms are not arbitrary labels — they are grounded in 50+ years of cross-linguistic semantic research. Wierzbicka's Natural Semantic Metalanguage has been validated across 120+ languages from every language family.
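The category list can be held in a simple lookup structure. The following is a minimal sketch that assumes each atom belongs to exactly one category and includes only the example atoms listed on this page, not the full 400. It inverts the list so that a proposed decomposition can be checked against the inventory and summarized by category. The decomposition in the usage line, for a verb such as "throw", is hypothetical.

```python
from collections import Counter

# Example atoms per category, copied from the list above (a subset of the inventory).
CATEGORIES = {
    "Spatial": ["CONTAINMENT", "PATH", "VERTICALITY", "PROXIMITY"],
    "Force Dynamics": ["CAUSATION", "RESISTANCE", "ENABLEMENT", "MOTION"],
    "Temporal": ["SEQUENCE", "DURATION", "CYCLE", "CHANGE"],
    "Cognitive": ["ATTENTION", "PERCEPTION", "MEMORY", "INTENTION"],
    "Evaluative": ["GOOD", "BAD", "IMPORTANCE", "PREFERENCE"],
    "Quantitative": ["ONE", "MANY", "ALL", "SOME", "DEGREE"],
    "Relational": ["IDENTITY", "SIMILARITY", "POSSESSION", "PART-WHOLE"],
    "Agentive": ["AGENT", "PATIENT", "EXPERIENCER", "INSTRUMENT"],
}

# Invert the table: atom -> category (assumes no atom appears in two categories).
ATOM_TO_CATEGORY = {atom: cat for cat, atoms in CATEGORIES.items() for atom in atoms}


def profile(decomposition):
    """Reject atoms missing from the inventory, then count atoms per category."""
    unknown = [a for a in decomposition if a not in ATOM_TO_CATEGORY]
    if unknown:
        raise ValueError(f"atoms not in inventory: {unknown}")
    return Counter(ATOM_TO_CATEGORY[a] for a in decomposition)


# Hypothetical decomposition of "throw".
print(dict(profile(["AGENT", "CAUSATION", "MOTION", "PATH"])))
# {'Agentive': 1, 'Force Dynamics': 2, 'Spatial': 1}
```

Rejecting unknown atoms up front keeps the vocabulary closed. A closed vocabulary is what allows the claim that every decomposition draws on a finite, fixed set of primitives.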

The Architecture

8-Pass Semantic Compilation

Like a compiler transforming source code, the semantic compiler transforms natural language through progressive refinement.

  1. Tokenization: parse text into tokens with part-of-speech (POS) tags
  2. Sense Selection: neural WSD with atom-aware scoring
  3. Atom Extraction: decompose senses into semantic primitives
  4. Ontology Assignment: assign each token to its ontological type
  5. Metaphor Detection: identify metaphorical mappings
  6. Type Checking: verify selectional restrictions
  7. Graph Construction: build a semantic dependency graph
  8. Coherence Scoring: compute global semantic coherence
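The eight passes behave like compiler passes: each one takes the state left by the previous pass and enriches it. The skeleton below is a sketch that shows only this plumbing. Every pass body is a hypothetical toy stand-in, including the whitespace tokenizer, the three-word lexicon, the "previous token is the subject" rule, adjacent-token edges, and the coherence formula. The type names "Animal", "Object", and "Event" are assumptions, not the project's 21 ontology types.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon: word -> (sense id, atoms, ontology type).
LEXICON = {
    "dog": ("dog/animal", {"AGENT"}, "Animal"),
    "rock": ("rock/stone", set(), "Object"),
    "sleeps": ("sleep/rest", {"DURATION", "EXPERIENCER"}, "Event"),
}
# Hypothetical selectional restriction: the subject of this sense must be an Animal.
RESTRICTIONS = {"sleep/rest": "Animal"}


@dataclass
class State:
    """Everything known about one sentence; each pass fills in more fields."""
    text: str
    tokens: list = field(default_factory=list)
    senses: list = field(default_factory=list)
    atoms: list = field(default_factory=list)
    types: list = field(default_factory=list)
    metaphors: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    edges: list = field(default_factory=list)
    coherence: float = 0.0


def tokenize(s):          # 1. whitespace split (no POS tagging in this sketch)
    s.tokens = s.text.lower().split()
    return s


def select_senses(s):     # 2. lexicon lookup stands in for neural WSD
    s.senses = [LEXICON[t][0] for t in s.tokens]
    return s


def extract_atoms(s):     # 3. sense -> atom set
    s.atoms = [LEXICON[t][1] for t in s.tokens]
    return s


def assign_types(s):      # 4. token -> ontology type
    s.types = [LEXICON[t][2] for t in s.tokens]
    return s


def detect_metaphors(s):  # 5. placeholder: no metaphor detector in this sketch
    return s


def type_check(s):        # 6. toy rule: the token before a restricted verb is its subject
    for i, sense in enumerate(s.senses):
        required = RESTRICTIONS.get(sense)
        if required and i > 0 and s.types[i - 1] != required:
            s.errors.append(f"{sense} requires subject type {required}, got {s.types[i - 1]}")
    return s


def build_graph(s):       # 7. adjacent-token edges stand in for a dependency graph
    s.edges = [(i, i + 1) for i in range(len(s.tokens) - 1)]
    return s


def score_coherence(s):   # 8. toy score: 1 minus type errors per edge
    s.coherence = 1.0 - len(s.errors) / max(1, len(s.edges))
    return s


PASSES = [tokenize, select_senses, extract_atoms, assign_types,
          detect_metaphors, type_check, build_graph, score_coherence]


def compile_sentence(text):
    """Run every pass in order, threading one state object through them."""
    state = State(text)
    for compiler_pass in PASSES:
        state = compiler_pass(state)
    return state


print(compile_sentence("Dog sleeps").coherence)   # 1.0
print(compile_sentence("Rock sleeps").errors)     # ['sleep/rest requires subject type Animal, got Object']
```

Keeping the passes as a plain list makes each stage replaceable in isolation. For example, the lexicon lookup in pass 2 could be swapped for a neural scorer without touching the other passes.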

Honest Assessment

Current Status

This is a research project in active development. We believe in honest assessment of capabilities.

  • Semantic atoms: Done (400)
  • Ontology types: Done (21)
  • Sense inventory: Done (25K words)
  • Knowledge graph: Done (integrated)
  • WSD accuracy: In progress
  • Formal IR: Planned

What works: The infrastructure is solid, including the sense inventory, knowledge graph, 8-pass pipeline, and atom decomposition.

What's in progress: Neural WSD accuracy, which is currently below the most-frequent-sense (MFS) baseline; we are actively debugging both the architecture and the data.

What's next: A formal intermediate representation (IR), benchmark integration, and compositional evaluation.
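The MFS baseline labels every occurrence of a word with whichever sense occurs most often for that word. In this sketch, that sense is estimated from training counts; WordNet-based evaluations often use the first listed sense instead. Below is a minimal sketch of computing the baseline and the accuracy a WSD system is compared against. It uses a hypothetical toy corpus of (word, gold sense) pairs, not the project's evaluation data.

```python
from collections import Counter, defaultdict


def train_mfs(pairs):
    """Map each word to its most frequent sense in (word, sense) training pairs."""
    counts = defaultdict(Counter)
    for word, sense in pairs:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(predict, test_pairs):
    """Fraction of test items whose predicted sense matches the gold sense."""
    correct = sum(predict(word) == gold for word, gold in test_pairs)
    return correct / len(test_pairs)


# Hypothetical toy data.
train = [("bank", "bank/finance"), ("bank", "bank/finance"), ("bank", "bank/river"),
         ("bass", "bass/fish"), ("bass", "bass/music"), ("bass", "bass/music")]
test = [("bank", "bank/finance"), ("bank", "bank/river"), ("bass", "bass/music")]

mfs = train_mfs(train)
baseline = accuracy(lambda w: mfs.get(w), test)  # unseen words get None and count as wrong
print(f"MFS accuracy: {baseline:.3f}")           # MFS accuracy: 0.667
```

A neural system would be scored with the same `accuracy` function on the same test items. "Below MFS" means its score is lower than `baseline`.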

The Process

Built With AI

This project is developed through deep collaboration with Claude (Anthropic's AI). Every architectural decision, bug fix, and insight emerges from extended dialogue between human intuition and AI capability.

What this collaboration teaches:

  • AI can identify architectural bugs that humans miss (for example, semantic atoms that were never passed into the energy computation)
  • Human vision guides direction; AI handles technical implementation
  • The collaboration itself is a meta-experiment in human-AI research partnership

This page, the research summary, and the debugging of the semantic compiler were all produced through this collaboration. The boundary between “human work” and “AI work” is intentionally blurred.

Key References

  • Wierzbicka, A. (1996). Semantics: Primes and Universals.
  • Fodor, J.A. (1975). The Language of Thought.
  • Goddard, C. (2011). Semantic Analysis: A Practical Introduction (2nd ed.).
  • Navigli, R. (2009). Word Sense Disambiguation: A Survey.
  • Bevilacqua, M. et al. (2021). Recent Trends in Word Sense Disambiguation: A Survey.
  • Lakoff, G. & Johnson, M. (1980). Metaphors We Live By.
  • Talmy, L. (2000). Toward a Cognitive Semantics.
  • Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence.

The Question Remains Open

Can natural language be formally compiled into semantic representations?

We don't know yet. But the only way to find out is to build the compiler and test it against reality.

“I believe at the heart of all language and thought and scientific fact is a vast and correct structure of reality that, when laid bare, will render our powerful languages into a reasoning structure that can perfectly validate any theoretical suggestion against this core shadow world of facts.”

AWAKEN Research Program | Created through human-AI collaboration