📊 PAJAIS Research Intelligence Agent
Academic Topic Modeling & Gap Analysis for Information Systems Research
Pacific Asia Journal of the Association for Information Systems (PAJAIS)
Step 1: Upload Your Journal CSV
Upload a CSV file containing PAJAIS publications. The system detects title, abstract, year, authors, and DOI columns automatically.
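The detection rules are not documented, so this is only a minimal sketch. It assumes the app matches normalized header names against alias lists. `COLUMN_ALIASES` and `detect_columns` are hypothetical names, and the alias lists are illustrative rather than the app's actual rules.

```python
import csv
import io

# Hypothetical alias lists; the app's real detection rules may differ.
COLUMN_ALIASES = {
    "title": ["title", "article title", "document title"],
    "abstract": ["abstract", "summary"],
    "year": ["year", "publication year", "pub year"],
    "authors": ["authors", "author", "author names"],
    "doi": ["doi", "digital object identifier"],
}


def detect_columns(header):
    """Map each logical field to the first header cell whose normalized name is a known alias."""
    normalized = [cell.strip().lower() for cell in header]
    found = {}
    for field, aliases in COLUMN_ALIASES.items():
        for index, name in enumerate(normalized):
            if name in aliases:
                found[field] = header[index]  # keep the original header spelling
                break
    return found


sample = "Article Title,Abstract,Publication Year,Author,DOI\nA,B,2020,C,10.1/x\n"
header = next(csv.reader(io.StringIO(sample)))
print(detect_columns(header))
```

Fields with no matching header are simply absent from the result, so a caller can check which required columns are missing before running the pipeline.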
Upload a CSV to see dataset statistics.
Data Preview (first 10 rows)
Phase 2: Extracted Topics
Run topic modeling or use the full pipeline from Tab 1.
Topic Review Table (≥98 rows guaranteed)
Click the button above to compute topic co-occurrences.
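A plausible way to compute topic co-occurrences is sketched below. It assumes each document's LDA output is a list of `(topic_id, probability)` pairs, which is the format Gensim's `get_document_topics` returns, and that a topic counts as present in a document once its probability reaches a cut-off. The function name and the 0.1 cut-off are hypothetical.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(doc_topics, threshold=0.1):
    """Count how often two topics are both present in the same document.

    doc_topics: one list of (topic_id, probability) pairs per document.
    A topic is 'present' when its probability is >= threshold (assumed cut-off).
    """
    counts = Counter()
    for topics in doc_topics:
        present = sorted({tid for tid, prob in topics if prob >= threshold})
        for pair in combinations(present, 2):  # unordered pairs, smaller id first
            counts[pair] += 1
    return counts


docs = [
    [(0, 0.6), (1, 0.3), (2, 0.1)],
    [(0, 0.5), (1, 0.45), (2, 0.05)],
    [(1, 0.2), (2, 0.8)],
]
counts = topic_cooccurrence(docs)
print(counts.most_common(3))
```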
Phase 5: Research Gap Analysis
Run mapping or use the full pipeline from Tab 1.
🔵 MAPPED Themes
Mapped Topics
🔴 NOVEL Themes
Novel Topics
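The rule that separates MAPPED from NOVEL themes is not spelled out here. The sketch below assumes a simple keyword-overlap rule. A topic is MAPPED to the PAJAIS theme whose keyword list has the highest Jaccard similarity with the topic's top words, as long as that similarity reaches a threshold. Otherwise the topic is NOVEL. The keyword lists, the 0.2 threshold, and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two word collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_topics(topics, taxonomy, threshold=0.2):
    """Label each topic MAPPED (best theme similarity >= threshold) or NOVEL.

    topics:   {topic_label: [top words]}
    taxonomy: {theme_name: [keywords]}  (hypothetical keyword lists; must be non-empty)
    Returns {topic_label: (status, matched_theme_or_None, best_score)}.
    """
    results = {}
    for label, words in topics.items():
        best_theme, best_score = max(
            ((theme, jaccard(words, keywords)) for theme, keywords in taxonomy.items()),
            key=lambda pair: pair[1],
        )
        status = "MAPPED" if best_score >= threshold else "NOVEL"
        results[label] = (status, best_theme if status == "MAPPED" else None, round(best_score, 3))
    return results


taxonomy = {
    "Blockchain": ["blockchain", "ledger", "smart", "contract", "crypto"],
    "Healthcare IS": ["health", "patient", "hospital", "clinical", "ehr"],
}
topics = {
    "T1": ["blockchain", "ledger", "supply", "chain", "trust"],
    "T2": ["metaverse", "avatar", "virtual", "immersion", "identity"],
}
print(classify_topics(topics, taxonomy))
```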
💡 Generate Publication Pitch
Select a NOVEL theme label and click below to generate a structured abstract pitch.
Phase 4: Abstract vs Title Theme Comparison
📄 Abstract-Derived Themes
Abstract Topics
🏷 Title-Derived Themes
Title Topics
Phase 6: Generate Academic Narrative Draft
📚 Generate Sample APA Citation
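The following is a minimal sketch of how an APA 7-style reference could be assembled from the detected title, year, authors, and DOI columns. It assumes author names arrive as `Last, Given` strings. It leaves the title's capitalization unchanged, omits volume, issue, and pages (which the detected columns do not include), and cannot render italics in plain text. It also does not implement APA's rule for works with more than 20 authors. The function names are hypothetical.

```python
def format_apa_authors(authors):
    """Turn ['Last, Given Names', ...] into APA-style 'Last, G. N., & Last, G.'."""
    formatted = []
    for name in authors:
        last, _, given = name.partition(",")
        initials = " ".join(f"{part[0]}." for part in given.split() if part)
        formatted.append(f"{last.strip()}, {initials}" if initials else last.strip())
    if len(formatted) == 1:
        return formatted[0]
    # APA places a comma before the ampersand even with two authors.
    return ", ".join(formatted[:-1]) + ", & " + formatted[-1]


def apa_reference(authors, year, title, doi,
                  journal="Pacific Asia Journal of the Association for Information Systems"):
    """Build a plain-text APA-style reference (no volume, issue, pages, or italics)."""
    return f"{format_apa_authors(authors)} ({year}). {title}. {journal}. https://doi.org/{doi}"


print(apa_reference(["Doe, Jane", "Roe, Richard A."], 2021, "Example title", "10.1234/example"))
```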
Research Intelligence Dashboard
Dashboard populates automatically after pipeline completion.
- Total Topics
- Novel Themes
- PAJAIS Coverage
- Publishable Gaps
Phase 2.5: Semantic Clustering via SPECTER2 → UMAP → HDBSCAN
Each paper is represented by one 768-dimensional SPECTER2 vector computed from its combined Title + Abstract column (keyed by DOI). UMAP reduces these vectors to 50 dimensions using the cosine metric. HDBSCAN then clusters the reduced vectors, and an automatic parameter sweep targets 15–30 clusters. Clusters with fewer than 5 papers are merged, and clusters with more than 100 papers are split. Intra-cluster cosine similarity is kept within the 0.50–0.60 band. For labeling, the 3 most representative paper titles in each cluster are sent to Mistral, Gemini, and Hugging Face models (all free), and the label with the majority of votes wins.
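The embedding and clustering steps require SPECTER2, UMAP, and HDBSCAN, so this sketch covers only the post-clustering checks described above, and it runs on plain Python vectors. It computes the mean pairwise cosine similarity within a cluster, applies the 5/100 paper size bounds, and takes a majority vote over the three model labels. All function names and the tie-break rule are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_intra_cluster_similarity(vectors):
    """Average pairwise cosine similarity among the vectors of one cluster."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single paper is trivially similar to itself
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def size_flag(n_papers, min_size=5, max_size=100):
    """Flag clusters outside the 5-100 paper bounds."""
    if n_papers < min_size:
        return "merge"
    if n_papers > max_size:
        return "split"
    return "ok"


def majority_label(labels):
    """Pick the label most models agree on (case- and whitespace-insensitive).

    If no label gets more than one vote, fall back to the first model's label
    (an assumed tie-break rule).
    """
    normalized = [label.strip().lower() for label in labels]
    winner, votes = Counter(normalized).most_common(1)[0]
    if votes == 1:
        winner = normalized[0]
    return labels[normalized.index(winner)].strip()


print(majority_label(["Digital Health", "digital health ", "Telemedicine"]))
```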
Run DBSCAN or use the full pipeline from Tab 1.
📊 Cluster Summary
Clusters (sorted by size)
📄 Document-Level Assignments
Per-Document Cluster Assignments
Phase 6.5: Multi-Model Research Council
Three AI models independently assess the PAJAIS research gap findings:
- Mistral (Panel A) — pragmatic applied IS perspective
- Gemini (Panel B) — broad technology futures perspective
- Ollama (Panel C) — deep analytical synthesis
API keys and the Ollama endpoint are loaded automatically from Hugging Face Secrets (MISTRAL_API_KEY, GEMINI_API_KEY, OLLAMA_URL). Configure them in your Space under Settings → Variables and Secrets.
🔑 Secret Status (loaded at startup):
- MISTRAL_API_KEY: ✅ present
- GEMINI_API_KEY: ✅ present
- OLLAMA_URL: ✅ present
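Hugging Face Spaces expose configured secrets to the running app as environment variables, so a startup status check like the one above can be sketched with `os.environ`. The function names and the "missing" wording are assumptions. The check reports only whether each value is set and never prints the value itself.

```python
import os

REQUIRED_SECRETS = ("MISTRAL_API_KEY", "GEMINI_API_KEY", "OLLAMA_URL")


def secret_status(env=None):
    """Return {name: True/False} for each expected secret (blank values count as missing)."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in REQUIRED_SECRETS}


def status_lines(status):
    """Format the status map as display lines without exposing secret values."""
    return [f"{name}: {'✅ present' if ok else '❌ missing'}" for name, ok in status.items()]


for line in status_lines(secret_status()):
    print(line)
```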
Run taxonomy mapping first (Tab 3 or full pipeline).
🟢 Panel A — Mistral
🔵 Panel B — Gemini
🟣 Panel C — Ollama
Export Center & Methodology Notes
📖 Methodology Notes
### LDA Topic Modeling
This system uses Latent Dirichlet Allocation (LDA) implemented via the
Gensim library. LDA is a generative
probabilistic model that discovers latent thematic structures in a text
corpus by modeling each document as a mixture of topics and each topic as
a distribution over words. The pipeline includes bigram phrase detection,
TF-IDF filtering, and UMass coherence scoring to assess topic quality.
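This sketch shows bigram phrase detection using the word2vec phrase score from Mikolov et al. (2013): score(a, b) = (count(a b) - δ) / (count(a) × count(b)). Gensim's `Phrases` class uses a rescaled variant of this score by default. The function below is a standalone illustration, not the pipeline's code. Its `min_count`, `threshold`, and `delta` values are illustrative rather than the pipeline's settings.

```python
from collections import Counter


def detect_bigrams(sentences, min_count=2, threshold=0.1, delta=1):
    """Return {'a_b': score} for adjacent word pairs whose phrase score exceeds threshold.

    score = (count(a b) - delta) / (count(a) * count(b)); delta discounts rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[f"{a}_{b}"] = score
    return phrases


sentences = [
    ["digital", "transformation", "in", "smes"],
    ["digital", "transformation", "strategy"],
    ["digital", "platform"],
    ["transformation", "leadership"],
]
print(detect_bigrams(sentences))
```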
### PAJAIS Taxonomy (20 Themes)
The 20 canonical PAJAIS themes span IS Strategy, Digital Transformation,
IT Adoption, Knowledge Management, E-Commerce, AI/ML, Blockchain,
Healthcare IS, Social Media, Big Data, Cloud Computing, Cybersecurity,
IS in Asia-Pacific, Mobile Computing, IS Research Methods, Organizational IS,
HCI, IS Education, Sustainability, and FinTech.
### Coherence Scoring & Publishability
Topic coherence is measured using the UMass metric, which scores a topic by
how often its top words co-occur in the same documents. A topic is deemed publishable when
it meets two thresholds: doc_count > 5 (sufficient scholarly attention)
and coherence > 0.30 (semantic stability).
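This sketch covers UMass coherence (Mimno et al., 2011) and the publishability rule stated above. For each pair of top words where w_l ranks above w_m, the UMass term is log((D(w_l, w_m) + 1) / D(w_l)), where D counts the documents that contain the word or words. This version averages the terms instead of summing them as the original definition does, and library implementations differ in smoothing details. Because D(w_l, w_m) ≤ D(w_l), each raw term is at most log(1 + 1/D(w_l)), so raw UMass scores are usually negative. The 0.30 cut-off in `is_publishable` is used exactly as stated above, and any rescaling the app may apply first is not shown. The function names are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """Mean UMass term over ranked pairs of a topic's top words.

    top_words must be ordered by rank, and every word must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    # combinations() yields (w_l, w_m) with w_l ranked above w_m.
    scores = [
        math.log((doc_freq(w_l, w_m) + 1) / doc_freq(w_l))
        for w_l, w_m in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)


def is_publishable(doc_count, coherence, min_docs=5, min_coherence=0.30):
    """Apply the two thresholds: doc_count > 5 and coherence > 0.30."""
    return doc_count > min_docs and coherence > min_coherence


corpus = [
    ["cloud", "adoption", "sme"],
    ["cloud", "adoption"],
    ["cloud", "security"],
    ["adoption", "intention"],
]
print(umass_coherence(["cloud", "adoption", "sme"], corpus))
```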
### Abstract vs Title Methodology
Separate LDA models are trained on article abstracts and titles independently.
Topics appearing exclusively in abstracts represent latent constructs —
ideas actively studied but not yet positioned as headline contributions.
Topics exclusive to titles signal positioning keywords favored by authors
as first-impression signals to reviewers and readers.
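The text above does not say how topics from the two models are aligned, so this sketch of the exclusivity test makes an assumption. An abstract topic counts as abstract-only when no title topic reaches a minimum Jaccard overlap with its top words, and the same test applies in reverse for title topics. The matching rule, the 0.2 threshold, and the names are hypothetical.

```python
def exclusive_topics(abstract_topics, title_topics, min_overlap=0.2):
    """Return (abstract_only_labels, title_only_labels) using best Jaccard overlap of top words."""

    def jaccard(a, b):
        a, b = set(a), set(b)
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def unmatched(source, target):
        # A topic is exclusive if it overlaps no topic on the other side enough.
        return [
            label
            for label, words in source.items()
            if all(jaccard(words, other) < min_overlap for other in target.values())
        ]

    return unmatched(abstract_topics, title_topics), unmatched(title_topics, abstract_topics)


abstract_topics = {
    "A1": ["trust", "privacy", "disclosure", "consumer"],
    "A2": ["cloud", "adoption", "sme", "readiness"],
}
title_topics = {
    "T1": ["cloud", "adoption", "sme", "firms"],
    "T2": ["metaverse", "virtual", "world", "platform"],
}
print(exclusive_topics(abstract_topics, title_topics))
```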
### DBSCAN Semantic Clustering
Papers are embedded using TF-IDF → Truncated SVD (LSA) for both title and
abstract text independently. DBSCAN is applied to each embedding space with
configurable ε and min_samples parameters. Cluster assignments are merged
via a weighted vote (configurable abstract weight). Large clusters are
recursively bisected; tiny clusters with fewer than min_membership documents
are reassigned to their nearest valid cluster or marked as noise.
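Below is a minimal pure-Python DBSCAN on plain vectors with cosine distance, to illustrate how ε and min_samples interact. The cosine metric is an assumption, since the text does not name one. The weighted vote, recursive bisection, and small-cluster reassignment steps are not shown. In practice the clustering step itself is usually delegated to scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=..., metric="cosine")`.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (1.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise.

    A point is a core point if at least min_samples points (itself included)
    lie within distance eps; clusters grow outward from core points.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited
    neighbors = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point (not expanded)
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # only core points expand the cluster
                queue.extend(neighbors[j])
    return labels


points = [[1, 0], [0.99, 0.1], [0.98, 0.15], [0, 1], [0.1, 0.99], [-1, 0]]
print(dbscan(points, eps=0.05, min_samples=2))
```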
### Agentic Research Council
The council convenes three independent AI models (Mistral, Gemini, and Ollama)
to assess the gap analysis findings from complementary epistemological
perspectives. Each panel member produces a structured assessment of the
most publishable gaps, methodological recommendations, and regional focus.
Their independent outputs can be compared side-by-side to identify
consensus positions and productive disagreements.
Built for PAJAIS Research Intelligence