PAJAIS Research Intelligence Agent

Step 1: Upload Your Journal CSV

Upload a CSV file containing PAJAIS publications. The system detects title, abstract, year, authors, and DOI columns automatically.
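The detection rules are not documented on this page. Below is a minimal sketch that assumes detection is a case-insensitive match of header names against per-field alias lists. The `COLUMN_ALIASES` table and the `detect_columns` helper are hypothetical illustrations, not the app's actual code.

```python
import csv
import io

# Hypothetical alias lists; the app's real matching rules are not documented.
COLUMN_ALIASES = {
    "title": ["title", "article title", "document title"],
    "abstract": ["abstract", "summary"],
    "year": ["year", "publication year", "pub year"],
    "authors": ["authors", "author", "author names"],
    "doi": ["doi", "digital object identifier"],
}


def detect_columns(header):
    """Map each canonical field to the first CSV header matching one of its aliases."""
    # Compare case-insensitively, but return the header exactly as written in the file.
    normalized = {h.strip().lower(): h for h in header}
    found = {}
    for field, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                found[field] = normalized[alias]
                break
    return found


sample = "Title,Abstract,Publication Year,Author,DOI\nA,B,2020,C,10.1/x\n"
header = next(csv.reader(io.StringIO(sample)))
print(detect_columns(header))
# → {'title': 'Title', 'abstract': 'Abstract', 'year': 'Publication Year', 'authors': 'Author', 'doi': 'DOI'}
```

Fields with no matching header are simply absent from the result, so a caller can check which required columns are missing before running the pipeline.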

Upload Journal CSV

Upload a CSV to see dataset statistics.

Data Preview (first 10 rows)

Phase 2: Extracted Topics

Run topic modeling or use the full pipeline from Tab 1.

Topic Review Table (≥98 rows guaranteed)

Phase 5: Research Gap Analysis

Run mapping or use the full pipeline from Tab 1.

🔵 MAPPED Themes

Mapped Topics

🔴 NOVEL Themes

Novel Topics

💡 Generate Publication Pitch

Select a NOVEL theme label and click below to generate a structured abstract pitch.

NOVEL Theme Label

Phase 4: Abstract vs Title Theme Comparison

📄 Abstract-Derived Themes

Abstract Topics

🏷 Title-Derived Themes

Title Topics

Phase 6: Generate Academic Narrative Draft

Section 7 Narrative Draft (~500 words)

📚 Generate Sample APA Citation

Research Intelligence Dashboard

Dashboard populates automatically after pipeline completion.

-- Total Topics

-- Novel Themes

-- PAJAIS Coverage

-- Publishable Gaps
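The page does not define how the Publishable Gaps counter is computed. Below is a minimal sketch that assumes it counts NOVEL themes passing the publishability thresholds stated in the Methodology Notes (`doc_count > 5` and `coherence > 0.30`). The `Topic` dataclass and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    # Hypothetical record for one extracted topic.
    label: str
    doc_count: int
    coherence: float
    is_novel: bool


def is_publishable(topic: Topic) -> bool:
    # Thresholds as stated in the Methodology Notes; both inequalities are strict.
    return topic.doc_count > 5 and topic.coherence > 0.30


def count_publishable_gaps(topics) -> int:
    # Assumption: a "gap" is a NOVEL topic that also passes the publishability test.
    return sum(1 for t in topics if t.is_novel and is_publishable(t))


topics = [
    Topic("A", 12, 0.41, True),   # novel and publishable
    Topic("B", 5, 0.50, True),    # fails: doc_count must exceed 5
    Topic("C", 9, 0.30, True),    # fails: coherence must exceed 0.30
    Topic("D", 20, 0.55, False),  # publishable but MAPPED, not novel
]
print(count_publishable_gaps(topics))  # → 1
```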

Topic Distribution

Mapped vs Novel

Phase 2.5: Semantic Clustering via SPECTER2 → UMAP → HDBSCAN

Each paper is represented by a single 768-dimensional SPECTER2 vector computed from its combined Title + Abstract text (keyed by DOI). UMAP reduces these vectors to 50 dimensions using the cosine metric. HDBSCAN then clusters them, with an automatic parameter sweep that targets 15–30 clusters. Clusters with fewer than 5 papers are merged, and clusters with more than 100 papers are split. Intra-cluster cosine similarity is kept in the 0.50–0.60 band. The 3 most representative paper titles in each cluster are sent to Mistral, Gemini, and HuggingFace (all free) for labeling, and the majority-vote label is kept.
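The page does not say how "most representative" titles are chosen or how a three-way disagreement between labelers is resolved. Below is a minimal sketch under two assumptions: representativeness means highest cosine similarity to the cluster centroid, and a full tie falls back to the first model's label. `representative_titles` and `majority_label` are hypothetical names, and the toy 2-D vectors stand in for 768-dimensional SPECTER2 embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def representative_titles(titles, vectors, k=3):
    """Return the k titles whose vectors are most cosine-similar to the cluster centroid."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    ranked = sorted(zip(titles, vectors),
                    key=lambda tv: cosine(tv[1], centroid), reverse=True)
    return [title for title, _ in ranked[:k]]


def majority_label(labels):
    """Pick the label most models agree on; on a full tie, keep the first model's label."""
    counts = Counter(label.strip().lower() for label in labels)
    best, votes = counts.most_common(1)[0]
    if votes == 1:
        return labels[0].strip()
    # Return the original casing of the first label that matches the winner.
    return next(l.strip() for l in labels if l.strip().lower() == best)


titles = ["A", "B", "C", "D"]
vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.8, 0.2]]
print(representative_titles(titles, vectors))  # → ['D', 'B', 'A']
print(majority_label(["Digital Health", "digital health ", "Telemedicine"]))  # → Digital Health
```

Normalizing case and whitespace before counting keeps two models that return "Digital Health" and "digital health" from splitting the vote.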

Run clustering or use the full pipeline from Tab 1.

📊 Cluster Summary

Clusters (sorted by size)

📄 Document-Level Assignments

Per-Document Cluster Assignments

Cluster Size Distribution

Clustered vs Noise

Phase 6.5: Dual-Model Research Council

Three AI models independently assess the PAJAIS research gap findings:

Mistral (Panel A) — pragmatic applied IS perspective
Gemini (Panel B) — broad technology futures perspective
Ollama (Panel C) — deep analytical synthesis

API keys (MISTRAL_API_KEY, GEMINI_API_KEY) and the Ollama endpoint (OLLAMA_URL) are loaded automatically from HuggingFace Secrets. Configure them in your Space under Settings → Variables and Secrets.

🔑 Secret Status (loaded at startup):

MISTRAL_API_KEY: ✅ present
GEMINI_API_KEY: ✅ present
OLLAMA_URL: ✅ present
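Hugging Face Spaces exposes Space secrets to the running app as environment variables, so a status list like the one above can be produced by checking those variables at startup. Below is a minimal sketch. `secret_status` is a hypothetical helper, and the `❌ missing` wording is an assumption, since only the present state appears on this page.

```python
import os

# Secret names as shown in the status list; Spaces injects secrets as environment variables.
REQUIRED_SECRETS = ("MISTRAL_API_KEY", "GEMINI_API_KEY", "OLLAMA_URL")


def secret_status(env=None):
    """Return one 'NAME: ✅ present' or 'NAME: ❌ missing' line per expected secret."""
    env = os.environ if env is None else env
    lines = []
    for name in REQUIRED_SECRETS:
        # Treat empty or whitespace-only values as missing.
        present = bool(env.get(name, "").strip())
        lines.append(f"{name}: {'✅ present' if present else '❌ missing'}")
    return lines


for line in secret_status():
    print(line)
```

Passing an explicit `env` mapping keeps the check testable without touching the real environment.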

Run taxonomy mapping first (Tab 3 or full pipeline).

🟢 Panel A — Mistral

Mistral Assessment

🔵 Panel B — Gemini

Gemini Assessment

🟣 Panel C — Ollama

Ollama Assessment

Export Center & Methodology Notes

📖 Methodology Notes

LDA Topic Modeling

This system uses Latent Dirichlet Allocation (LDA), implemented with the Gensim library. LDA is a generative probabilistic model that discovers latent thematic structure in a text corpus by modeling each document as a mixture of topics and each topic as a distribution over words. The pipeline adds bigram phrase detection, TF-IDF filtering, and UMass coherence scoring to screen topic quality.

### PAJAIS Taxonomy (20 Themes)

The 20 canonical PAJAIS themes are IS Strategy, Digital Transformation, IT Adoption, Knowledge Management, E-Commerce, AI/ML, Blockchain, Healthcare IS, Social Media, Big Data, Cloud Computing, Cybersecurity, IS in Asia-Pacific, Mobile Computing, IS Research Methods, Organizational IS, HCI, IS Education, Sustainability, and FinTech.

### Coherence Scoring & Publishability

Topic coherence is measured with the UMass metric, which scores a topic by how often its top words co-occur in the same documents of the corpus. A topic is deemed publishable when it meets two thresholds: `doc_count > 5` (sufficient scholarly attention) and `coherence > 0.30` (semantic stability).

### Abstract vs Title Methodology

Separate LDA models are trained on article abstracts and on article titles. Topics that appear only in abstracts represent latent constructs: ideas that are actively studied but not yet positioned as headline contributions. Topics that appear only in titles signal positioning keywords that authors favor as first-impression signals to reviewers and readers.

### DBSCAN Semantic Clustering

Papers are embedded with TF-IDF → Truncated SVD (LSA), separately for title text and abstract text. DBSCAN is applied to each embedding space with configurable ε and min_samples parameters. The two sets of cluster assignments are merged through a weighted vote with a configurable abstract weight. Large clusters are recursively bisected; tiny clusters with fewer than min_membership documents are reassigned to their nearest valid cluster or marked as noise.
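Cluster IDs from two separate DBSCAN runs are not aligned, so a per-document vote between raw IDs would not be meaningful. The exact vote rule is not documented here. Below is a minimal sketch of one plausible reading, a weighted co-association vote. Two documents are linked when `abstract_weight × [same abstract cluster] + (1 − abstract_weight) × [same title cluster]` reaches a threshold, and linked components become merged clusters. `consensus_clusters` and `threshold` are hypothetical. This sketch marks undersized clusters as noise rather than reassigning them, and it omits recursive bisection.

```python
from itertools import combinations


def consensus_clusters(abstract_labels, title_labels, abstract_weight=0.6,
                       threshold=0.5, min_membership=2):
    """Merge two DBSCAN labelings via a weighted co-association vote.

    Labels follow scikit-learn's DBSCAN convention: -1 marks noise.
    Returns new labels where -1 again marks noise.
    """
    n = len(abstract_labels)
    parent = list(range(n))  # union-find forest over documents

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        # Noise points never count as sharing a cluster.
        same_abs = abstract_labels[i] != -1 and abstract_labels[i] == abstract_labels[j]
        same_title = title_labels[i] != -1 and title_labels[i] == title_labels[j]
        score = abstract_weight * same_abs + (1 - abstract_weight) * same_title
        if score >= threshold:
            parent[find(i)] = find(j)  # link i and j

    # Group documents by their union-find root.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    labels = [-1] * n
    next_id = 0
    for members in sorted(groups.values(), key=lambda m: m[0]):
        if len(members) < min_membership:
            continue  # too small: left as noise in this sketch
        for i in members:
            labels[i] = next_id
        next_id += 1
    return labels


print(consensus_clusters([0, 0, 1, 1, -1], [0, 1, 1, 1, 0]))  # → [0, 0, 1, 1, -1]
```

With `abstract_weight=0.6`, abstract agreement alone is enough to link two papers, while title agreement alone (0.4) is not. Lowering the weight below 0.5 reverses that priority.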
### Agentic Research Council

The council convenes independent AI models (Mistral, Gemini, and Ollama) to assess the gap-analysis findings from complementary epistemological perspectives. Each panel member produces a structured assessment covering the most publishable gaps, methodological recommendations, and regional focus. Their outputs can be compared side by side to identify consensus positions and productive disagreements.

Built for PAJAIS Research Intelligence

Academic Topic Modeling & Gap Analysis for Information Systems Research

Pacific Asia Journal of the Association for Information Systems (PAJAIS)