Project Overview
Perplexica is an open-source, AI-powered search engine. It combines web search with LLM-based processing to understand and answer user questions, similar to Perplexity AI but fully open source.
Architecture
The system works through these main steps:
- User submits a query
- The system determines if web search is needed
- If needed, it searches the web using SearXNG
- Results are ranked using embedding-based similarity search
- LLMs are used to generate a comprehensive response with cited sources
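The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not Perplexica's actual code: `SearchDeps`, `answerQuery`, and every function on the dependency object are hypothetical names, and the real system delegates these steps to LangChain/LangGraph components.

```typescript
// Hypothetical types for illustration; not taken from the Perplexica codebase.
interface SearchResult {
  url: string;
  content: string;
}

interface Answer {
  text: string;
  sources: string[];
}

// Each step is injected so the control flow can be read and tested in isolation.
interface SearchDeps {
  needsWebSearch(query: string): Promise<boolean>;
  searchWeb(query: string): Promise<SearchResult[]>;
  rank(query: string, results: SearchResult[]): Promise<SearchResult[]>;
  generate(query: string, context: SearchResult[]): Promise<string>;
}

async function answerQuery(query: string, deps: SearchDeps): Promise<Answer> {
  // Step 2: decide whether the query needs web results at all.
  if (!(await deps.needsWebSearch(query))) {
    return { text: await deps.generate(query, []), sources: [] };
  }
  // Steps 3-4: search the web, then rank the results against the query.
  const results = await deps.searchWeb(query);
  const ranked = await deps.rank(query, results);
  // Step 5: generate the answer and attach the URLs it was grounded on.
  const text = await deps.generate(query, ranked);
  return { text, sources: ranked.map((r) => r.url) };
}
```

Passing the steps in as dependencies is only one possible design; it keeps the sketch independent of any particular search backend or LLM provider.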
Architecture Details
Technology Stack
- Frontend: React, Next.js, Tailwind CSS
- Backend: Node.js
- Database: SQLite with Drizzle ORM
- AI/ML: LangChain + LangGraph for orchestration
- Search: SearXNG integration
- Content Processing: Mozilla Readability, Cheerio, Playwright
Database (SQLite + Drizzle ORM)
- Schema: src/lib/db/schema.ts
- Tables: messages, chats, systemPrompts
- Configuration: drizzle.config.ts
- Local file: data/db.sqlite
AI/ML Stack
- LLM Providers: OpenAI, Anthropic, Groq, Ollama, Gemini, DeepSeek, LM Studio
- Embeddings: Xenova Transformers, similarity search (cosine/dot product)
- Agents: webSearchAgent, analyzerAgent, synthesizerAgent, taskManagerAgent
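To make the two similarity measures named above concrete, here is a minimal sketch of cosine and dot-product scoring over embedding vectors, followed by a ranking helper. The names `Measure`, `dot`, `cosine`, and `rankBySimilarity` are hypothetical and chosen for illustration; the project's own implementation may differ.

```typescript
type Measure = "cosine" | "dot";

// Dot product: sum of element-wise products. Sensitive to vector magnitude.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  const norm = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  // A zero vector has no direction, so treat its similarity as 0.
  return norm === 0 ? 0 : dot(a, b) / norm;
}

// Score every document against the query embedding and sort best-first.
function rankBySimilarity<T>(
  query: number[],
  docs: { item: T; embedding: number[] }[],
  measure: Measure = "cosine",
): { item: T; score: number }[] {
  const score = measure === "cosine" ? cosine : dot;
  return docs
    .map((d) => ({ item: d.item, score: score(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

The two measures can rank the same documents differently: cosine ignores vector length, while the dot product rewards longer vectors. For unit-normalized embeddings they give the same ordering.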
External Services
- Search Engine: SearXNG integration (src/lib/searxng.ts)
- Configuration: TOML-based config file
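As a rough illustration of what a SearXNG integration involves, the sketch below builds a request URL for SearXNG's JSON search endpoint (`/search` with `format=json`, which the instance must have enabled). `SearxngOptions` and `buildSearxngUrl` are hypothetical names, and this is not necessarily how src/lib/searxng.ts is written.

```typescript
// Optional parameters accepted by the SearXNG search endpoint.
interface SearxngOptions {
  categories?: string[];
  engines?: string[];
  language?: string;
  pageno?: number;
}

function buildSearxngUrl(
  baseUrl: string,
  query: string,
  opts: SearxngOptions = {},
): string {
  const params: [string, string][] = [
    ["q", query],
    ["format", "json"],
  ];
  // List-valued parameters are sent as comma-separated strings.
  if (opts.categories?.length) params.push(["categories", opts.categories.join(",")]);
  if (opts.engines?.length) params.push(["engines", opts.engines.join(",")]);
  if (opts.language) params.push(["language", opts.language]);
  if (opts.pageno !== undefined) params.push(["pageno", String(opts.pageno)]);

  const queryString = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  // Strip trailing slashes so the path is not doubled.
  return `${baseUrl.replace(/\/+$/, "")}/search?${queryString}`;
}
```

The resulting URL can be passed to `fetch`; the base URL would come from the configuration file rather than being hard-coded.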
Data Flow
- User query → Task Manager Agent
- Web Search Agent → SearXNG → Content extraction
- Analyzer Agent → Content processing + embedding
- Synthesizer Agent → LLM response generation
- Response with cited sources
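One way to picture this flow is as a shared state object that each agent reads and partially updates, which is the general shape of LangGraph-style orchestration. The sketch below is an assumption-laden simplification: `PipelineState`, `Agent`, `runAgents`, and the four stand-in agents are hypothetical, run synchronously, and do no real searching or LLM calls.

```typescript
// Hypothetical shared state; field names are illustrative only.
interface Doc {
  url: string;
  text: string;
}

interface PipelineState {
  query: string;
  tasks: string[];
  documents: Doc[];
  relevant: Doc[];
  answer?: string;
}

// An agent reads the current state and returns only the fields it changes.
type Agent = (state: PipelineState) => Partial<PipelineState>;

function runAgents(initial: PipelineState, agents: Agent[]): PipelineState {
  // Merge each agent's partial update into the state, in order.
  return agents.reduce<PipelineState>(
    (state, agent) => ({ ...state, ...agent(state) }),
    initial,
  );
}

// Stand-in agents that only mimic the shape of each stage.
const taskManager: Agent = (s) => ({ tasks: [`search: ${s.query}`] });
const webSearch: Agent = (s) => ({
  documents: s.tasks.map((t, i) => ({ url: `https://example.com/${i}`, text: t })),
});
const analyzer: Agent = (s) => ({ relevant: s.documents.slice(0, 1) });
const synthesizer: Agent = (s) => ({
  answer: `${s.relevant.length} source(s) for "${s.query}"`,
});

const finalState = runAgents(
  { query: "drizzle orm", tasks: [], documents: [], relevant: [] },
  [taskManager, webSearch, analyzer, synthesizer],
);
```

Returning partial updates keeps each agent focused on its own fields and makes the order of stages explicit in one place.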
Project Structure
- /src/app: Next.js app directory with page components and API routes
  - /src/app/api: API endpoints for search and LLM interactions
- /src/components: Reusable UI components
- /src/lib: Backend functionality
  - lib/search: Search functionality and meta search agent
  - lib/db: Database schema and operations
  - lib/providers: LLM and embedding model integrations
  - lib/prompts: Prompt templates for LLMs
  - lib/chains: LangChain chains for various operations
  - lib/agents: LangGraph agents for advanced processing
  - lib/utils: Utility functions and types, including web content retrieval and processing
Focus Modes
Perplexica supports multiple specialized search modes:
- All Mode: General web search
- Local Research Mode: Research and interact with local files with citations
- Chat Mode: Have a creative conversation
- Academic Search Mode: For academic research
- YouTube Search Mode: For video content
- Wolfram Alpha Search Mode: For calculations and data analysis
- Reddit Search Mode: For community discussions
Core Commands
- Development: npm run dev (uses Turbopack for faster builds)
- Build: npm run build (includes automatic DB push)
- Production: npm run start
- Linting: npm run lint (Next.js ESLint)
- Formatting: npm run format:write (Prettier)
- Database: npm run db:push (Drizzle migrations)
Configuration
The application uses a config.toml file (created from sample.config.toml) for configuration, including:
- API keys for various LLM providers
- Database settings
- Search engine configuration
- Similarity measure settings
Common Tasks
When working on this codebase, you might need to:
- Add new API endpoints in /src/app/api
- Modify UI components in /src/components
- Extend search functionality in /src/lib/search
- Add new LLM providers in /src/lib/providers
- Update database schema in /src/lib/db/schema.ts
- Create new prompt templates in /src/lib/prompts
- Build new chains in /src/lib/chains
- Implement new LangGraph agents in /src/lib/agents
AI Behavior Guidelines
- Focus on factual, technical responses without unnecessary pleasantries
- Avoid conciliatory language and apologies
- Ask for clarification when requirements are unclear
- Do not add dependencies unless explicitly requested
- Only make changes relevant to the specific task
- Do not create test files or run the application unless requested
- Prioritize existing patterns and architectural decisions
- Use the established component structure and styling patterns
Code Style & Standards
TypeScript Configuration
- Strict mode enabled
- ES2017 target
- Path aliases: @/* → src/*
- No test files (testing not implemented)
Formatting & Linting
- ESLint: Next.js core web vitals rules
- Prettier: Use npm run format:write before commits
- Import style: Use @/ prefix for internal imports
File Organization
- Components: React functional components with TypeScript
- API routes: Next.js App Router (src/app/api/)
- Utilities: Grouped by domain (src/lib/)
- Naming: camelCase for functions/variables, PascalCase for components
Error Handling
- Use try/catch blocks for async operations
- Return structured error responses from API routes
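The two rules above can be combined in one small helper. This is a sketch only: `ApiError`, `ApiResult`, `handle`, the `INTERNAL_ERROR` code, and the response shape are all hypothetical, since the document does not specify the project's actual error format.

```typescript
// Hypothetical error payload; the real project may use a different shape.
interface ApiError {
  error: { code: string; message: string };
}

interface ApiResult<T> {
  status: number;
  body: T | ApiError;
}

// Run an async operation and convert any thrown error into a structured result.
async function handle<T>(operation: () => Promise<T>): Promise<ApiResult<T>> {
  try {
    return { status: 200, body: await operation() };
  } catch (err) {
    // The caught value may be typed `unknown`, so narrow before reading `.message`.
    const message = err instanceof Error ? err.message : "Unknown error";
    return {
      status: 500,
      body: { error: { code: "INTERNAL_ERROR", message } },
    };
  }
}
```

In a Next.js App Router route handler, the result could then be returned with `Response.json(result.body, { status: result.status })`.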