# Project Overview

Perplexica is an open-source, AI-powered search engine. It combines web search with LLM-based processing to understand user questions and answer them with cited sources, similar to Perplexity AI but fully open source.

## Architecture

The system works through these main steps:

- User submits a query
- The system determines whether a web search is needed
- If needed, it searches the web using SearXNG
- Results are ranked using embedding-based similarity search
- LLMs generate a comprehensive response with cited sources

## Architecture Details

### Technology Stack

- **Frontend**: React, Next.js, Tailwind CSS
- **Backend**: Node.js
- **Database**: SQLite with Drizzle ORM
- **AI/ML**: LangChain + LangGraph for orchestration
- **Search**: SearXNG integration
- **Content Processing**: Mozilla Readability, Cheerio, Playwright

### Database (SQLite + Drizzle ORM)

- Schema: `src/lib/db/schema.ts`
- Tables: `messages`, `chats`, `systemPrompts`
- Configuration: `drizzle.config.ts`
- Local file: `data/db.sqlite`

### AI/ML Stack

- **LLM Providers**: OpenAI, Anthropic, Groq, Ollama, Gemini, DeepSeek, LM Studio
- **Embeddings**: Xenova Transformers, similarity search (cosine/dot product)
- **Agents**: `webSearchAgent`, `analyzerAgent`, `synthesizerAgent`, `taskManagerAgent`

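
The similarity search mentioned above (cosine or dot product) can be illustrated with a minimal sketch. It assumes embeddings are plain `number[]` vectors; the names `rankBySimilarity`, `EmbeddedDoc`, and `SimilarityMeasure` are hypothetical and are not taken from the Perplexica source.

```typescript
// Illustrative only: these names are assumptions, not Perplexica's actual API.
type SimilarityMeasure = "cosine" | "dot";

interface EmbeddedDoc {
  url: string;
  embedding: number[];
}

// Sum of element-wise products; sensitive to vector magnitude.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Dot product normalized by both vector lengths; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  const norm = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  return norm === 0 ? 0 : dotProduct(a, b) / norm;
}

// Score every document against the query embedding and keep the topK best.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  measure: SimilarityMeasure = "cosine",
  topK = 3,
): EmbeddedDoc[] {
  const score = measure === "cosine" ? cosineSimilarity : dotProduct;
  return docs
    .map((doc) => ({ doc, score: score(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}
```

The two measures can order the same documents differently: cosine similarity ignores vector magnitude, while the dot product favors longer vectors.
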
### External Services

- **Search Engine**: SearXNG integration (`src/lib/searxng.ts`)
- **Configuration**: TOML-based config file

### Data Flow

1. User query → Task Manager Agent
2. Web Search Agent → SearXNG → Content extraction
3. Analyzer Agent → Content processing + embedding
4. Synthesizer Agent → LLM response generation
5. Response with cited sources

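
The data flow above can be pictured as a chain of stages that each read and extend a shared state object. The sketch below is a simplification under stated assumptions: the `PipelineState` shape and the stage functions are hypothetical stand-ins, and every stage is a synchronous stub, whereas the real agents call SearXNG and an LLM and would be asynchronous.

```typescript
// Illustrative only: hypothetical state shape and stub stages, not the real agents.
interface Source {
  url: string;
  content: string;
}

interface PipelineState {
  query: string;
  tasks: string[];
  sources: Source[];
  relevant: Source[];
  answer?: string;
}

type Stage = (state: PipelineState) => PipelineState;

// 1. Task manager: here the query becomes a single task.
const taskManager: Stage = (s) => ({ ...s, tasks: [s.query] });

// 2. Web search: stubbed; a real stage would query SearXNG and extract page content.
const webSearch: Stage = (s) => ({
  ...s,
  sources: s.tasks.map((task, i) => ({
    url: `https://example.com/${i + 1}`,
    content: `Stub result for: ${task}`,
  })),
});

// 3. Analyzer: placeholder filter standing in for content processing and embedding.
const analyzer: Stage = (s) => ({
  ...s,
  relevant: s.sources.filter((src) => src.content.length > 0),
});

// 4. Synthesizer: stubbed; joins content and appends numbered citation markers.
const synthesizer: Stage = (s) => ({
  ...s,
  answer: s.relevant.map((src, i) => `${src.content} [${i + 1}]`).join(" "),
});

// Run the stages in order, threading the state through each one.
function runPipeline(query: string, stages: Stage[]): PipelineState {
  const initial: PipelineState = { query, tasks: [], sources: [], relevant: [] };
  return stages.reduce((state, stage) => stage(state), initial);
}
```

Calling `runPipeline(query, [taskManager, webSearch, analyzer, synthesizer])` yields a state whose `answer` carries bracketed citation numbers that index into `relevant`, mirroring step 5.
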
## Project Structure

- `/src/app`: Next.js app directory with page components and API routes
- `/src/app/api`: API endpoints for search and LLM interactions
- `/src/components`: Reusable UI components
- `/src/lib`: Backend functionality
  - `lib/search`: Search functionality and meta search agent
  - `lib/db`: Database schema and operations
  - `lib/providers`: LLM and embedding model integrations
  - `lib/prompts`: Prompt templates for LLMs
  - `lib/chains`: LangChain chains for various operations
  - `lib/agents`: LangGraph agents for advanced processing
  - `lib/utils`: Utility functions and types including web content retrieval and processing

## Focus Modes

Perplexica supports multiple specialized search modes:

- All Mode: General web search
- Local Research Mode: Research and interact with local files with citations
- Chat Mode: Have a creative conversation
- Academic Search Mode: For academic research
- YouTube Search Mode: For video content
- Wolfram Alpha Search Mode: For calculations and data analysis
- Reddit Search Mode: For community discussions

## Core Commands

- **Development**: `npm run dev` (uses Turbopack for faster builds)
- **Build**: `npm run build` (includes automatic DB push)
- **Production**: `npm run start`
- **Linting**: `npm run lint` (Next.js ESLint)
- **Formatting**: `npm run format:write` (Prettier)
- **Database**: `npm run db:push` (Drizzle migrations)

## Configuration

The application uses a `config.toml` file (created from `sample.config.toml`) for configuration, including:

- API keys for various LLM providers
- Database settings
- Search engine configuration
- Similarity measure settings

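
As a rough illustration of how such settings might be checked after the TOML file has been parsed, the sketch below validates a plain object. The field names (`similarityMeasure`, `searxngUrl`, `apiKeys`) are hypothetical and do not reflect the actual keys in `sample.config.toml`; TOML parsing itself is not shown.

```typescript
// Illustrative only: field names are assumptions, not the real config.toml keys.
type SimilarityMeasure = "cosine" | "dot";

interface AppConfig {
  similarityMeasure: SimilarityMeasure;
  searxngUrl: string;
  apiKeys: Record<string, string>;
}

function isSimilarityMeasure(value: unknown): value is SimilarityMeasure {
  return value === "cosine" || value === "dot";
}

// Validate an already-parsed config object and apply defaults.
function validateConfig(raw: Record<string, unknown>): AppConfig {
  // Fall back to cosine when the setting is absent (an assumed default).
  const measure = raw.similarityMeasure ?? "cosine";
  if (!isSimilarityMeasure(measure)) {
    throw new Error(`Unsupported similarity measure: ${String(measure)}`);
  }

  const searxngUrl = raw.searxngUrl;
  if (typeof searxngUrl !== "string" || searxngUrl.length === 0) {
    throw new Error("A search engine URL is required");
  }

  // Keep only providers that have a non-empty string key.
  const apiKeys: Record<string, string> = {};
  const rawKeys = raw.apiKeys;
  if (typeof rawKeys === "object" && rawKeys !== null) {
    for (const [provider, key] of Object.entries(rawKeys)) {
      if (typeof key === "string" && key.length > 0) {
        apiKeys[provider] = key;
      }
    }
  }

  return { similarityMeasure: measure, searxngUrl, apiKeys };
}
```

Failing early on an unknown similarity measure or a missing search URL surfaces configuration mistakes at startup rather than in the middle of a search.
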
## Common Tasks

When working on this codebase, you might need to:

- Add new API endpoints in `/src/app/api`
- Modify UI components in `/src/components`
- Extend search functionality in `/src/lib/search`
- Add new LLM providers in `/src/lib/providers`
- Update database schema in `/src/lib/db/schema.ts`
- Create new prompt templates in `/src/lib/prompts`
- Build new chains in `/src/lib/chains`
- Implement new LangGraph agents in `/src/lib/agents`

## AI Behavior Guidelines

- Focus on factual, technical responses without unnecessary pleasantries
- Avoid conciliatory language and apologies
- Ask for clarification when requirements are unclear
- Do not add dependencies unless explicitly requested
- Only make changes relevant to the specific task
- Do not create test files or run the application unless requested
- Prioritize existing patterns and architectural decisions
- Use the established component structure and styling patterns

## Code Style & Standards

### TypeScript Configuration

- Strict mode enabled
- ES2017 target
- Path aliases: `@/*` → `src/*`
- No test files (testing not implemented)

### Formatting & Linting

- ESLint: Next.js core web vitals rules
- Prettier: Use `npm run format:write` before commits
- Import style: Use `@/` prefix for internal imports

### File Organization

- Components: React functional components with TypeScript
- API routes: Next.js App Router (`src/app/api/`)
- Utilities: Grouped by domain (`src/lib/`)
- Naming: camelCase for functions/variables, PascalCase for components

### Error Handling

- Use try/catch blocks for async operations
- Return structured error responses from API routes
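
One possible way to combine the two rules above is a small wrapper that runs async work inside try/catch and converts any failure into a consistent error shape. This is a sketch, not the project's existing helper: `HttpError`, `toErrorResult`, `withErrorHandling`, and the `{ error: { code, message } }` body are all hypothetical.

```typescript
// Illustrative only: hypothetical helper names and error shape.
interface ErrorBody {
  error: { code: string; message: string };
}

interface RouteResult<T> {
  status: number;
  body: T;
}

// An error type that carries an HTTP status and a machine-readable code.
class HttpError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(status: number, code: string, message: string) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

// Map any thrown value to a structured error; unknown failures become a generic 500.
function toErrorResult(err: unknown): RouteResult<ErrorBody> {
  if (err instanceof HttpError) {
    return {
      status: err.status,
      body: { error: { code: err.code, message: err.message } },
    };
  }
  return {
    status: 500,
    body: { error: { code: "INTERNAL_ERROR", message: "Internal server error" } },
  };
}

// Run async work inside try/catch so callers always receive a RouteResult.
async function withErrorHandling<T>(
  work: () => Promise<T>,
): Promise<RouteResult<T | ErrorBody>> {
  try {
    return { status: 200, body: await work() };
  } catch (err) {
    return toErrorResult(err);
  }
}
```

In an App Router route handler, the result could then be returned with the standard `Response.json(result.body, { status: result.status })`. Unknown errors deliberately map to a generic message so internal details are not leaked to clients.
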