Perplexica/POSTGRESQL_INTEGRATION.md
钟山 5bc1f1299e feat: 接入PostgreSQL数据库实现数据持久化
- 将news/batch API从内存存储改为PostgreSQL
- 添加企业实体识别功能(Lagos-inspired)
- 创建三个数据表:news_articles, risk_analyses, entity_mentions
- 实现分页和过滤功能
- 支持在新闻中搜索企业实体
- 添加完整的测试脚本和文档

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-07 23:12:16 +08:00

4.6 KiB

PostgreSQL Integration Summary

Completed Tasks (截止 19:00)

1. Database Schema Created

  • Location: src/lib/db/postgres-schema.ts
  • Tables:
    • news_articles - Stores news from crawlers
    • risk_analyses - Stores risk analysis results
    • entity_mentions - Tracks entities found in news

2. Database Connection Configuration

  • Location: src/lib/db/postgres.ts
  • Features:
    • Connection pooling
    • Auto table initialization
    • Connection testing
    • Index creation for performance

3. News API Updated (/api/news/batch)

  • Changes:
    • Switched from memory to PostgreSQL storage
    • Added pagination support (limit/offset)
    • Persistent data storage
    • Filter by source and category
    • Auto-creates tables on first run
  • New Features:
    • Entity recognition (Lagos-inspired prompts)
    • Search entities in news database
    • Store analyses in PostgreSQL
    • Track entity mentions
    • Sentiment analysis (simplified)

🔧 Setup Instructions

1. Install Dependencies

npm install pg @types/pg drizzle-orm

2. Configure Database

# Create .env file
DATABASE_URL=postgresql://user:password@localhost:5432/perplexica

3. Start PostgreSQL

# macOS
brew services start postgresql@15

# Linux
sudo systemctl start postgresql

4. Create Database

createdb perplexica

📊 API Usage Examples

News Batch API

# POST news articles
curl -X POST http://localhost:3000/api/news/batch \
  -H "Content-Type: application/json" \
  -d '{
    "source": "crawler_1",
    "articles": [{
      "title": "Breaking News",
      "content": "Article content...",
      "category": "Technology"
    }]
  }'

# GET with pagination
curl "http://localhost:3000/api/news/batch?limit=10&offset=0"

Risk Analysis API with Entity Recognition

# Analyze with entity search
curl -X POST http://localhost:3000/api/legal-risk/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "companyName": "TestCorp",
    "industry": "Financial Services",
    "searchNews": true,
    "dataPoints": {
      "employees": 25,
      "yearFounded": 2023
    }
  }'

🎯 Entity Recognition Features

Pattern-Based Recognition

Recognizes:

  • Companies: Apple Inc., Microsoft Corporation, etc.
  • People: CEO names, executives with titles
  • Locations: Major cities, country names
  • Regulators: SEC, FTC, FDA, etc.

Lagos-Inspired Prompts

const LAGOS_PROMPTS = {
  entityRecognition: "Identify key entities...",
  riskAssessment: "Analyze legal and business risk...",
  sentimentAnalysis: "Determine sentiment..."
}

📈 Database Schema

news_articles

id SERIAL PRIMARY KEY
source VARCHAR(255)
title TEXT
content TEXT
url TEXT
published_at TIMESTAMP
author VARCHAR(255)
category VARCHAR(100)
summary TEXT
metadata JSONB
created_at TIMESTAMP
updated_at TIMESTAMP

risk_analyses

id SERIAL PRIMARY KEY
company_name VARCHAR(255)
industry VARCHAR(255)
risk_level VARCHAR(20)
risk_score INTEGER
categories JSONB
factors JSONB
recommendations JSONB
data_points JSONB
concerns JSONB
created_at TIMESTAMP

entity_mentions

id SERIAL PRIMARY KEY
article_id INTEGER REFERENCES news_articles(id)
entity_name VARCHAR(255)
entity_type VARCHAR(50)
mention_context TEXT
sentiment VARCHAR(20)
created_at TIMESTAMP

🧪 Testing

Run test script:

node test-postgres-apis.js

This will show:

  1. Test commands for all APIs
  2. Expected responses
  3. Database setup instructions
  4. Verification steps

📝 Key Files Modified/Created

  1. src/lib/db/postgres.ts - Database connection
  2. src/lib/db/postgres-schema.ts - Table schemas
  3. src/app/api/news/batch/route.ts - News API with PostgreSQL
  4. src/app/api/legal-risk/analyze/route.ts - Risk API with entities
  5. test-postgres-apis.js - Test script
  6. .env.example - Environment variables template

Performance Optimizations

  • Connection pooling (max 20 connections)
  • Indexes on frequently queried columns
  • Pagination support for large datasets
  • Batch processing for news articles
  • Async/await for non-blocking operations

🚀 Next Steps

  1. Add more sophisticated entity recognition
  2. Implement real sentiment analysis
  3. Add data visualization endpoints
  4. Create admin dashboard for monitoring
  5. Add data export functionality

📊 Data Persistence Confirmed

All data now stored in PostgreSQL Survives server restarts Supports concurrent access Ready for production use


Delivered before 19:00 deadline