Overview
NewsMonitor is a feed aggregator application built on the Transmissions message processing framework. It subscribes to RSS, Atom, and RDF feeds, stores their content in a SPARQL triple store, and provides a web interface for browsing aggregated posts.
Core Functionality
The application performs three primary functions:
- Feed Subscription: Accepts feed URLs and stores feed metadata in RDF format
- Content Retrieval: Fetches feed entries on a scheduled basis and stores them as RDF triples
- Content Presentation: Provides a web interface for browsing and searching aggregated content
Architecture
NewsMonitor consists of four components:
- Backend: Node.js application using the Transmissions framework for message processing
- Storage: Apache Jena Fuseki SPARQL server for RDF data persistence
- Frontend: Static HTML/CSS/JavaScript interface served via HTTP
- Scheduler: Automated feed update process running at configurable intervals
Data is stored in two named graphs:
- http://hyperdata.it/feeds: Feed metadata (titles, URLs, format information)
- http://hyperdata.it/content: Individual post entries with titles, links, dates, and content summaries
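To illustrate how the two graphs might be queried together, here is a minimal sketch that builds a SPARQL SELECT joining posts in the content graph to feed titles in the feeds graph. The graph URIs come from the list above and the SIOC and Dublin Core terms from the Data Model section; the function name, the selected variables, and the `dc:` namespace (`http://purl.org/dc/terms/`) are assumptions, not NewsMonitor's actual query.

```javascript
// Graph URIs are taken from the text above; everything else is illustrative.
const FEEDS_GRAPH = 'http://hyperdata.it/feeds';
const CONTENT_GRAPH = 'http://hyperdata.it/content';

// Build a SPARQL SELECT that lists posts from the content graph together
// with the title of the feed each post belongs to (read from the feeds graph).
// Hypothetical helper: the dc: namespace below is an assumption.
function buildRecentPostsQuery(limit = 20) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  return `
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc: <http://purl.org/dc/terms/>

SELECT ?post ?title ?date ?feedTitle
WHERE {
  GRAPH <${CONTENT_GRAPH}> {
    ?post a sioc:Post ;
          dc:title ?title ;
          sioc:has_container ?feed .
    OPTIONAL { ?post dc:date ?date }
  }
  OPTIONAL { GRAPH <${FEEDS_GRAPH}> { ?feed dc:title ?feedTitle } }
}
ORDER BY DESC(?date)
LIMIT ${limit}`.trim();
}

console.log(buildRecentPostsQuery(5));
```

Keeping feed metadata and post content in separate graphs means a query like this can scope each pattern with GRAPH, so feed titles and post titles do not get mixed up even though both use dc:title.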
Feed Processing Pipeline
When subscribing to a feed, NewsMonitor executes a pipeline of processors:
- HTTP client fetches the feed XML
- Feed parser extracts individual entries
- Deduplicator checks for existing entries using GUIDs and content hashes
- RDF builder converts entries to RDF triples using Nunjucks templates
- SPARQL updater inserts new entries into the triple store
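The steps above can be sketched as a chain of functions where each processor receives the previous one's output. The example below focuses on the deduplication step and is only an illustration: `runPipeline`, `deduplicate`, `entryKey`, and the entry field names are hypothetical and are not the Transmissions framework's API, and the FNV-1a hash is a dependency-free stand-in for whatever content hash NewsMonitor actually uses.

```javascript
// Stand-in content hash: 32-bit FNV-1a over UTF-16 code units.
// An assumption for this sketch; a real system would more likely use a
// cryptographic digest.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Identity key for an entry: prefer the feed-supplied GUID, otherwise hash
// the fields that define the content (field names are hypothetical).
function entryKey(entry) {
  if (entry.guid) return `guid:${entry.guid}`;
  return `hash:${fnv1a(`${entry.title}\n${entry.link}\n${entry.summary ?? ''}`)}`;
}

// Deduplicator step: keep entries whose key is not yet known and record
// the new keys in seenKeys.
function deduplicate(entries, seenKeys) {
  const fresh = [];
  for (const entry of entries) {
    const key = entryKey(entry);
    if (seenKeys.has(key)) continue;
    seenKeys.add(key);
    fresh.push(entry);
  }
  return fresh;
}

// Run processors in order, passing each result to the next processor.
// Processors may be synchronous or return a promise.
async function runPipeline(processors, input) {
  let message = input;
  for (const processor of processors) {
    message = await processor(message);
  }
  return message;
}

const seen = new Set(['guid:a1']); // keys already in the store
const entries = [
  { guid: 'a1', title: 'Old post', link: 'https://example.org/1' },
  { guid: 'b2', title: 'New post', link: 'https://example.org/2' },
  { title: 'No GUID', link: 'https://example.org/3', summary: 'Text' },
  { title: 'No GUID', link: 'https://example.org/3', summary: 'Text' },
];

runPipeline(
  [(items) => deduplicate(items, seen), (items) => items.map((item) => item.title)],
  entries,
).then((titles) => console.log(titles)); // [ 'New post', 'No GUID' ]
```

In this sketch the set of seen keys is held in memory; in NewsMonitor the existing entries live in the triple store, so the real check would presumably consult stored data instead.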
Updates run automatically every hour by default, with the interval configurable via environment variables.
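As a rough sketch of how such an interval could be resolved and applied, the example below reads a value from the environment and falls back to one hour. The variable name `UPDATE_INTERVAL_MINUTES` and both function names are hypothetical; the text does not say which environment variables NewsMonitor reads.

```javascript
const DEFAULT_INTERVAL_MS = 60 * 60 * 1000; // one hour, the default stated above

// Read the update interval from an environment-like object.
// UPDATE_INTERVAL_MINUTES is a hypothetical variable name.
function resolveUpdateInterval(env = process.env) {
  const raw = env.UPDATE_INTERVAL_MINUTES;
  if (raw === undefined || raw === '') return DEFAULT_INTERVAL_MS;
  const minutes = Number(raw);
  if (!Number.isFinite(minutes) || minutes <= 0) {
    throw new Error(`Invalid UPDATE_INTERVAL_MINUTES: ${raw}`);
  }
  return Math.round(minutes * 60 * 1000);
}

// Schedule a repeating update and return a function that cancels it.
// Errors from one run are logged so they do not stop later runs.
function scheduleUpdates(updateAllFeeds, intervalMs) {
  const timer = setInterval(() => {
    Promise.resolve()
      .then(updateAllFeeds)
      .catch((err) => console.error('Feed update failed:', err));
  }, intervalMs);
  return () => clearInterval(timer);
}

console.log(resolveUpdateInterval({})); // 3600000
console.log(resolveUpdateInterval({ UPDATE_INTERVAL_MINUTES: '15' })); // 900000
```

Rejecting a malformed value at startup, rather than silently using the default, makes a misconfigured deployment visible immediately; whether NewsMonitor does this is not stated.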
Data Model
NewsMonitor uses the SIOC (Semantically-Interlinked Online Communities) vocabulary for representing feeds and posts:
- Feeds are typed as sioc:Forum
- Posts are typed as sioc:Post
- Dublin Core terms (dc:title, dc:date, dc:creator) provide metadata
- Posts link to their source feeds via sioc:has_container
This RDF-based storage enables SPARQL queries for flexible content retrieval and integration with other semantic web applications.
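To make the model concrete, the following sketch serializes one post as Turtle using the classes and properties listed above. NewsMonitor generates its RDF with Nunjucks templates; a plain template literal is used here instead to keep the example dependency-free. The function names, the post field names, the example URIs, and the `dc:` namespace (`http://purl.org/dc/terms/`) are assumptions.

```javascript
// Escape a value for use inside a double-quoted Turtle string literal.
function turtleString(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Wrap a URI in angle brackets, rejecting characters that are not allowed
// inside a Turtle IRI reference.
function iri(uri) {
  if (/[\s<>"{}|\\^`]/.test(uri)) {
    throw new Error(`Invalid IRI: ${uri}`);
  }
  return `<${uri}>`;
}

// Serialize one post (hypothetical field names) as Turtle.
function postToTurtle(post) {
  const lines = [
    '@prefix sioc: <http://rdfs.org/sioc/ns#> .',
    '@prefix dc: <http://purl.org/dc/terms/> .', // assumed namespace for dc:
    '',
    `${iri(post.uri)} a sioc:Post ;`,
    `    dc:title "${turtleString(post.title)}" ;`,
    `    dc:date "${turtleString(post.date)}" ;`,
  ];
  if (post.creator) {
    lines.push(`    dc:creator "${turtleString(post.creator)}" ;`);
  }
  lines.push(`    sioc:has_container ${iri(post.feedUri)} .`);
  return lines.join('\n');
}

console.log(postToTurtle({
  uri: 'https://example.org/posts/1',
  title: 'He said "hi"',
  date: '2024-01-15T10:00:00Z',
  creator: 'Example Author',
  feedUri: 'https://example.org/feed',
}));
```

Escaping literals and checking IRIs matters here because feed content is untrusted input: an unescaped quote in a title would otherwise produce invalid Turtle or alter the inserted triples.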
Features
- Mobile-responsive interface
- Search and filtering across all posts
- Pagination for browsing large result sets
- Admin interface for feed management
- Automatic feed updates on configurable schedules
- Manual feed update triggers
- SPARQL-based storage for semantic queries
- Support for RSS, Atom, and RDF feed formats
Technical Stack
- Built using ES modules and modern JavaScript
- Apache Jena Fuseki for RDF triple storage
- Nunjucks templates for RDF generation
- Docker containerization
- RESTful API endpoints