About NewsMonitor

A Semantic Feed Aggregator

Source code: github.com/danja/transmissions

Overview

NewsMonitor is a feed aggregator application built on the Transmissions message processing framework. It subscribes to RSS, Atom, and RDF feeds, stores their content in a SPARQL triple store, and provides a web interface for browsing aggregated posts.

Core Functionality

The application performs three primary functions:

Feed Subscription: Accepts feed URLs and stores each feed's metadata as RDF
Content Retrieval: Fetches feed entries on a scheduled basis and stores them as RDF triples
Content Presentation: Provides a web interface for browsing and searching aggregated content

Architecture

NewsMonitor consists of four components:

Backend: Node.js application using the Transmissions framework for message processing
Storage: Apache Jena Fuseki SPARQL server for RDF data persistence
Frontend: Static HTML/CSS/JavaScript interface served via HTTP
Scheduler: Automated feed update process running at configurable intervals

Data is stored in two named graphs:

http://hyperdata.it/feeds - Feed metadata (titles, URLs, format information)
http://hyperdata.it/content - Individual post entries with titles, links, dates, and content summaries
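
As a rough illustration of how the two graphs keep feed metadata and post content apart, the sketch below builds a SPARQL query scoped to a single named graph. Only the graph URIs come from the list above; the `GRAPHS` constant and the `buildGraphQuery` helper are hypothetical names for this example, not part of the NewsMonitor source.

```javascript
// Named graph URIs as listed above.
const GRAPHS = {
  feeds: 'http://hyperdata.it/feeds',
  content: 'http://hyperdata.it/content',
};

// Hypothetical helper: build a SELECT query restricted to one named graph.
function buildGraphQuery(graphUri, limit = 10) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  return [
    'SELECT ?s ?p ?o WHERE {',
    `  GRAPH <${graphUri}> { ?s ?p ?o }`,
    '}',
    `LIMIT ${limit}`,
  ].join('\n');
}

// Print a query that samples five triples from the content graph.
console.log(buildGraphQuery(GRAPHS.content, 5));
```

Swapping `GRAPHS.content` for `GRAPHS.feeds` would target feed metadata instead, without changing the rest of the query.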

Feed Processing Pipeline

When subscribing to a feed, NewsMonitor runs the following processors in order:

HTTP client fetches the feed XML
Feed parser extracts individual entries
Deduplicator checks for existing entries using GUIDs and content hashes
RDF builder converts entries to RDF triples using Nunjucks templates
SPARQL updater inserts new entries into the triple store
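
The deduplication step above can be sketched roughly as follows. This is a minimal illustration, not the actual Transmissions processor: the entry shape (`guid`, `title`, `link`, `summary`) and the function names are assumptions, an in-memory `Set` stands in for lookups against the triple store, and a small FNV-1a style hash is swapped in for whatever content hash the real deduplicator uses.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units. A dependency-free
// stand-in for the real content hash, which is not specified here.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical entry shape: { guid, title, link, summary }.
// Returns the GUID key (if any) plus a key derived from the content.
function entryKeys(entry) {
  const keys = [];
  if (entry.guid) keys.push(`guid:${entry.guid}`);
  const content = [entry.title, entry.link, entry.summary]
    .map((value) => (value || '').trim())
    .join('\n');
  keys.push(`hash:${fnv1a(content)}`);
  return keys;
}

// Keep only entries whose GUID and content hash are both unseen,
// recording the keys of each accepted entry in `seen`.
function filterNewEntries(entries, seen = new Set()) {
  const fresh = [];
  for (const entry of entries) {
    const keys = entryKeys(entry);
    if (keys.some((key) => seen.has(key))) continue;
    keys.forEach((key) => seen.add(key));
    fresh.push(entry);
  }
  return fresh;
}

const batch = [
  { guid: 'a1', title: 'First', link: 'http://example.org/1', summary: 'One' },
  { guid: 'a1', title: 'First (edited)', link: 'http://example.org/1', summary: 'One' },
  { title: 'First', link: 'http://example.org/1', summary: 'One' },
  { guid: 'b2', title: 'Second', link: 'http://example.org/2', summary: 'Two' },
];
console.log(filterNewEntries(batch).map((entry) => entry.title));
```

In the sample batch, the second entry repeats a GUID and the third repeats the content of the first without carrying a GUID, so checking both keys catches duplicates that either check alone would miss.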

Updates run automatically every hour by default, with the interval configurable via environment variables.
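
One way the configurable interval might be resolved is sketched below. The variable name `UPDATE_INTERVAL_MS` and both function names are hypothetical, chosen only for this example; the source states that the default is one hour and that environment variables control the interval, but not which variable or unit is used.

```javascript
// One hour, the documented default.
const DEFAULT_INTERVAL_MS = 60 * 60 * 1000;

// UPDATE_INTERVAL_MS is a hypothetical variable name for this sketch.
// Missing, non-numeric, or sub-millisecond values fall back to the default.
function resolveUpdateIntervalMs(env = {}) {
  const raw = env.UPDATE_INTERVAL_MS;
  const parsed = Number(raw);
  if (raw === undefined || raw === '' || !Number.isFinite(parsed) || parsed < 1) {
    return DEFAULT_INTERVAL_MS;
  }
  return Math.floor(parsed);
}

// Hypothetical scheduler: call `updateFeeds` on the resolved interval
// and return a function that stops further runs.
function startScheduler(updateFeeds, env = {}) {
  const timer = setInterval(updateFeeds, resolveUpdateIntervalMs(env));
  return () => clearInterval(timer);
}
```

In a Node.js process this would typically be started as `startScheduler(runUpdate, process.env)`, where `runUpdate` is whatever function triggers the feed pipeline. Falling back to the default on invalid input keeps a misconfigured environment from turning the scheduler into a tight loop.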

Data Model

NewsMonitor uses the SIOC (Semantically-Interlinked Online Communities) vocabulary for representing feeds and posts:

Feeds are typed as sioc:Forum
Posts are typed as sioc:Post
Dublin Core terms (dc:title, dc:date, dc:creator) provide metadata
Posts link to their source feeds via sioc:has_container

This RDF-based storage enables SPARQL queries for flexible content retrieval and integration with other semantic web applications.
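
To make the data model concrete, the sketch below serializes one post as Turtle using the classes and properties listed above. It is illustrative only: plain template literals are swapped in for the Nunjucks templates the application uses, the post shape (`uri`, `feedUri`, `title`, `date`, `creator`) is hypothetical, and mapping the `dc:` prefix to the Dublin Core elements 1.1 namespace is an assumption.

```javascript
const PREFIXES = [
  '@prefix sioc: <http://rdfs.org/sioc/ns#> .',
  // Assumption: dc: refers to the Dublin Core elements 1.1 namespace.
  '@prefix dc: <http://purl.org/dc/elements/1.1/> .',
].join('\n');

// Escape a value for use inside a double-quoted Turtle literal.
function turtleLiteral(value) {
  const escaped = String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
  return `"${escaped}"`;
}

// Hypothetical post shape: { uri, feedUri, title, date, creator }.
// URIs are assumed to be valid IRIs already; they are not checked here.
function postToTurtle(post) {
  const lines = [
    `<${post.uri}> a sioc:Post ;`,
    `    sioc:has_container <${post.feedUri}> ;`,
    `    dc:title ${turtleLiteral(post.title)} ;`,
    `    dc:date ${turtleLiteral(post.date)}`,
  ];
  if (post.creator) {
    lines[lines.length - 1] += ' ;';
    lines.push(`    dc:creator ${turtleLiteral(post.creator)}`);
  }
  return `${PREFIXES}\n\n${lines.join('\n')} .`;
}

console.log(postToTurtle({
  uri: 'http://example.org/post/1',
  feedUri: 'http://example.org/feed',
  title: 'An example post',
  date: '2024-01-01T00:00:00Z',
  creator: 'Example Author',
}));
```

The `sioc:has_container` triple is what lets a query join a post in the content graph back to its feed in the feeds graph.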

Features

Mobile-responsive interface
Search and filtering across all posts
Pagination for browsing large result sets
Admin interface for feed management
Automatic feed updates on configurable schedules
Manual feed update triggers
SPARQL-based storage for semantic queries
Support for RSS, Atom, and RDF feed formats

Technical Stack

Node.js with ES modules and modern JavaScript
Apache Jena Fuseki for RDF triple storage
Nunjucks templates for RDF generation
Docker containerization
RESTful API endpoints

Part of the Transmissions Framework

Learn more at github.com/danja/transmissions