Introduction
Digital Frame Product Vision

The Digital Frame is an innovative IoT platform designed to create immersive and human-centered interactions through a Connected Digital Painting.

Main Goals

  • Human-Centric Interaction: Natural engagement via gaze, voice, and movement, minimizing the need for touch interfaces.
  • Seamless Integration: Designed with premium materials (wooden frame, matte display) for aesthetic cohesion in retail, reception areas, and cultural venues.
  • Dynamic Content: Real-time customization of displayed content based on observed user characteristics and behaviors.
  • Scalable Architecture: Easily adaptable hardware and software for different scenarios (edge-only, hybrid, or cloud-centric).
Project Description

The Digital Frame Platform (DFP) combines advanced hardware sensors with sophisticated AI processing to create an interactive and dynamic user experience through a Connected Digital Painting.

Core Components

  • Internal Sensors: The system utilizes a camera, proximity sensor, and microphone to observe and respond to the user’s presence, emotions, gestures, and speech.
  • Main Elaboration Unit: The central unit coordinates the data processing, ensuring smooth interaction between all components.
  • AI Computing Engine: It analyzes the raw data from the sensors, extracting meaningful insights which help the system tailor content dynamically.
  • Communication Interface: Connects the system to external services; in particular, it can integrate with different Content Management Systems (CMS) to fetch and play appropriate content based on the insights derived from the sensors.

Content Management System (CMS)

  • Correlation Engine (CE): It refines and correlates the data collected by the Digital Frame.
  • Rule Engine (RE): It determines the system’s response based on the refined data, such as what content to display or actions to take.
  • Content Dispatcher (CD): It manages the distribution and playback of multimedia content based on the decisions made by the Rule Engine.
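The Rule Engine's decision step can be sketched in a few lines. This is a minimal illustration only: the rule format, field names, and content IDs below are assumptions, not the shipped CMS API.

```python
# Minimal sketch of the Rule Engine (RE) decision step. The rule format,
# field names, and content IDs are illustrative assumptions, not the
# shipped API.

def decide_content(insight: dict, rules: list) -> str:
    """Return the content ID of the first rule whose conditions all
    match the refined insight; fall back to a default loop."""
    for rule in rules:
        if all(insight.get(key) == value for key, value in rule["when"].items()):
            return rule["play"]
    return "default_loop"

# Example rules: play upbeat content for happy viewers, and a
# screensaver when nobody is in front of the frame.
rules = [
    {"when": {"emotion": "happy"}, "play": "upbeat_gallery"},
    {"when": {"numpeople": 0}, "play": "ambient_screensaver"},
]
```

In a real deployment the Correlation Engine would supply the `insight` dictionary and the Content Dispatcher would receive the returned content ID for playback.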

Deployment Scenarios

The platform supports multiple deployment scenarios to offer flexibility based on the environment and requirements.

  • Offline - All on DFP: In this offline mode, all components (AI Computing Engine and CMS) are run locally on the Digital Frame. Media content is preloaded on the device and played according to the local business logic defined in the Rule Engine.
  • External CMS: This mode relies on cloud-based processing where all CMS components are hosted externally. The DFP acts primarily as a sensor and display unit, sending contextual data to the CMS and executing content playback based on its responses.
  • Hybrid Mode: A combination of local processing for basic interactions and external CMS processing for more complex tasks, providing a balanced approach to content delivery and interaction. For example, the DFP can locally emulate some logic to reduce latency or maintain basic interactivity.
  • Raw Data Bypass: In this setup, the Digital Frame only sends raw sensor data to the external CMS, which handles the entire feature extraction and decision-making process using its own AI and content pipeline.
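For integration code, the four scenarios can be modeled explicitly, making it clear which processing stages run on the frame in each mode. The enum values and stage names below are assumptions for illustration, derived from the descriptions above.

```python
from enum import Enum

# Sketch of how the four deployment scenarios could be modeled in
# integration code. Enum values and stage names are assumptions.
class DeploymentMode(Enum):
    OFFLINE = "all_on_dfp"
    EXTERNAL_CMS = "external_cms"
    HYBRID = "hybrid"
    RAW_BYPASS = "raw_data_bypass"

def stages_on_frame(mode: DeploymentMode) -> dict:
    """Which processing stages run on the Digital Frame itself."""
    return {
        # Raw Data Bypass ships unprocessed sensor data, so no local AI.
        "feature_extraction": mode is not DeploymentMode.RAW_BYPASS,
        # Only Offline and Hybrid keep (some) business logic local.
        "rule_engine": mode in (DeploymentMode.OFFLINE, DeploymentMode.HYBRID),
        # Only the fully offline mode preloads media on the device.
        "content_storage": mode is DeploymentMode.OFFLINE,
    }
```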
Use Cases

Below are detailed example scenarios illustrating how the Digital Frame Platform can be leveraged in real-world applications:

Interactive Art Gallery

Digital frames replace traditional static paintings. Using face detection and emotion recognition, the display adapts artwork based on the viewer's emotional reaction, creating an immersive, personalized art experience. The microphone array can capture feedback and trigger audio narrations that provide context. In advanced setups, gesture recognition could allow viewers to switch artworks with a simple hand wave.

Retail Personalization

A store can integrate the DFP with its CMS to display targeted ads or promotions based on the shopper's demographics and emotional state. When the proximity sensor detects someone near, the frame greets them and suggests relevant products, increasing customer engagement with real-time analytics. By recognizing repeat customers, the system can offer loyalty rewards or personalized greetings.

Elderly Care and Hospital Monitoring

In elderly care facilities or hospitals, motion detection and emotion recognition can identify signs of distress in residents or patients, automatically alerting caregivers if someone appears anxious, disoriented, or has fallen. Over time, data analysis may provide insights to improve patient well-being and reduce emergency incidents.

Wayfinding in Large Buildings

The Digital Frame can assist visitors in complex environments like hospitals or universities by using the proximity sensor and voice recognition to provide personalized directions, accessibility tips, and on-screen guidance based on the user’s location, mobility status, and spoken queries.

Corporate Reception / Welcome Board

Companies can greet employees or visitors with personalized messages when the proximity sensor and face recognition detect them. Recognized staff might see internal updates, while visitors see relevant onboarding info or directions. Additionally, staff check-in could be automated, streamlining front-desk operations.

Smart Home Control Center

Serving as a central dashboard for a smart home, the DFP can display customized messages based on occupant presence, control IoT devices via voice commands, and share environmental data like temperature or air quality, enhancing convenience and energy efficiency. This setup could extend to security alerts or remote monitoring for peace of mind.

Project Features
Prototype Hardware Components

The Digital Frame Platform integrates several advanced hardware components to enable its interactive capabilities. In its final version, it will support different variations of the following components, each meeting or exceeding the performance outlined below.

  • RGB Camera: Arducam, 5 Megapixel with Autofocus
  • Microphone Array: XK-VOICE-SQ66 4-Mic Linear Array Kit
  • Audio Output: Stereo system with multiple speakers
  • Proximity Sensor: Laser-ranging miniature sensor
  • NFC Sensor: µFR Zero HS OEM RF 13.56 MHz
  • CPU/GPU: Rockchip RK3588 (64-bit ARM)
  • Display: LCD, Full HD (43"), matte anti-reflection finish
  • Connectivity: Wi-Fi 6, Bluetooth 5.0, Gigabit Ethernet
Software & AI

The Digital Frame Platform integrates AI pipelines and sensor fusion logic to detect faces, gestures, and more. It provides local inference or cloud-based processing for tasks like object detection, audio analysis, and personalization.

Main Sensor Capabilities

  • Camera: Person tracking, age estimation, emotion recognition, gesture recognition, and object detection.
  • Microphone: Speech recognition, voice commands.
  • Proximity Sensor: Movement detection within a configurable radius for interactive responses.
API & SDK

The DFP is designed with extensibility in mind, offering a set of APIs and a developer-friendly SDK to integrate seamlessly with external systems. These tools enable third-party developers or CMS vendors to customize interactions, automate content delivery, and leverage sensor data to create responsive and context-aware experiences.

API Capabilities

The API exposes essential functionalities of the platform, including access to sensor-generated insights (e.g., detected persons, actions, emotions), triggering of content playback, and device configuration. It supports real-time bidirectional communication between the Digital Frame and external CMS components, allowing seamless coordination between local observations and remote decision-making engines.

SDK Features

The SDK includes a set of developer tools, libraries, and documentation that facilitate integration and testing. The SDK helps developers build tailored logic, test responses, and validate CMS compatibility across different deployment modes (offline, hybrid, or online).

  • Access to structured sensor data and insights
  • Subscription to real-time event streams and state changes
  • Programmatic control of content playback and scheduling
  • Support for custom CMS integrations and rule logic testing
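The event-subscription model listed above can be sketched with a toy stand-in. The class, method, and event names below are assumptions chosen to show the shape of the integration; the real SDK API may differ.

```python
# Toy stand-in for the SDK's real-time event-stream subscription model.
# Class, method, and event names are assumptions, not the published API.

class EventStream:
    """Callers subscribe callbacks per event type; the frame publishes
    events as its sensors produce insights."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, payload):
        for callback in self._subscribers.get(event_type, []):
            callback(payload)

# Usage: react whenever a person is detected.
seen_ids = []
stream = EventStream()
stream.subscribe("person_detected", lambda p: seen_ids.append(p["personID"]))
stream.publish("person_detected", {"personID": "1", "age": 25})
```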

Tutorial

A quick step-by-step guide to create a basic interactive application:

  • Hardware Setup: Mount the DFP, connect power and network.
  • API Registration: Obtain credentials for the DFP's REST endpoints.
  • SDK Installation: Install packages for DFP integration.
  • First Request: Call /api/v1/face to detect faces and retrieve data.
  • Event Handling: Listen for motion or audio triggers to adapt your application's behavior.
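The "First Request" step can be sketched as follows. The /api/v1/face path comes from the guide above; the base URL placeholder and the response fields (which mirror the Example Data Output section) are assumptions.

```python
import json

# Sketch of the tutorial's "First Request" step. The /api/v1/face path
# comes from the guide; the base URL and response fields are assumptions.
BASE_URL = "http://<frame-ip>:8080"  # placeholder address, not a real host
FACE_ENDPOINT = BASE_URL + "/api/v1/face"

def extract_people(body: str) -> list:
    """Pull the per-person records out of a face-detection response."""
    snapshot = json.loads(body)
    return snapshot.get("people", [])

# Against a real frame you would GET FACE_ENDPOINT with your API
# credentials and pass the response body in; here we reuse the shape
# of the sample snapshot from the Example Data Output section:
sample = '{"numpeople": 1, "people": [{"personID": "1", "age": 25}]}'
people = extract_people(sample)
```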
Implementation
Platform Configurability & Feature Extraction

The Digital Frame Platform is designed to be highly configurable, allowing customers or third-party integrators to adapt its behavior and content strategies to a wide range of business scenarios.

Configuration acts as a formal agreement between the DFP and the connected CMS. Based on the selected features and only in the External CMS and Hybrid Mode scenarios, the DFP will generate and transmit structured output files at regular intervals, containing the sensor-based insights that were enabled during setup.

The available capabilities depend on the specific hardware and software configuration—particularly the performance of the AI Computing Engine.

The table below summarizes the currently supported sensor features.

Platform Capabilities

Each entry below gives the supported application, a brief description, its capabilities, and the expected output.

Person Counting

Counts the number of people present in the camera's view. If raw data forwarding is enabled, person counting can be performed by the CMS.

  • Data provided: Number of people in the camera's view
  • Frequency: From 500 ms to 5 s, configurable
  • Max guaranteed distance: 4 m from the frame
  • Max cone of view: 120°
  • Detection quality depends on lighting conditions, distance, and angle.

Note: For cloud-based classification, person counting is performed by the CMS. In that case the data provided is the raw video stream, if sufficient bandwidth is available. See Raw Data Bypass.

Expected output:

numpeople: 1

Person Tracking

Tracks people in the camera's view; each detected person is assigned an ID. If raw data forwarding is enabled, person tracking can be performed by the CMS.

  • Data provided: Person ID (not persistent) and the person's coordinates on screen
  • Frequency: From 500 ms to 5 s, configurable
  • Max guaranteed distance: 4 m from the frame
  • Max cone of view: 120°
  • Detection quality depends on lighting conditions, distance, and angle.

Expected output:

personID: 1
boundingBox:
  x1: 100
  y1: 150
  x2: 200
  y2: 300

Age Estimation

Estimates the age of a person seen by the camera. If raw data forwarding is enabled, age estimation can be performed by the CMS.

  • Data provided: Age (int) with a confidence score (float) from 0 to 1
  • Frequency: From 500 ms to 5 s, configurable
  • Max guaranteed distance: 3 m from the frame
  • Max cone of view: 120°
  • This field is nested inside person tracking. Detection quality depends on lighting conditions, distance, and angle.

Expected output:

age: 25

Emotion Estimation

Estimates the emotion of a person seen by the camera. If raw data forwarding is enabled, emotion estimation can be performed by the CMS.

  • Data provided: The emotion (string) with a confidence score (float) from 0 to 1
  • Frequency: From 500 ms to 5 s, configurable
  • Max cone of view: 0° (the person must be directly facing the camera)
  • This field is nested inside person tracking. Detection quality depends on lighting conditions, distance, and angle.

Expected output:

name: happy
confidence: 0.75

Gesture Recognition

Recognizes gestures made by a person seen by the camera. If raw data forwarding is enabled, gesture recognition can be performed by the CMS.

  • Data provided: The gesture (string) with a confidence score (float) from 0 to 1
  • Frequency: From 500 ms to 5 s, configurable
  • Max cone of view: 120°
  • This field is nested inside person tracking. Detection quality depends on lighting conditions, distance, and angle.

Expected output:

value: peace
confidence: 0.9

Object Detection

Recognizes objects within the camera's view. If raw data forwarding is enabled, object detection can be performed by the CMS.

  • Data provided: The object (string) with a confidence score (float) from 0 to 1
  • Frequency: From 500 ms to 5 s, configurable
  • Max cone of view: 120°
  • Detection quality depends on lighting conditions, distance, and angle.

Expected output:

value: Bottle
confidence: 0.85

OpenAI Integration

Sends camera frames to OpenAI so that ChatGPT can run inference on them.

  • Data provided: The frame acquired by the camera
  • Frequency: At least 5 s between requests
  • Max cone of view: 120°
  • Detection quality depends on lighting conditions, distance, and angle.

Expected output:

response: <depends on the prompt sent by the user>

The configuration interface allows fine-tuning of threshold values, processing priorities, and detection parameters. These settings can be managed through the web interface or programmatically via the API.
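When setting these parameters programmatically, it is worth validating them against the documented ranges before sending them to the device. The 500 ms to 5 s frequency range comes from the capability table above; the field names in this sketch are illustrative assumptions.

```python
# Hedged sketch of client-side validation for detection parameters.
# The 500 ms to 5 s frequency range comes from the capability table;
# the field names themselves are illustrative assumptions.

def validate_detection_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the settings
    fall inside the documented ranges."""
    problems = []
    frequency_ms = cfg.get("frequency_ms", 1000)
    if not 500 <= frequency_ms <= 5000:
        problems.append("frequency_ms must be between 500 and 5000")
    threshold = cfg.get("confidence_threshold", 0.5)
    if not 0.0 <= threshold <= 1.0:
        problems.append("confidence_threshold must be in [0, 1]")
    return problems
```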

Example Data Output

The DFP collects and formats sensor data into a unified JSON structure for easy integration. Here is an example of the output data format.

{
  "deviceId": "ABC12345",
  "timestamp": "1746512360",
  "numpeople": 1,
  "people": [
    {
      "personID": "1",
      "boundingBox": {
        "x1": 100,
        "y1": 150,
        "x2": 200,
        "y2": 300
      },
      "age": 25,
      "emotion": {
        "name": "happy",
        "confidence": 0.75
      },
      "gesture": [
        {
          "value": "peace",
          "confidence": 0.6
        }
      ],
      "objectsDetected": [
        {
          "value": "bottle",
          "confidence": 0.65
        }
      ]
    }
  ],
  "OpenAI_Response": [
    {
      "person_detected": "yes",
      "has_glasses": "yes",
      "has_hat": "no"
    }
  ],
  "video": {
    "lastPlayed": {
      "timestamp": "1746512300",
      "path": "../../content/base_videos/intro.mp4",
      "downloaded": false
    },
    "errors": [
      {
        "timestamp": "1746512310",
        "message": "Failed to remove downloaded video tmp123.mp4: Permission denied"
      }
    ]
  },
  "microphone": {
    "state": "recorded",
    "text": "Hello world",
    "time_sec": "3.21",
    "errors": [
      {
        "timestamp": "1746512325",
        "message": "Audio device unavailable"
      }
    ]
  },
  "proximity": {
    "outcome": "near",
    "distance": 500,
    "video": {
      "lastPlayed": {
        "timestamp": "1746512320",
        "path": "../../content/extra_videos/alert.mp4",
        "downloaded": true
      },
      "errors": [
        {
          "timestamp": "1746512330",
          "message": "Polling thread timeout"
        }
      ]
    }
  },
  "remote_sync": {
    "events": [
      {
        "timestamp": "1746512340",
        "role": "client",
        "direction": "outgoing",
        "peer": "FRAME_2",
        "event": "DETECTED_PEOPLE_CAMERA",
        "params": "1",
        "raw": "FRAME_1|FRAME_2|DETECTED_PEOPLE_CAMERA|1"
      },
      {
        "timestamp": "1746512345",
        "role": "server",
        "direction": "incoming",
        "peer": "FRAME_2",
        "event": "ACK",
        "params": "OK",
        "raw": "FRAME_2|FRAME_1|ACK|OK"
      }
    ],
    "errors": [
      {
        "timestamp": "1746512350",
        "role": "client",
        "message": "Error sending to FRAME_2: Connection reset"
      }
    ]
  },
  "fsm": {
    "current": "idle",
    "previous": "alert",
    "history": [
      {
        "timestamp": "1746512290",
        "state": "idle"
      },
      {
        "timestamp": "1746512310",
        "state": "active"
      },
      {
        "timestamp": "1746512330",
        "state": "alert"
      },
      {
        "timestamp": "1746512360",
        "state": "idle"
      }
    ]
  }
}

This JSON file represents structured data collected by a Digital Frame system equipped with a camera, microphone, proximity sensor, and AI capabilities. The device monitors its environment, detects people, plays content, reacts to proximity, and syncs with other devices. Here's a breakdown of the data:

General Information

  • deviceId: Unique ID of the frame ("ABC12345").
  • timestamp: Unix Epoch time ("1746512360") representing when this data snapshot was taken.

Camera + AI (people)

  • personID: Unique ID for this person in the session.
  • boundingBox: Location of the person in the camera frame (x/y coordinates).
  • age: AI-estimated age (25).
  • emotion: Detected emotion ("happy"), confidence 75%.
  • gesture: Detected gesture ("peace"), confidence 60%.
  • objectsDetected: Objects seen near the person: "bottle" (65% confidence).

Video Playback Info (video)

  • lastPlayed: A video named intro.mp4 was played shortly before this snapshot.
  • errors: An error occurred trying to remove a temporary video file (permission denied).

Microphone Input (microphone)

  • state: Indicates the mic recorded audio.
  • text: Transcribed speech: "Hello world".
  • time_sec: Duration of the recording (3.21 seconds).
  • errors: Microphone encountered an issue ("Audio device unavailable").

Proximity Sensor Interaction (proximity)

  • outcome: A person was detected near the frame.
  • video.lastPlayed: A proximity-triggered video (alert.mp4) was played.
  • errors: An issue with the polling thread caused a timeout.

Remote Synchronization (remote_sync)

  • events: Logs communication between this device and another (FRAME_2): an outgoing event reported a detected person, and an incoming acknowledgment (ACK) confirmed receipt.
  • errors: An error occurred during sync: connection reset.

Finite State Machine (fsm)

  • current: "idle" — system is waiting.
  • previous: "alert" — system was recently active.
  • history: Log of recent states and timestamps: Idle → Active → Alert → Idle
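The raw sync messages above follow a pipe-delimited layout that is easy to parse. The sender|receiver|event|params field order is inferred from the sample messages in the snapshot, not from a formal protocol specification.

```python
# Sketch of parsing the pipe-delimited remote_sync messages shown in
# the snapshot (e.g. "FRAME_1|FRAME_2|DETECTED_PEOPLE_CAMERA|1").
# The sender|receiver|event|params field order is inferred from those
# samples, not from a formal protocol spec.

def parse_sync_message(raw: str) -> dict:
    sender, receiver, event, params = raw.split("|", 3)
    return {"sender": sender, "receiver": receiver,
            "event": event, "params": params}

msg = parse_sync_message("FRAME_1|FRAME_2|DETECTED_PEOPLE_CAMERA|1")
ack = parse_sync_message("FRAME_2|FRAME_1|ACK|OK")
```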
Support

For technical assistance, bug reports, and community discussions:

  • Official Support: digital.frame@reply.it
  • Contacts: Daniele Vitali and Roberto Pellegrini
  • Developers: Domenico Virgilio, Walter Re and Francesco Giacometti

Our team of experienced developers and technical support specialists is available to assist with implementations, configurations, and troubleshooting.

License TBD

This project is licensed under the MIT License. You may use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the conditions below:

MIT License

Copyright (c) 2025 Digital Frame

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.