Distributed computer vision for activity recognition

Writer: Raúl Santos de la Cámara

In Next Perception we not only research algorithms and methods for the sensing and perception side of our applications, but also place a strong emphasis on doing so in ways that are not centralized but distributed across the network. As an illustration, we present here a small example of distributed sensing used to detect patient activity, with the processing located quite far away from the action itself.

The piloting for Next Perception UC1-P2 took place in Lleida, in Catalonia, in the northeast of Spain, in facilities provided by our partner IRBLL. There, the setup was built in a one-bedroom model flat resembling those used in senior homes. For the sake of simplicity, only the living room was used for patient monitoring, with two cameras installed on opposite sides of the room:

Distributed computer vision

We can see how the cameras completely covered the living room. For action recognition, an AI system was used that detected activity in four major categories: Cleaning, Eating, Exercising, and Walking.
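Before any of that AI runs, each camera only needs a coarse signal saying "something may be happening". The pilot's actual detector is not described in detail here, but the idea can be sketched with simple frame differencing (all names and thresholds below are illustrative assumptions):

```python
# Hypothetical sketch of a coarse motion-detection signal: compare two
# consecutive grayscale frames and flag motion when enough pixels change.

def motion_detected(prev_frame, curr_frame, pixel_threshold=25, ratio_threshold=0.02):
    """Return True when the fraction of pixels whose intensity changed
    by more than pixel_threshold exceeds ratio_threshold."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total > ratio_threshold

# Two tiny 4x4 "frames": the second differs strongly in half its pixels.
still = [[10] * 4 for _ in range(4)]
moving = [[10, 10, 200, 200] for _ in range(4)]

print(motion_detected(still, still))   # -> False (no change)
print(motion_detected(still, moving))  # -> True (half the pixels changed)
```

The point of keeping this step so cheap is that it can run on modest hardware next to the camera, while the expensive action recognition runs elsewhere.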

The AI system required powerful hardware to run, so we decided to divide the work and offload the heavier AI processing to our servers at HI Iberia in Madrid, Spain, by means of an Eclipse Zenoh deployment:
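In Zenoh terms, such a split typically means running a router on the server side and having the edge nodes connect out to it. The fragment below is a hypothetical edge-side configuration in Zenoh's JSON5 format; the hostname and port are placeholders, not the pilot's real endpoints:

```json5
// Illustrative Zenoh config for an edge node at the pilot site,
// connecting as a client to a remote router (endpoint is made up).
{
  mode: "client",
  connect: {
    endpoints: ["tcp/zenoh-router.example.org:7447"]
  }
}
```

The server side would run a node in `router` mode listening on the matching endpoint, so that the motion-detection metadata published at the edge reaches the subscribers in Madrid.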

We can see how, on the Lleida side (right), both cameras 1 and 2 encode video over RTSP (solid lines) and also generate locally a coarse motion-detection signal (dotted lines) indicating that relevant actions might be happening at each camera. This signal is published on a Zenoh channel and sent over the regular internet. On the Madrid side (left), it is received by the router and passed to a camera multiplexer, which decides which camera's video to process based on the motion-detection metadata. The selected stream is handed to the relevant Zenoh worker, which in turn decodes the RTSP stream into video frames that feed the action-recognition engine.
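The multiplexer's selection logic can be sketched in a few lines. This is a hypothetical illustration, not the pilot's implementation; in particular, the tie-breaking rule is an arbitrary assumption:

```python
# Sketch of the camera-multiplexer decision: given the latest coarse
# motion flags received for each camera, pick which RTSP stream (if any)
# should be handed to the action-recognition worker.

def select_camera(motion_flags):
    """motion_flags maps camera id -> latest motion flag (True/False).
    Returns the id of a camera reporting motion, or None when idle.
    If several cameras report motion, the lowest id wins (an arbitrary
    tie-break for this sketch)."""
    active = sorted(cam for cam, moving in motion_flags.items() if moving)
    return active[0] if active else None

print(select_camera({1: False, 2: True}))   # -> 2
print(select_camera({1: True, 2: True}))    # -> 1
print(select_camera({1: False, 2: False}))  # -> None
```

Returning `None` when no camera reports motion is what lets the system skip AI processing entirely during idle periods, which is the main resource saving described next.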

By using this approach, we save resources because (1) high-powered AI hardware need not be installed at the deployment site, and (2) only relevant data, tagged as genuine motion, is processed by the AI engine. During the first pilot of Next Perception in late summer 2021, this was achieved with minimal latency that did not impact performance when used with real monitored patients.

Over the rest of the project we will improve on this approach, adding more powerful edge devices (such as NVIDIA Jetson Nano AI boards) on the Lleida side and orchestrating everything using a more advanced Zenoh setup.

Interested readers can find more detail in our public deliverable D4.5 First Pilot Evaluation Reports, as well as see the system in action in this video.