Distributed Processing

This page contains detailed information and troubleshooting guidance for Distributed Processing. For a quick first-time setup, please refer to this page: Distributed Processing - Step-by-Step

To increase throughput when processing massive projects, SURE can automatically split a project into subprojects. These subprojects are automatically distributed to worker nodes on the local network and are designed to guarantee consistent results across subproject borders.

Follow these easy steps to utilize distributed processing capabilities for your projects:

Setup worker nodes

The SURE installation package contains an executable, "SURE-Node.exe", which is used to start a worker node. It can be started from the command line or by double-clicking SURE-Node.exe. Once running, a worker node remains on standby, waiting for a master node to assign it a job (subproject).

CLI
usage: SURE-Node.exe [-h] [-i IP_ADDR] [--workspace WORKSPACE]
optional arguments:
  -h, --help                      show this help message and exit
  -i IP_ADDR, --ip_addr IP_ADDR   IP address of network interface to use (may be needed in case of multiple interfaces)
  --workspace WORKSPACE           workspace for the sure node
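
As an illustration of the options above, the following Python sketch assembles a SURE-Node.exe command line so that a node could be launched from a script. Only the "-i" and "--workspace" options come from the usage text; the function names, the IP address, and the workspace path are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_node_command(exe: str = "SURE-Node.exe",
                       ip_addr: Optional[str] = None,
                       workspace: Optional[str] = None) -> List[str]:
    """Assemble the argument list for starting a worker node."""
    cmd = [exe]
    if ip_addr is not None:
        cmd += ["-i", ip_addr]            # network interface to use
    if workspace is not None:
        cmd += ["--workspace", workspace]  # workspace for the node
    return cmd


def start_node(cmd: List[str]) -> subprocess.Popen:
    """Start the node as a child process; it then waits on standby."""
    return subprocess.Popen(cmd)


# Hypothetical values, for illustration only.
example = build_node_command(ip_addr="192.168.1.20",
                             workspace=r"D:\SURE_Node_Workspace")
print(example)
```

Passing the arguments as a list avoids quoting problems when the workspace path contains spaces.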

Network interfaces

  • If the "--ip_addr" parameter is omitted, SURE will try to detect the correct interface/IP on its own. If the correct interface cannot be detected, use this parameter to specify the IP address manually.
  • All nodes (including master) must be within the same subnet (IPv4).
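
The subnet requirement can be checked before starting the nodes. The sketch below uses Python's standard ipaddress module; the prefix length of 24 (netmask 255.255.255.0) and the example addresses are assumptions and must be replaced with the values used on your network.

```python
import ipaddress
from typing import Iterable


def all_in_same_subnet(addresses: Iterable[str], prefix_length: int = 24) -> bool:
    """Return True if every IPv4 address falls into the same network.

    prefix_length is an assumption; 24 corresponds to 255.255.255.0.
    """
    networks = {
        ipaddress.IPv4Interface(f"{addr}/{prefix_length}").network
        for addr in addresses
    }
    return len(networks) <= 1


# Hypothetical master and node addresses.
print(all_in_same_subnet(["192.168.1.10", "192.168.1.20", "192.168.1.21"]))
print(all_in_same_subnet(["192.168.1.10", "192.168.2.20"]))
```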

Workspace

When you work with the SURE GUI, a default user configuration file, "sure_ui.init", is created in the ".SURE" directory in the current user's home folder, e.g. "C:\Users\Dan\.SURE". This configuration file also stores the default workspace in which your project directories are kept. If you have used the SURE GUI on the node before, distributed processing tasks are stored by default in a directory called "SURE_Node_Workspace" within the default project directory. Otherwise, because "SURE-Node.exe" does not create the "sure_ui.init" file itself, you need to specify the workspace manually using the "--workspace" parameter.
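
A script that starts nodes can use the rule above to decide whether the workspace has to be passed explicitly. This is a minimal sketch that only checks whether "sure_ui.init" exists in the ".SURE" directory; it does not read the file, since its format is not described here. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def sure_ui_config_path(home: Optional[Path] = None) -> Path:
    """Location of sure_ui.init inside the user's .SURE directory."""
    base = home if home is not None else Path.home()
    return base / ".SURE" / "sure_ui.init"


def needs_workspace_argument(home: Optional[Path] = None) -> bool:
    """True if no GUI configuration exists, so the workspace must be given."""
    return not sure_ui_config_path(home).is_file()


if needs_workspace_argument():
    print("No sure_ui.init found: specify the workspace when starting the node.")
else:
    print("sure_ui.init found at", sure_ui_config_path())
```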

Master node

The master is the machine from which you start and control distributed processing. The Analysis step is performed on this machine. If you also want this machine to take part in processing Dense Clouds/DSM/TrueOrtho/Mesh, set up a worker node on it as well (as explained above). You cannot start a master node manually using an executable shipped with the software; SURE starts it for you once you start a distributed project.

License requirements

On each worker node a SURE Aerial license needs to be activated.

Firewall

If a firewall permission request pops up, we recommend allowing access for both the Master and the Node and then restarting the SURE-Node.exe executable to make sure the permissions are set correctly.

If this popup does not appear, your firewall may be silently blocking the connection. For example, in Windows Defender Firewall, check these settings:

Windows 10

Windows 7

For other firewall software, please contact your system administrator.

Nodes do not connect

Even if the SURE Master and Node are not blocked by the firewall, the nodes may still fail to connect to the master because the machines cannot ping each other. You can test this with the ping command on the command line. Alternatively, if Windows Defender Firewall is in use, open "Windows Defender Firewall with Advanced Security" and check the ICMPv4-In and ICMPv4-Out rules under both "Inbound Rules" and "Outbound Rules" (see screenshot below).
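
The ping test can also be scripted, for example to check several nodes from the master. The sketch below wraps the system ping command; the host address is hypothetical, and treating exit status 0 as "reachable" is an assumption that should be confirmed against the printed ping output on your system.

```python
import platform
import subprocess
from typing import List


def ping_command(host: str, count: int = 2) -> List[str]:
    """Build a ping command; Windows uses -n for the count, other systems -c."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def can_ping(host: str, count: int = 2) -> bool:
    """Run ping and report whether it exited with status 0."""
    result = subprocess.run(ping_command(host, count),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


# Hypothetical node address; call can_ping() from the master machine.
print(ping_command("192.168.1.20"))
```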

Setup the project

In both the GUI and the CLI, set up your project as you would without distributed processing. In addition, define the subproject size and activate distributed processing.

CLI
usage: SURE.exe
-subproject_size <tiles per dimension> -> Tiles per subproject.
-distributed -> Use distributed processing
  • Use the "-subproject_size" parameter to specify the number of tiles per subproject. By default, an Aerial scenario has 16x16 tiles per subproject; the Default and Oblique scenarios have 8x8 tiles.
  • Use the "-distributed" flag to start distributed processing once project generation and splitting are done.
GUI - Activate Distributed Processing
  • In the lower part of the processing control panel on the right-hand side, tick the "Activate Distributed Processing" checkbox to enable distributed processing. Note how the "Play" button changes its icon.
  • SURE will automatically split the project into subprojects. The project area is subdivided into subprojects, each containing the specified number of tiles per dimension, which are then distributed to the processing nodes. For example, the Aerial Nadir scenario has a default subproject size of 16x16 tiles. During project definition, among the general project parameters, you can set the "Subproject size (tiles per dimension) for distributed processing" or leave the default value.
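
To get a rough idea of how many subprojects a given subproject size produces, the tile grid can be divided by the subproject size per dimension. The sketch below assumes that partially filled subprojects at the project border each count as one subproject (ceiling division); SURE's actual area-based splitting may differ, and the 40 x 25 tile grid is a hypothetical example.

```python
from typing import Tuple


def subproject_grid(tiles_x: int, tiles_y: int, subproject_size: int) -> Tuple[int, int]:
    """Estimated number of subprojects along x and y for a tile grid.

    Assumption: border subprojects that are only partly filled still count.
    """
    if min(tiles_x, tiles_y, subproject_size) <= 0:
        raise ValueError("all arguments must be positive")
    per_x = -(-tiles_x // subproject_size)  # ceiling division
    per_y = -(-tiles_y // subproject_size)
    return per_x, per_y


# Hypothetical project of 40 x 25 tiles with a subproject size of 16.
nx, ny = subproject_grid(40, 25, 16)
print(nx, ny, nx * ny)  # prints "3 2 6"
```

Under the same assumption, a subproject size of 8 would give 5 x 4 = 20 subprojects for this grid, so smaller subprojects yield more jobs to distribute across the nodes.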

   



Start to process

SURE will now run the analysis stage on the master node to gather the information required to split the project. Once the project has been split, you will find all created subprojects in the "SubProjects" directory of the project folder. Each subproject is a complete, independent SURE project that is distributed to an assigned node for processing. The number of tiles per subproject depends on the scenario.
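
After splitting, the created subprojects can be listed with a few lines of Python. This sketch only relies on the "SubProjects" directory name mentioned above and assumes that each subproject is a subdirectory of it; the project path is hypothetical.

```python
from pathlib import Path
from typing import List


def list_subprojects(project_dir: Path) -> List[Path]:
    """Return the subproject directories inside <project_dir>/SubProjects."""
    sub_root = project_dir / "SubProjects"
    if not sub_root.is_dir():
        return []  # project not split yet, or not a distributed project
    return sorted(p for p in sub_root.iterdir() if p.is_dir())


# Hypothetical project location.
for sub in list_subprojects(Path(r"D:\Projects\MyProject")):
    print(sub.name)
```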

Monitor progress

  • After starting the distributed processing project, you can switch to the Project Status tab to monitor the overall progress of your project. Selecting a task that refers to a specific subproject shows more information about it, including whether it is pending, running, or already finished.
  • On the machine where you started the master process, you can connect to the cluster manager to obtain information on the cluster (e.g. nodes, IPs, running tasks). To do so, open a browser and navigate to "localhost:5006".
  • All information regarding cluster management and scheduling will be written to "dp.log" within the project directory of the master project.
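
For a quick look at recent scheduling activity without opening the whole file, the last lines of "dp.log" can be read with a short script. The sketch makes no assumption about the log format and simply returns the trailing lines; the project path is hypothetical.

```python
from collections import deque
from pathlib import Path
from typing import List


def tail_dp_log(project_dir: Path, lines: int = 20) -> List[str]:
    """Return the last `lines` lines of dp.log in the master project directory."""
    log_path = project_dir / "dp.log"
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


# Hypothetical project location; skipped if no dp.log exists there.
project = Path(r"D:\Projects\MyProject")
if (project / "dp.log").is_file():
    print("\n".join(tail_dp_log(project, lines=10)))
```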

Results

The master instance generates a complete and consistent result by merging the various subprojects into the corresponding output DSM/Cloud/Mesh folders.

Once a subproject is completed, its results are moved to the corresponding location in the master project's base path. For example, the DSM results of a finished subproject are already available in the master project's DSM folder. The results of each subproject are self-sufficient and can be inspected or pre-delivered while other subprojects are still processing. This allows quality checks to begin while other parts of the project are still being generated.

Raw Dense Clouds

Due to the massive storage requirements, the raw point clouds generated during the dense cloud stage are not copied back to the subproject folders on the master node; they are deleted at the end of each subproject's processing.

DSM Extended

Some intermediate products are not transferred at the end of Distributed Processing. For example, the DSM Extended is not transferred back to the DSM 'master output folder' after Distributed Processing is done. This output can be found in the corresponding subproject within the SubProjects folder.

Troubleshooting & known issues

  • If a node or job fails (e.g. power down or other interference), the respective job will be redistributed to another SURE-Node.
  • If the master fails, SURE continues with the unfinished subprojects only; finished subprojects are not reprocessed. Only the subprojects that were running at the time of the master crash may need to be restarted from scratch. To resume, open the SURE project on the master PC, restart the nodes, and press the "Play" button; the remaining subprojects will then be processed.
  • If automatic tile size is not used, note that the tile size (the size of a DSM/TrueOrtho tile) influences the size of a subproject: larger tile sizes lead to larger subproject areas.
  • If a node doesn't terminate when pressing CTRL-C, you may close the terminal; however, please check whether unwanted files remain in the node's workspace.
  • A subproject behaves like a regular SURE project. Do not delete any files from the subprojects folder until processing is completely done, and do not open a subproject in other software before it has finished.
  • When a project is run in Distributed Processing mode, several logs record the processing steps, times, and error handling: the log.txt file of the main project, the dp.log, and the log files of each subproject.
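
When investigating a failed run, it can help to gather all of these logs in one place. The sketch below collects the paths of "log.txt" and "dp.log" in the main project and of one log file per subproject. The file name of the subproject logs is not stated on this page; "log.txt" is used here as an assumption and can be changed through the hypothetical subproject_log_name parameter.

```python
from pathlib import Path
from typing import List


def collect_log_files(project_dir: Path,
                      subproject_log_name: str = "log.txt") -> List[Path]:
    """Return existing log files of the main project and its subprojects."""
    candidates = [project_dir / "log.txt", project_dir / "dp.log"]
    sub_root = project_dir / "SubProjects"
    if sub_root.is_dir():
        # Assumed layout: SubProjects/<subproject>/<subproject_log_name>
        candidates += sorted(sub_root.glob(f"*/{subproject_log_name}"))
    return [path for path in candidates if path.is_file()]


# Hypothetical project location.
for log_file in collect_log_files(Path(r"D:\Projects\MyProject")):
    print(log_file)
```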



Area-based splitting into subprojects



Subproject tiling




Distributed processing workflow