This page contains detailed information on troubleshooting Distributed Processing. For a quick first-time setup, please refer to this page: Scaling Production with Distributed Processing.
To increase throughput when processing massive projects, SURE provides the capability to automatically split projects into subprojects. These subprojects are automatically distributed to worker nodes on the local network and are designed to guarantee consistent results across subproject borders.
Follow these easy steps to utilize distributed processing capabilities for your projects:
Set up worker nodes
The SURE installation package contains an executable, "SURE-Node.exe", which is used to start a worker node. It can be started from the command line or by double-clicking SURE-Node.exe. Once running, a worker node is on standby and waits for a master node to assign a job (subproject) to it.
- If the "--ip_addr" parameter is omitted, SURE will try to detect the correct interface/IP on its own. If the correct interface cannot be detected, please use this parameter to specify the IP address manually.
- All nodes (including master) must be within the same subnet (IPv4).
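The subnet requirement above can be checked before starting any node. The following is a minimal Python sketch, not part of SURE: the function name, the example addresses and the /24 prefix length are assumptions for illustration, so substitute the values of your own network.

```python
import ipaddress

def same_subnet(addresses, prefix_len=24):
    """Return True if all IPv4 addresses belong to one network of the given prefix length."""
    networks = {
        ipaddress.IPv4Interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    # Exactly one distinct network means every machine shares the subnet.
    return len(networks) == 1

# Hypothetical addresses: one master and two worker nodes.
nodes = ["192.168.1.10", "192.168.1.21", "192.168.1.22"]
print(same_subnet(nodes))
```

If the function returns False for your machines, the master and the nodes are not in the same IPv4 subnet for the prefix length you passed in.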
When working with the SURE GUI, a default user configuration file, "sure_ui.init", is created and stored in the directory ".SURE" in the current user's home folder, e.g. "C:\Users\Dan\.SURE". This configuration file also stores the default workspace in which your project directories are kept. If you have used the SURE GUI on the node before, the distributed processing tasks are stored by default in a directory called "SURE_Node_Workspace" within the default project directory. Otherwise, since "SURE-Node.exe" does not create the "sure_ui.init" file by itself, you need to specify the workspace manually using the "--workspace" parameter.
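To find out whether a node needs the workspace parameter, it is enough to check whether the configuration file described above exists. The sketch below only looks for the file at the location named in this section. The function name is hypothetical, and the file's contents are not read because its format is not covered here.

```python
from pathlib import Path

def needs_workspace_parameter(home=None):
    """Return True if <home>/.SURE/sure_ui.init does not exist.

    In that case the workspace has to be passed to SURE-Node.exe manually.
    """
    home = Path(home) if home is not None else Path.home()
    return not (home / ".SURE" / "sure_ui.init").is_file()
```

Calling `needs_workspace_parameter()` without arguments checks the home folder of the current user.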
On each worker node a SURE Aerial license needs to be activated.
If a firewall permission request pops up, we recommend allowing access for both the Master and the Node and then restarting SURE-Node.exe to make sure the permissions are set correctly.
If this popup does not appear, your firewall may be silently blocking all connections. For example, in the Windows Defender Firewall, check these settings:
For other firewall software, please contact your system administrator.
Nodes do not connect
If you have made sure that the SURE Master and Node are not blocked by the firewall, the nodes may still fail to connect to the master because the machines cannot ping each other. This can be tested using the ping command on the command line. Another way is to inspect the "Windows Defender Firewall with Advanced Security" (if this firewall is used) and look for the ICMPv4-In and ICMPv4-Out rules in both "Inbound Rules" and "Outbound Rules" (see screenshot below).
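When several nodes have to be checked, the ping test can be scripted. The following Python sketch wraps the system ping command. The function names are hypothetical and the host addresses in the usage comment are placeholders. Treat the result as a first indication only and read the actual ping output when in doubt.

```python
import platform
import subprocess

def ping_command(host, count=1):
    """Build the ping command line for the current operating system."""
    # Windows uses -n for the number of echo requests, most other systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]

def can_ping(host):
    """Return True if the ping command exits successfully for the given host."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Usage with placeholder addresses, run on the master:
#   for host in ["192.168.1.21", "192.168.1.22"]:
#       print(host, "reachable" if can_ping(host) else "NOT reachable")
```

Run the check in both directions (master to node and node to master), since the ICMP rules are configured per machine.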
Set up the project
In both the GUI and the CLI, set up your project as you would without distributed processing. In addition, define the subproject size and activate Distributed Processing.
Start to process
SURE will now run the Analysis stage on the master node to gather the information required to split up the project. After the project has been split, you will find all created subprojects in the "SubProjects" directory of the project folder. Each of them contains a complete, independent SURE project, which is then distributed to an assigned node for processing. Depending on the scenario, a certain number of tiles is generated per subproject.
- After starting the distributed processing project, you can switch to the Project Status tab to monitor the overall progress of your project. By selecting one of the tasks referring to one specific subproject, you can gain more information on this specific task, whether it's pending, running, or already finished.
- On the machine where you started the master process, you can connect to the cluster manager to obtain information on the cluster (e.g. nodes, IPs, running tasks). To do so, open a browser and navigate to "localhost:5006".
- All information regarding cluster management and scheduling will be written to "dp.log" within the project directory of the master project.
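If the cluster manager page does not load in the browser, a quick way to see whether anything is listening on the port mentioned above is a plain TCP connection test. This is a minimal sketch with a hypothetical function name. It only checks that the port accepts connections and does not query the cluster manager itself.

```python
import socket

def port_is_open(host="localhost", port=5006, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

On the master machine, `port_is_open()` should return True while the master process is running.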
The master instance takes care of generating a complete and consistent result by merging the various subprojects into the corresponding output DSM/Cloud/Mesh folders.
Once a subproject is completed, its results are moved to their corresponding location in the master project's basepath. For example, the DSM results of a finished subproject are already available in the master project's DSM folder. The results of each subproject are self-sufficient and can already be inspected or pre-delivered while other subprojects are still processing. This allows quality checks to start while other parts of the project are still being generated.
Raw Dense Clouds
Due to their massive storage requirements, the raw point clouds generated during the dense cloud stage are not copied back to the subproject folders on the master node; they are deleted at the end of each subproject's processing.
Some intermediate products are not transferred at the end of Distributed Processing. For example, the DSM Extended is not transferred back to the DSM 'master output folder' once Distributed Processing is done. This output can be found in the corresponding subproject within the SubProjects folder.
Troubleshooting & known issues
- If a node or job fails (e.g. power down or other interference), the respective job will be redistributed to another SURE-Node.
- If the master fails, SURE will continue with the unfinished subprojects only; finished subprojects are not reprocessed. Only the subprojects that were running at the time of the master crash may need to be restarted from scratch. To resume, open the SURE project on the Master PC, restart the nodes and press the button; processing of the remaining subprojects will then start.
- If the automatic tile size is not used for processing, keep in mind that the tile size influences the size of a subproject. Larger tile sizes lead to larger subproject areas.
- If a node doesn't terminate when pressing CTRL-C, you may close the terminal. However, please check whether unwanted files remain in the node's workspace.
- A subproject behaves like a usual SURE project. Please do not delete any files from the subprojects folder until processing is completely done, and do not open a subproject in other software before it is finished.
- When a project is run in Distributed Processing mode, several logs record the processing steps, times and error handling: the log.txt file of the main project, the dp.log, and the log files of each subproject.
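When investigating a problem, it helps to gather all of these logs in one go. The sketch below lists the log files of a master project. It rests on assumptions that should be checked against your own project: that log.txt sits directly in the project directory next to dp.log, and that each subproject under "SubProjects" keeps its own log.txt at its top level. The function name is hypothetical.

```python
from pathlib import Path

def collect_logs(project_dir):
    """Return the existing log files of a master project and its subprojects."""
    project = Path(project_dir)
    # Logs of the master project (location of log.txt is an assumption).
    candidates = [project / "log.txt", project / "dp.log"]
    # Assumption: every subproject folder contains its own log.txt.
    candidates += sorted((project / "SubProjects").glob("*/log.txt"))
    # Keep only the files that are actually present.
    return [path for path in candidates if path.is_file()]
```

The returned paths can then be copied to a single folder or attached to a support request.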
Area-based splitting into subprojects
Distributed processing workflow