Internship Report 2018 (1/3)


This summer I did an internship at Microtec, one of the leading providers of wood-scanning solutions for sawmills and an emerging power in food scanning. In the next few posts I’m going to tell you about some of the projects I worked on there. I was hired to apply my experience in machine learning, and especially deep learning, to some of the tasks the company is currently facing. I developed pipelines and prototypes for a number of tasks and developed the promising ones further into deployment-ready products. In this and the following posts I will tell you about a bark segmentation model for logs, a localization model for wooden planks, and a model that reconstructs logs in 3D from stereo images.

Log Segmentation Model

For the first project I was given around 300 line scans of logs. Some were completely covered with bark, others had retained it only partly, in patterns that depended on their storage conditions and duration, and some had lost almost all of it. The problem the company needed to solve was detecting all major zones covered by bark so that this information could be used in the subsequent processing steps. What made the task particularly challenging was that detection has to work for different wood species that grow in different regions and are harvested and stored under different conditions, all of which can considerably influence the appearance of the logs. This variability made a neural network necessary for the segmentation.

As with every data-science task, I first had to get familiar with the provided data and develop a pipeline that let me work with it comfortably. I chose GEDI for manual ground-truth segmentation and set up a small script so that we only had to segment zones of missing bark and undefined zones (such as snow), while the background of the logs was removed automatically. Once the specifics of the pipeline were decided, an intern did the manual segmentation and I started building the network architecture.

As for most of my current computer-vision networks, I used a model based on the DeepLabv3+ architecture, which is among the current standards in semantic segmentation. What mainly differentiates it from previous architectures is Atrous Spatial Pyramid Pooling, which applies atrous (dilated) convolutions at several rates in parallel and thereby allows for a much bigger receptive field and better context comprehension in semantic segmentation.

Figure: Atrous spatial pyramid pooling
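To make the idea concrete, here is a minimal NumPy sketch of an atrous convolution in one dimension, plus a toy pyramid that runs several rates in parallel. It is only an illustration and not the code used in the project: the function names are my own, the rates are arbitrary, and the real ASPP module in DeepLabv3+ works on 2D feature maps with learned kernels and additional branches.

```python
import numpy as np


def receptive_field(kernel_size, rate):
    """Number of input positions spanned by one atrous kernel application."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D atrous convolution (cross-correlation form, no padding)."""
    span = receptive_field(len(kernel), rate)
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Sample the input every `rate` steps instead of at adjacent positions.
        taps = signal[i:i + span:rate]
        out[i] = np.dot(taps, kernel)
    return out


def toy_aspp_1d(signal, kernel, rates):
    """Run the same odd-sized kernel at several rates and stack the results.

    Zero padding keeps every branch as long as the input, so the branches
    can be stacked the way ASPP concatenates its parallel feature maps.
    """
    branches = []
    for rate in rates:
        pad = (len(kernel) - 1) * rate // 2
        branches.append(atrous_conv1d(np.pad(signal, pad), kernel, rate))
    return np.stack(branches)


signal = np.arange(20, dtype=float)
kernel = np.ones(3)
features = toy_aspp_1d(signal, kernel, rates=(1, 2, 4))  # shape (3, 20)
```

With a kernel of size 3, a rate of 6 already spans 13 input positions using the same three weights, which is where the larger receptive field comes from without any extra parameters.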

Since the images were of variable size, I decided to feed them to the network as patches of equal size and to stitch the predictions back together in post-processing during inference. Just like in the paper, I tried MobileNetV2 and Xception v2 as model backbones to find a tradeoff between speed and accuracy. To prevent overfitting despite our comparatively low number of ground-truth images, during training we randomly cropped, flipped and rotated the input images and applied random brightness changes and image noise to them. This worked very well: we observed only a small amount of overfitting, that is, a small gap between training and validation error. We did not want the white background to contribute to our results, since it was supposed to be ignored anyway. To achieve this and to avoid spending network capacity on it, we masked both the output and the loss on white pixels.

Our test results show that the network with the MobileNet backbone performed really well in terms of processing speed but did not produce the required segmentation quality with the provided training images. The Xception v2 backbone produced much better and more reliable segmentation and still predicted a whole image within the required time frame. As you can see below, the results obtained on the test set are of high quality even under very bad conditions, and the model has no noticeable difficulties with large barkless zones. Green and yellow zones are classified as bark and no bark, respectively, while violet zones are part of the background.

Figure: segmentation results on the test set (results_1, results_2, results_3)
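For readers who want to try the patch-based inference and the white-pixel masking described above, the following NumPy sketch shows one way to do it. It is not the project code: the patch size, the function names, the grayscale input and the assumption that background pixels are exactly 255 are all placeholders chosen for illustration.

```python
import numpy as np

PATCH = 4    # hypothetical patch size; the real value is not given in this post
WHITE = 255  # assumption: background pixels are pure white in a grayscale scan


def split_into_patches(image, patch=PATCH):
    """Pad a 2D image with white to a multiple of `patch` and cut it into tiles."""
    h, w = image.shape
    padded = np.pad(image, ((0, -h % patch), (0, -w % patch)),
                    constant_values=WHITE)
    tiles = {}
    for y in range(0, padded.shape[0], patch):
        for x in range(0, padded.shape[1], patch):
            tiles[(y, x)] = padded[y:y + patch, x:x + patch]
    return tiles, (h, w)


def stitch_patches(tiles, original_shape, patch=PATCH):
    """Place per-patch results back at their offsets and crop away the padding."""
    h, w = original_shape
    rows = max(y for y, _ in tiles) + patch
    cols = max(x for _, x in tiles) + patch
    first = next(iter(tiles.values()))
    canvas = np.zeros((rows, cols), dtype=first.dtype)
    for (y, x), tile in tiles.items():
        canvas[y:y + patch, x:x + patch] = tile
    return canvas[:h, :w]


def masked_cross_entropy(probs, labels, image):
    """Mean cross-entropy over non-white pixels; white pixels contribute nothing."""
    valid = image != WHITE
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    losses = -np.log(np.clip(picked, 1e-7, 1.0))
    return float((losses * valid).sum() / max(valid.sum(), 1))


def mask_output(class_map, image, background_id):
    """Overwrite predictions on white pixels with a fixed background class."""
    return np.where(image != WHITE, class_map, background_id)


scan = np.arange(30, dtype=np.uint8).reshape(5, 6)
tiles, shape = split_into_patches(scan)
# A real pipeline would run the network on every tile at this point.
restored = stitch_patches(tiles, shape)
```

Padding with white fits the masking scheme: the padded border is treated like any other background pixel, so it affects neither the loss nor the stitched result.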

The company provided me with considerable computational as well as human resources and was happy to offer their expertise and to accept requests and suggestions from me. That and the friendly atmosphere within the company turned the internship into an exceptional experience that I would happily repeat.