At InRange, we use machine learning to speed up the processes behind renewable energy generation. This post is the first in a series diving into some of the ways we apply machine learning.
Aligning a polygon from OpenStreetMap to a Google Maps image with edge detection
OpenStreetMap’s building service provides us with Polygons of most buildings. These are great for quick analysis of building surface area. We can get satellite imagery from many sources (we’re using Google Maps for this post) which we use at InRange to do much deeper analysis. But first, we need to isolate the image of the building from its surroundings.
The polygon outline of the rooftop obtained from OpenStreetMap and the image of the rooftop from Google Maps do not align. Hence, we cannot accurately cut out just the roof using the polygon.
We aim to solve this by moving the polygon around until it outlines the building. Doing this by hand would be quite time-consuming across many buildings, so we came up with an algorithm that does it for us. By the end of this post, we will have an image of only the rooftop, without the surroundings.
Data loading
The bitmap image from Google Maps is loaded as a NumPy array, but the polygon from OpenStreetMap is a Polygon object from the Shapely library. The first task is to turn the Polygon into a NumPy array as well.
We do this by obtaining a NumPy mask from the Polygon: a 0–1 matrix where an entry is 1 if it lies inside the Polygon and 0 if it lies outside. Next, we keep just the sides of this mask and fill the inside with zeros. (The exact code is not the focus of this blog, but a rough sketch of one possible approach follows below.)
Going forward, let’s call the NumPy array of the Google Maps image `image`, the mask obtained from the Polygon `polygon_mask`, and its edges `polygon_edges`. (The edges are displayed thicker than reality for better visibility.)
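For completeness, here is a minimal sketch of how the mask and its edges could be produced. Our actual implementation is omitted, so the helper names and the use of PIL and OpenCV here are illustrative assumptions, not the code we run in production.
from PIL import Image, ImageDraw
import cv2
import numpy as np

def polygon_to_mask(polygon, shape):
    # Rasterise the Shapely Polygon into a 0-1 NumPy mask
    # (hypothetical helper; assumes the Polygon is already in pixel coordinates)
    canvas = Image.new("L", (shape[1], shape[0]), 0)
    ImageDraw.Draw(canvas).polygon(list(polygon.exterior.coords), outline=1, fill=1)
    return np.asarray(canvas, dtype=np.uint8)

def mask_to_edges(mask):
    # Keep only the boundary pixels: eroding the mask and subtracting
    # fills the inside with zeros and leaves the sides as 1s
    eroded = cv2.erode(mask, np.ones((3, 3), np.uint8))
    return mask - eroded

polygon_mask = polygon_to_mask(polygon, image.shape[:2])
polygon_edges = mask_to_edges(polygon_mask)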
Edge detection
The key to aligning the polygon with the rooftop is detecting the edges of the rooftop and aligning the polygon to those edges. Now that we have all the data, we can start with Canny edge detection. This method detects all prominent edges, whether they belong to the roof or to the background.
To make sure we do not detect too many of the unimportant edges present in the photo, the image is first blurred. This way only the more prominent edges end up being detected.
We use Gaussian blurring for this, which softens the image naturally while leaving the most prominent edges intact. To perform these computer vision tasks, we rely on the OpenCV package.
import cv2
import numpy as np

# Use blurring to reduce noise in the Google Maps image
blurred_image = cv2.GaussianBlur(src=image, ksize=(11, 11), sigmaX=0)

# Derive lower and upper thresholds for Canny edge detection from the image median
median = np.median(image)
constant = 0.33  # Could choose another value
lower = int(max(0, (1.0 - constant) * median))
upper = int(min(255, (1.0 + constant) * median))

# Use Canny edge detection to detect edges
image_edges = cv2.Canny(image=blurred_image,
                        threshold1=lower,
                        threshold2=upper,
                        L2gradient=True)

# Canny outputs 0/255 values; normalise them to a 0-1 mask
image_edges = (image_edges // 255).astype(np.uint8)
In Canny edge detection, the thresholds determine which small edge pieces belong to larger, important edges and which are most likely just noise.
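For example, if the median pixel value of the image is 120, the thresholds come out as lower = int(0.67 × 120) = 80 and upper = int(1.33 × 120) = 159.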
The image in the OpenCV documentation explains this well.
In our example image below, the detected image edges are shown in red. (The detected edges are one pixel wide, but here I plotted thicker lines for better visibility.)
There are of course detected edges that are not roof edges but belong to objects on the roof or to the surroundings. Also, the edges do not always form a fully closed area, so we clearly cannot use them directly to cut out the rooftop. They are still useful, however, as they indicate where to align the polygon.
Loss function
Next, we define a loss function that tells us how well-aligned the polygon is with the detected edges. Minimizing this function with respect to the location of the polygon will help us find well-aligned polygons. At this point, both the detected image edges and the polygon edges are stored as 0–1 matrices. The polygon fits the image edges well when the 1s in `image_edges` sit at the same indices as the 1s in `polygon_edges`. So we can define the loss as the number of entries that differ between the two arrays.
loss = np.absolute(image_edges.astype(int) - polygon_edges.astype(int)).sum()  # Cast to int to avoid uint8 wrap-around
As mentioned above, the polygon is moved around until a good fit with the roof is found. This is done by shifting the polygon to minimize the loss function. The shift is parameterised by offsets in the x and y coordinates.
One way to minimize the loss function is through grid search. That would mean exploring all the possible positions of the polygon, but that is computationally expensive. Instead, we opt for a gradient descent method, which is quicker even if it does not find the absolute minimum.
The search space of gradient descent is the set of possible shifts of the polygon, indexed by (x, y), where x is the number of pixels shifted along the x-axis and y the number shifted along the y-axis. The value at each coordinate is given by the loss function.
# Evaluating the loss function at shift (x, y)
# np.roll wraps pixels around the border, which is fine for small shifts
shifted_polygon_edges = np.roll(polygon_edges, x, axis=0)
shifted_polygon_edges = np.roll(shifted_polygon_edges, y, axis=1)
loss = np.absolute(image_edges.astype(int) - shifted_polygon_edges.astype(int)).sum()
The output of gradient descent is the coordinate (x_shift, y_shift), the best shift it finds.
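The search itself is not shown in this post; below is a minimal sketch of what it could look like. Since the shifts are integer-valued, this stand-in takes greedy steps to the lowest-loss neighbouring shift rather than following a true gradient; the name gradient_descent, the starting point, and the stopping rule are all assumptions.
# Hypothetical sketch of the shift search, not our actual implementation
def gradient_descent(image_edges, polygon_edges, max_iters=200):
    def loss_at(x, y):
        shifted = np.roll(np.roll(polygon_edges, x, axis=0), y, axis=1)
        return np.absolute(image_edges.astype(int) - shifted.astype(int)).sum()

    x, y = 0, 0
    current = loss_at(x, y)
    for _ in range(max_iters):
        # Evaluate the loss at the four neighbouring shifts
        candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
        best = min(candidates, key=lambda p: loss_at(*p))
        best_loss = loss_at(*best)
        if best_loss >= current:
            break  # No neighbour improves the loss: a (local) minimum
        (x, y), current = best, best_loss
    return x, y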
Let’s examine the loss landscape used here. We can see that it is non-convex, so gradient descent might get stuck in a local minimum. Moreover, the function has large jumps, which makes it even more likely that gradient descent gets stuck.
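A landscape like the one plotted here can be computed by brute force over a small window of shifts; this visualisation code is illustrative and not part of our pipeline.
import matplotlib.pyplot as plt

# Evaluate the loss on a grid of shifts to visualise the landscape
window = range(-50, 51)
landscape = np.array([[np.absolute(
        image_edges.astype(int)
        - np.roll(np.roll(polygon_edges, x, axis=0), y, axis=1).astype(int)
    ).sum() for y in window] for x in window])

plt.imshow(landscape)
plt.xlabel("shift along y-axis (index into window)")
plt.ylabel("shift along x-axis (index into window)")
plt.colorbar(label="loss")
plt.show()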
Better loss function
Gradient descent works better on a smooth loss landscape, but the loss defined above is anything but smooth.
Imagine a detected rooftop edge that aligns perfectly with a well-placed polygon side: because they are aligned, the loss is small. But shift the polygon by even a single pixel and the loss suddenly jumps up.
To smooth the loss, we blur both the polygon edges and the image edges. That way the pixels around the edges take values between 0 and 1, so a one-pixel misalignment no longer causes a large jump in the loss. You can see in these images how the values fade from 1 at the edges down to 0.
# Cast to float so the blurred edge maps fade smoothly between 0 and 1
blurred_image_edges = cv2.GaussianBlur(src=image_edges.astype(np.float32), ksize=(0, 0), sigmaX=3)
blurred_polygon_edges = cv2.GaussianBlur(src=polygon_edges.astype(np.float32), ksize=(0, 0), sigmaX=27)
loss_with_blur = np.absolute(blurred_image_edges - blurred_polygon_edges).sum()
x_shift, y_shift = gradient_descent(image_edges, polygon_edges)
shifted_polygon_edges = np.roll(polygon_edges, x_shift, axis=0)
shifted_polygon_edges = np.roll(shifted_polygon_edges, y_shift, axis=1)
Cutting out the roof
The last step is cutting out the area enclosed by the polygon from the image. For this, we use the filled mask that we obtained at the beginning. This has to be shifted the same way as the polygon edges were shifted, to cut out the correct roof area.
shifted_polygon_mask = np.roll(polygon_mask, x_shift, axis=0)
shifted_polygon_mask = np.roll(shifted_polygon_mask, y_shift, axis=1)
With the correctly shifted mask, we can now cut the relevant part of the image in the following way:
# Flip the mask vertically to match the image orientation, then expand it
# to 3 channels so it can multiply the RGB image element-wise
roof_image = image * np.repeat(
    np.expand_dims(
        np.flip(shifted_polygon_mask, axis=0),
        axis=2),
    3, axis=2)
Conclusion
In this post, we have worked on the problem of aligning a rooftop image with its outline polygon. We have detected the edges of the rooftop along with other edges. Then we aligned the polygon with the edges, based on a defined loss function. Finally, we cut out the rooftop from the image using the shifted polygon.