SurferDTM's documentation
Scientific basis (sort of...)
This small project ideally continues the work of my Master's degree thesis. To take this idea to the next level (and to try some cool technologies first-hand) I created this full-stack project. I start by extracting a geographic area polygon (right now I simply start from the coordinates list of the world's volcanoes). I use these coordinates to prepare a DTM raster, which I then process into an RGB (slope, elevation, curvature) image that I submit to a machine learning instance detection model. The model's response is one (or more) array(s) of coordinate points that I can georeference and show on my frontend map.
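A minimal sketch of the slope - elevation - curvature composition, assuming a DEM already loaded as a NumPy array with square cells of side `cellsize`. The function name `dem_to_rgb`, the finite-difference slope, the Laplacian used as a curvature proxy and the min-max stretch to 0..255 are all my assumptions for illustration, not necessarily the algorithms SurferDTM actually uses.

```python
import numpy as np


def dem_to_rgb(dem: np.ndarray, cellsize: float) -> np.ndarray:
    """Stack slope, elevation and curvature of a DEM into an 8-bit RGB image.

    Hypothetical sketch: slope from a finite-difference gradient, curvature
    approximated by the Laplacian of the elevation.
    """
    dem = dem.astype(np.float64)
    # np.gradient returns the derivative along axis 0 (rows, y) then axis 1 (cols, x)
    dz_dy, dz_dx = np.gradient(dem, cellsize)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Laplacian as a simple curvature proxy: d2z/dx2 + d2z/dy2
    d2z_dy2 = np.gradient(dz_dy, cellsize, axis=0)
    d2z_dx2 = np.gradient(dz_dx, cellsize, axis=1)
    curvature = d2z_dx2 + d2z_dy2

    def to_uint8(band: np.ndarray) -> np.ndarray:
        # Min-max stretch to 0..255; a perfectly flat band maps to 0
        lo, hi = float(band.min()), float(band.max())
        if hi == lo:
            return np.zeros(band.shape, dtype=np.uint8)
        return np.round((band - lo) / (hi - lo) * 255).astype(np.uint8)

    # Band order follows the text: R = slope, G = elevation, B = curvature
    return np.dstack([to_uint8(slope_deg), to_uint8(dem), to_uint8(curvature)])
```

Stretching each band independently keeps all three channels in the same 0..255 range the detection model sees, at the cost of losing absolute values between different areas.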
Workflow:
- Frontend
  - choose some map coordinates (right now I start from the coordinates list of the world's volcanoes)
  - for every volcano's coordinates I compute the coordinates of two corner points (a bounding box)
  - using these coordinates I send a request to the backend to download and merge terrain tiles from Nextzen's European S3 endpoint
- Backend (AWS Lambda)
  - I clip the downloaded raster using the initial coordinates
  - then I build an RGB image from the clipped raster, with slope, raw elevation and curvature bands
  - finally I send a request to the ML API to get a prediction
  - the ML prediction response is a JSON containing one (or more) array(s) of coordinate points, which are then georeferenced
- Frontend
  - these coordinate points become one (or more) polygon(s) that I can add to my frontend map
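The first two steps can be sketched as follows: build the two corner points around a volcano and list the terrain tiles that cover them. `bounding_box`, `lonlat_to_tile` and `tiles_for_bbox` are hypothetical helpers, and the fixed half-size in degrees is an assumption (the real frontend may size the area differently). The tile indices use the standard XYZ ("slippy map") Web Mercator scheme; the exact Nextzen tile URL is not shown here.

```python
import math


def bounding_box(lat: float, lon: float, half_size_deg: float) -> list:
    """Return [[north, east], [south, west]] around a point (hypothetical helper).

    The [[lat, lon], [lat, lon]] order mirrors the "bounding_box" request field.
    """
    return [[lat + half_size_deg, lon + half_size_deg],
            [lat - half_size_deg, lon - half_size_deg]]


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Standard XYZ tile indices (x, y) containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(bbox: list, zoom: int) -> list:
    """All (z, x, y) tiles covering the bounding box, to be downloaded and merged."""
    (north, east), (south, west) = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]
```

With the bounding box of the request example below and zoom 9, this gives a 2 x 2 block of tiles to merge before clipping.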
surferdtm-prediction-api serverless lambda
This project exposes a serverless backend API which takes a POST request containing some geographic data, for example:
{
"bounding_box": [[38.03932961278458, 15.36808069832851], [37.455509218936974, 14.632807441554068]],
"zoom": 9,
"slope_cellsize": 61,
"model_project_name": "surferdtm",
"model_version": 4,
"debug": "false",
"skip_conditions_list": [
{
"skip_key": "confidence",
"skip_value": 0.3,
"skip_condition": "major"
}
]
}
The API uses these data to prepare a DEM raster image, transform it into an RGB image and predict whether it contains a volcano.
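A client-side sketch of this request using only the Python standard library. `API_URL` is a placeholder (the real endpoint is not documented here), `build_payload` and `post_prediction` are hypothetical helpers, and the bearer token header is an assumption about how the Auth0 layer is wired. The skip condition fields are passed through verbatim because their exact semantics are not specified in this document.

```python
import json
import urllib.request

# Placeholder endpoint: replace with the real API Gateway / Lambda URL
API_URL = "https://example.com/surferdtm-prediction-api"


def build_payload(bbox, zoom=9, slope_cellsize=61, model_version=4,
                  skip_value=0.3, skip_condition="major", debug=False):
    """Build a request body with the same fields as the example above."""
    return {
        "bounding_box": bbox,
        "zoom": zoom,
        "slope_cellsize": slope_cellsize,
        "model_project_name": "surferdtm",
        "model_version": model_version,
        # the example sends debug as the string "false", not a JSON boolean
        "debug": str(debug).lower(),
        "skip_conditions_list": [
            {"skip_key": "confidence", "skip_value": skip_value,
             "skip_condition": skip_condition}
        ],
    }


def post_prediction(payload, url=API_URL, token=None, timeout=60):
    """POST the payload as JSON and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the auth layer expects a bearer token in the Authorization header
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A generous timeout matters here: the example response below reports a run of about 9.5 seconds.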
Initially I chose detectron2 as the machine learning model because it makes instance segmentation easy. In the end, I created and deployed a custom model with Roboflow, because the Roboflow hosted service offers a small number of inference requests free of charge. Right now the model only recognizes stratovolcanoes. In case of success ("statusCode": 200) the response body contains a MultiPolygon GeoJSON:
{
"message": "merged_raster",
"duration_run": 9.513317285978701,
"request_id": "58ad63aa-88da-4fce-80e5-71d865feb6a5",
"build_time": "20230114_235155",
"output": {
"uploaded_file_name": "real_rgb_14.632807441554068,37.455509218936974,15.36808069832851,38.03932961278458_9.png",
"bucket_name": "suferdtm-prediction-api",
"prediction_georef": {
"features": [{
"type": "Feature",
"properties": {
"id": 0,
"confidence": 0.535,
"geomorphic_class": "mountain"
},
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[
[
[1630661.6992189789, 4502905.336279734],
[1630707.56143595, 4579541.100838453],
[1708214.7081171188, 4579648.112678051],
[1708367.5821736893, 4579189.49050834],
[1708077.1214662055, 4539289.361743478],
[1708214.7081171188, 4527212.3112744205],
[1707908.960003978, 4509326.046655688],
[1707756.0859474079, 4510090.4169385405],
[1707908.960003978, 4512842.149956807],
[1707756.0859474079, 4513453.646183088],
[1706991.715664556, 4514218.01646594],
[1706533.0934948449, 4513759.394296229],
[1706395.5068439317, 4513147.898069948],
[1706227.3453817042, 4510090.4169385405],
[1705615.8491554228, 4509173.172599118],
[1705157.2269857118, 4507797.306089985],
[1705157.2269857118, 4504434.076845437],
[1706380.2194382746, 4503669.706562585],
[1707297.4637776967, 4503669.706562585],
[1707908.960003978, 4503975.454675727],
[1708061.8340605486, 4503669.706562585],
[1707908.960003978, 4503501.545100358],
[1706685.9675514153, 4503669.706562585],
[1706227.3453817042, 4503363.958449445],
[1706074.471325134, 4502905.336279734],
[1630661.6992189789, 4502905.336279734]
]
]
]
}
}],
"type": "FeatureCollection",
"name": "geojson_name",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:EPSG::3857"
}
}
},
"n_total_obj_prediction": 49
}
}
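The georeferenced polygons are in EPSG:3857 (Web Mercator, metres), while web maps usually take longitude/latitude. A sketch of the standard inverse spherical Mercator conversion applied to the `prediction_georef` FeatureCollection; the helper names are hypothetical and only MultiPolygon geometries, as in the example above, are handled.

```python
import math

EARTH_RADIUS = 6378137.0  # sphere radius used by EPSG:3857 (Web Mercator)


def mercator_to_lonlat(x: float, y: float) -> tuple:
    """Inverse spherical Web Mercator: EPSG:3857 metres -> (lon, lat) degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2.0)
    return lon, lat


def feature_collection_to_wgs84(fc: dict) -> dict:
    """Return a copy of a MultiPolygon FeatureCollection with lon/lat coordinates."""
    features = []
    for feature in fc["features"]:
        geom = feature["geometry"]
        # MultiPolygon nesting: polygons -> rings -> [x, y] positions
        coords = [
            [[list(mercator_to_lonlat(x, y)) for x, y in ring] for ring in polygon]
            for polygon in geom["coordinates"]
        ]
        features.append({**feature,
                         "geometry": {"type": geom["type"], "coordinates": coords}})
    # RFC 7946 GeoJSON assumes WGS84, so the EPSG:3857 "crs" member is dropped
    return {"type": "FeatureCollection", "features": features}
```

The first vertex of the example, [1630661.699, 4502905.336], maps to roughly 14.65°E, 37.46°N, i.e. the south-west corner of the requested bounding box.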
The duration_run field is measured in seconds. In case of error ("statusCode": 500) the response body looks like this:
{
"message": "500.1.a",
"duration_run": 0.003093040024396032,
"request_id": "c34f1f4f-abe4-49cd-b93f-671c8675f940",
"build_time": "20230114_235155"
}
It's possible to search for details within the serverless lambda logs (AWS CloudWatch) using the request_id field.
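On the client side, a sketch of how an error response could be turned into an exception that carries the request_id to search in CloudWatch. The `{"statusCode": ..., "body": "<JSON string>"}` envelope is an assumption (the usual Lambda proxy integration shape), and `PredictionApiError` / `parse_lambda_response` are hypothetical names.

```python
import json


class PredictionApiError(RuntimeError):
    """Raised when the API answers with a non-200 statusCode (hypothetical class)."""

    def __init__(self, status_code: int, body: dict):
        self.status_code = status_code
        self.request_id = body.get("request_id")
        super().__init__(
            f"statusCode {status_code}, message {body.get('message')!r}, "
            f"request_id {self.request_id} (search this id in the CloudWatch logs)"
        )


def parse_lambda_response(response: dict) -> dict:
    """Return the decoded body of a successful response or raise PredictionApiError."""
    body = response.get("body", "{}")
    if isinstance(body, str):
        # Assumption: the body arrives as a JSON string, as with the Lambda proxy shape
        body = json.loads(body)
    status_code = response.get("statusCode", 500)
    if status_code != 200:
        raise PredictionApiError(status_code, body)
    return body
```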
Missing features - TODO
- refactor 'get_geojson_reprojected' using 'get_prediction_georeferenced' (surferdtm-api)
- try running the Roboflow/detectron2 ML inference directly on AWS Lambda with Graviton processors, see:
  - https://ryfeus.medium.com/machine-learning-inference-on-aws-lambda-functions-powered-by-aws-graviton2-processors-eae8de8f7043
  - https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/aarch64-docker-images-for-tensorflow-and-pytorch
  - https://github.com/aws/aws-graviton-getting-started/blob/main/containers.md
Also, for a SurferDTM "reverse approach":
- analyze only squares surrounding local peaks in the DEM instead of entire mountain polygon areas
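A rough sketch of how this reverse approach could start: find DEM cells that are the maximum of their neighbourhood, then cut a square window around each one. This is only an illustration of one possible technique (a moving-window maximum, requiring NumPy 1.20+ for `sliding_window_view`); `local_peaks`, `square_around` and their parameters are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_peaks(dem: np.ndarray, window: int = 3, min_elevation: float = 0.0) -> list:
    """Return (row, col) of cells equal to the maximum of their window x window area.

    Edges are padded with -inf; plateaus yield several adjacent "peaks"
    (a deliberate simplification).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(dem.astype(np.float64), pad, mode="constant",
                    constant_values=-np.inf)
    # shape (rows, cols, window, window) -> neighbourhood maximum per cell
    neighbourhood_max = sliding_window_view(padded, (window, window)).max(axis=(-2, -1))
    mask = (dem >= neighbourhood_max) & (dem > min_elevation)
    return [tuple(int(v) for v in rc) for rc in np.argwhere(mask)]


def square_around(row: int, col: int, half: int, shape: tuple) -> tuple:
    """Row/column slices of a square of side 2*half+1 around a peak, clipped to the DEM."""
    r0, c0 = max(row - half, 0), max(col - half, 0)
    r1, c1 = min(row + half + 1, shape[0]), min(col + half + 1, shape[1])
    return slice(r0, r1), slice(c0, c1)
```

Each `dem[square_around(...)]` crop could then go through the same slope - elevation - curvature pipeline instead of the whole area.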
Already DONE
- add docstrings documentation rendered with sphinx
- write tests for backend (surferdtm-api)
- add auth to lambda function
- display geojson polygons on web map
- georeferenced json response from roboflow
- request to roboflow
- georeferencing mask
- slope-dem-curv => rgb
- crop merged raster
- docker image with osgeo/gdal => aws lambda
- parallel requests with asyncio/aiohttp/aiofiles
- instance recognition with roboflow (detectron2?)
- images collection using surfacedtm to train an instance recognition detectron2 model
- statistical model based on a comparison between quantitative geomorphic features of mountains and volcanoes (roughness index, ...)
Technical stack and serverless architecture
I based this work on Mapzen Terrain Tiles because this data set contains both terrestrial and bathymetric elevation data.
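The terrain tiles are published in several formats; assuming the Terrarium PNG one is used (with GeoTIFF tiles no decoding step is needed), each pixel encodes elevation in its RGB channels as (R * 256 + G + B / 256) - 32768 metres, so negative values are the bathymetric part. A small decoding sketch; `terrarium_to_elevation` is a hypothetical name.

```python
import numpy as np


def terrarium_to_elevation(rgb: np.ndarray) -> np.ndarray:
    """Decode a Terrarium-encoded tile (H x W x 3, uint8) into elevation in metres."""
    rgb = rgb.astype(np.float64)  # avoid uint8 overflow in the arithmetic
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return r * 256.0 + g + b / 256.0 - 32768.0
```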
The frontend is written in Vue 3 and deployed on Netlify. To use some remote resources while avoiding CORS issues I use Netlify Functions as middleware.
The serverless backend is an AWS Lambda function written in Python, packaged as a Docker image because of the size of its dependencies. To reduce complexity I based my container on GDAL's osgeo Docker image (the ubuntu-small variant).
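A minimal sketch of a handler that returns the response envelope documented above (message, duration_run in seconds, request_id, build_time). `run_pipeline` is a hypothetical stand-in for the real clip / RGB / prediction steps, `BUILD_TIME` is a hypothetical environment variable, and reading the payload from `event["body"]` assumes an API Gateway style event. `context.aws_request_id` is the request id that Lambda exposes on the Python context object and that also appears in the CloudWatch logs.

```python
import json
import logging
import os
import time

logger = logging.getLogger(__name__)


def lambda_handler(event, context):
    """Sketch of the documented response envelope, not the project's actual handler."""
    start = time.perf_counter()
    build_time = os.environ.get("BUILD_TIME", "unknown")
    try:
        body = json.loads(event.get("body") or "{}")
        output = run_pipeline(body)
        status_code, message = 200, "merged_raster"
    except Exception:
        # log with the request id so the error can be found in CloudWatch
        logger.exception("prediction failed, request_id=%s", context.aws_request_id)
        # the real API uses coded messages such as "500.1.a"; "500" is a placeholder
        status_code, message, output = 500, "500", None
    response_body = {
        "message": message,
        "duration_run": time.perf_counter() - start,  # seconds
        "request_id": context.aws_request_id,
        "build_time": build_time,
    }
    if output is not None:
        response_body["output"] = output
    return {"statusCode": status_code, "body": json.dumps(response_body)}


def run_pipeline(body: dict) -> dict:
    """Hypothetical placeholder: validate the payload and return a dummy output."""
    if "bounding_box" not in body:
        raise ValueError("missing bounding_box")
    return {"n_total_obj_prediction": 0}
```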
I chose Auth0 as the authentication layer and MongoDB for data persistence; to obtain machine learning inference I created an instance detection model on Roboflow.