Batch Processing API

The Batch Processing API enables you to request data for large areas and longer time periods for any supported collection, including BYOC (Bring Your Own COG). It is typically more cost-effective when processing large amounts of data. For details, see Processing Units.

It is an asynchronous REST service, meaning data will not be returned immediately but delivered to your specified object storage instead.

Requires Enterprise License

The Batch Processing API is only available for users on enterprise plans. If you do not have an enterprise plan, and would like to try it out, contact us or upgrade.

Deployments

Deployment | API endpoint | Region
--- | --- | ---
AWS EU (Frankfurt) | https://services.sentinel-hub.com/api/v2/batch | eu-central-1
AWS US (Oregon) | https://services-uswest2.sentinel-hub.com/api/v2/batch | us-west-2

Workflow

The Batch V2 Processing API comes with a set of REST APIs which support the execution of various workflows. A batch task can be in any of the following statuses:

  • CREATED
  • ANALYSING
  • ANALYSIS_DONE
  • PROCESSING
  • DONE
  • FAILED
  • STOPPED

A user can perform the following actions:

  • ANALYSE
  • START
  • STOP

These actions trigger transitions between the statuses.

The workflow starts when a user posts a new batch request. In this step the system:

  • creates a new batch task with the status CREATED
  • validates the user's input (except the evalscript)
  • ensures the user's account has at least 1000 PUs
  • uploads a JSON of the original request to the user's bucket
  • and returns an overview of the created task

The user can then decide to either request an additional analysis of the task or start the processing. When an additional analysis is requested:

  • the status of the task changes to ANALYSING
  • the evalscript is validated
  • a feature manifest file is uploaded to the user's bucket
  • after the analysis is finished, the status of the task changes to ANALYSIS_DONE

If the user chooses to start the processing directly, the system still runs the analysis, but once the analysis is done it automatically proceeds with processing. This intermediate step is not shown explicitly above in order to keep the overview simple.

When the user starts the processing:

  • the status of the task changes to PROCESSING (this may take a while, depending on the load on the service)
  • the processing starts
  • an execution database is periodically uploaded to the user's bucket
  • spent processing units are billed periodically

When the processing is finished, the status of the task changes to DONE.
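
To make this lifecycle concrete, below is a minimal Python sketch. It assumes an authenticated oauth session (see the authentication note in the Tiling Grid section below) and that the task endpoints follow the /api/v2/batch/process pattern of the deployments listed above; verify the exact paths and response fields in the BatchV2 API reference.

import time

# Assumed create endpoint, derived from the deployment URLs above; check the API reference.
BATCH_URL = "https://services.sentinel-hub.com/api/v2/batch/process"

batch_request = {}  # fill in with your batch request body (see the input and output sections below)

# 1. Create a new batch task (status: CREATED)
task = oauth.request("POST", BATCH_URL, json=batch_request).json()
task_id = task["id"]  # assumed field name

# 2. Optionally request an additional analysis (ANALYSING -> ANALYSIS_DONE)
oauth.request("POST", f"{BATCH_URL}/{task_id}/analyse")

# 3. Start processing (PROCESSING -> DONE); the analysis runs automatically if it has not yet
oauth.request("POST", f"{BATCH_URL}/{task_id}/start")

# 4. Poll until the task reaches a terminal status
while True:
    status = oauth.request("GET", f"{BATCH_URL}/{task_id}").json()["status"]
    if status in ("DONE", "FAILED", "STOPPED"):
        break
    time.sleep(60)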

Stopping the Request

A task might be stopped for the following reasons:

  • it is requested by a user (user action)
  • the user runs out of processing units
  • something goes wrong with the processing of the task (for example, the system is not able to process the data)

A user may stop the request in the following states: ANALYSING, ANALYSIS_DONE and PROCESSING. However:

  • if the status is ANALYSING, the analysis will complete
  • if the status is PROCESSING, all features (polygons) that have been processed or are being processed at that moment are charged for
  • the user is not allowed to restart the task for the next 30 minutes

Input Features

The BatchV2 API supports two ways of specifying the input features of your batch task:

  1. Pre-defined Tiling Grid
  2. User-defined GeoPackage

1. Tiling Grid

For more efficient processing, we divide the area of interest into tiles and process each tile separately. While the Process API uses the grids that come with each data source, the Batch API uses one of the predefined tiling grids. Tiling grids 0-2 are based on the Sentinel-2 tiling in the WGS84/UTM projection, with some adjustments:

  • The width and height of tiles in the original Sentinel-2 grid are 100 km. The widths and heights of tiles in our grids are given in the table below.
  • All redundant tiles (for example, fully overlapped tiles) are removed.

All available tiling grids can be requested with:

note

To run this example you need to first create an OAuth client as is explained here.
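
If you work in Python, a minimal sketch of creating such an OAuth session with requests_oauthlib is shown below. The client_id and client_secret come from the OAuth client you created; the token endpoint shown is an assumption, so verify it against the authentication documentation.

from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session

client_id = "<your-client-id>"          # from your OAuth client
client_secret = "<your-client-secret>"

# Client-credentials flow: the resulting session attaches the bearer token to every request.
client = BackendApplicationClient(client_id=client_id)
oauth = OAuth2Session(client=client)
oauth.fetch_token(
    token_url="https://services.sentinel-hub.com/auth/realms/main/protocol/openid-connect/token",  # verify in the authentication docs
    client_secret=client_secret,
    include_client_id=True,
)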

url = "https://services.sentinel-hub.com/api/v2/batch/tilinggrids/"

response = oauth.request("GET", url)

response.json()

This returns the list of available grids and information about tile size and available resolutions for each grid. Currently, available grids are:

name | id | tile size | resolutions | coverage | output CRS | download the grid [zip with shp file] *
--- | --- | --- | --- | --- | --- | ---
UTM 20km grid | 0 | 20040 m | 10 m, 20 m, 60 m | World, latitudes from -80.7° to 80.7° | UTM | UTM 20km grid
UTM 10km grid | 1 | 10000 m | 10 m, 20 m | World, latitudes from -80.6° to 80.6° | UTM | UTM 10km grid
UTM 100km grid | 2 | 100080 m | 60 m, 120 m, 240 m, 360 m | World, latitudes from -81° to 81° | UTM | UTM 100km grid
WGS84 1 degree grid | 3 | 1° | 0.0001°, 0.0002° | World, all latitudes | WGS84 | WGS84 1 degree grid
LAEA 100km grid | 6 | 100000 m | 40 m, 50 m, 100 m | Europe, including Turkey, Iceland, Svalbard, Azores, and Canary Islands | EPSG:3035 | LAEA 100km grid
LAEA 20km grid | 7 | 20000 m | 10 m, 20 m | Europe, including Turkey, Iceland, Svalbard, Azores, and Canary Islands | EPSG:3035 | LAEA 20km grid

* The geometries of the tiles are reprojected to WGS84 for download. Because of this and other reasons the geometries of the output rasters may differ from the tile geometries provided here.

To use the 20 km grid with 60 m resolution, for example, specify the id and resolution parameters of the tiling-grid input object when creating a new batch request (see an example of a full request):

{
  ...
  "input": {
    "type": "tiling-grid",
    "id": 0,
    "resolution": 60.0
  },
  ...
}

2. GeoPackage

In addition to the tiling grids, the BatchV2 API also supports user-defined features through GeoPackages. This allows you to specify features of any shape, as long as the underlying geometry is a POLYGON or MULTIPOLYGON in an EPSG-compliant CRS listed here. The GeoPackage can also have multiple layers, offering more flexibility for specifying features in multiple CRSs.

The GeoPackage must adhere to the GeoPackage spec and contain at least one feature table with any name. The table must include a column that holds the geometry data. This column can be named arbitrarily, but it must be listed as the geometry column in the gpkg_geometry_columns table. The table schema should include the following columns:

Column | Type | Example
--- | --- | ---
id (primary key) | INTEGER (UNIQUE) | 1000
identifier | TEXT (UNIQUE) | FEATURE_NAME
geometry | POLYGON or MULTIPOLYGON | Feature geometry representation in GeoPackage WKB format
width | INTEGER | 1000
height | INTEGER | 1000
resolution | REAL | 0.005

Caveats

  • You must specify either both width and height, or alternatively, specify resolution. If both values are provided, width and height will be used, and resolution will be ignored.
  • The feature table must use a CRS that is EPSG compliant.
  • identifier values must not be null and must be unique across all feature tables.
  • There can be a maximum of 700,000 features in the GeoPackage.
  • The feature output width and height cannot exceed 3500 by 3500 pixels or the equivalent in resolution.

Below is a list of example GeoPackages that showcase how a GeoPackage file should be structured. Note that these are not production-ready GeoPackages and should only be used for testing purposes. If you would like to use these tiling grids for processing, use the equivalent tiling grid with the tiling-grid input instead.

name | id | output CRS | geopackage
--- | --- | --- | ---
UTM 20km grid | 0 | UTM | UTM 20km grid
UTM 10km grid | 1 | UTM | UTM 10km grid
UTM 100km grid | 2 | UTM | UTM 100km grid
WGS84 1 degree grid | 3 | WGS84 | WGS84 1 degree grid
LAEA 100km grid | 6 | EPSG:3035 | LAEA 100km grid
LAEA 20km grid | 7 | EPSG:3035 | LAEA 20km grid

An example of a batch task with GeoPackage input is available here.
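
As a rough illustration of the schema above, the sketch below builds a single-layer GeoPackage with geopandas. Treat it as an assumption-laden example: the layer name, geometries and sizes are placeholders, and the driver creates its own integer primary-key column, so check the resulting file against the caveats (and the BatchV2 API reference) before using it.

import geopandas as gpd
from shapely.geometry import box

# Two example features; identifier values must be unique and non-null.
gdf = gpd.GeoDataFrame(
    {
        "identifier": ["feature_a", "feature_b"],
        "width": [1000, 1000],   # output size in pixels (alternatively provide a "resolution" column)
        "height": [1000, 1000],
        "geometry": [box(13.0, 45.0, 13.1, 45.1), box(13.1, 45.0, 13.2, 45.1)],
    },
    crs="EPSG:4326",  # must be an EPSG-compliant CRS
)

# Write a single feature table into a GeoPackage file.
gdf.to_file("features.gpkg", layer="features", driver="GPKG")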

Area of Interest and PUs

When using either Tiling Grid or GeoPackage as input, the features that end up being processed are determined by the processRequest.input.bounds parameter specified in the request, called Area of Interest or AOI.

The way the AOI parameter is used and its effect depend on the input type used:

  • Tiling grid: The AOI must be specified in the request. Only the tiles (features) that intersect with the AOI will be processed.
  • GeoPackage: The AOI may be omitted. If it is omitted, all the features inside your GeoPackage will be processed. Conversely, if the AOI is specified, only the features that intersect with it will be processed.

Note that for both input types, if a feature is only partially covered by the AOI, it will still be processed in its entirety.

You are only charged PUs for the features that are processed. If a feature does not intersect with the AOI, it will not be charged for.
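
For illustration, the AOI is passed as a bounds object under processRequest.input.bounds. A minimal sketch as a Python dict, with placeholder coordinates and the bbox/CRS structure used by the Process API:

# Only features intersecting this bounding box are processed (and charged for).
bounds = {
    "bbox": [12.44, 41.87, 12.55, 41.93],  # [minX, minY, maxX, maxY]
    "properties": {
        "crs": "http://www.opengis.net/def/crs/EPSG/0/4326"  # CRS of the bbox coordinates
    },
}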

Processing Results

The outputs of a batch task will be stored in your object storage in either:

  • GeoTIFF (and JSON for metadata) or
  • Zarr format

GeoTIFF Output Format

The GeoTIFF format will be used if your request includes the output.type parameter set to raster, along with other relevant parameters specified in the BatchV2 API reference. An example of a batch task with GeoTIFF output is available here.

By default, the results will be organized in sub-folders where one sub-folder will be created for each feature. Each sub-folder might contain one or more images depending on how many outputs were defined in the evalscript of the request. For example:

[Figure: Batch Processing API sub-folder structure]

You can also customize the sub-folder structure and file naming as described in the delivery parameter under output in BatchV2 API reference.

You can choose to return your GeoTIFF files as Cloud Optimized GeoTIFFs (COG) by setting the cogOutput parameter under output in your request to true. Several advanced COG options can be selected as well; read about the parameter in the BatchV2 API reference.
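
For example, the relevant part of the output object might look like the following Python dict; the nesting mirrors the parameters named above, the delivery URL is a placeholder, and the full schema is in the BatchV2 API reference.

# Sketch of GeoTIFF output settings with COG enabled.
output = {
    "type": "raster",   # GeoTIFF output
    "cogOutput": True,  # write Cloud Optimized GeoTIFFs
    "delivery": {
        "s3": {
            "url": "s3://<your-bucket>/<requestId>"
        }
    },
}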

The output projection depends on the selected input, either tiling grid or GeoPackage:

  1. If the input is a tiling grid, the results of batch processing will be in the projection of the selected tiling grid. For UTM-based grids, each part of the AOI (Area of Interest) is delivered in the UTM zone with which it intersects. In other words, if your AOI intersects multiple UTM zones, the results will be delivered as tiles in different UTM zones (and thus different CRSs).
  2. If the input is a GeoPackage, the results will be in the same CRS as the input feature's CRS.

Zarr Output Format

The Zarr format will be used if your request includes the output.type parameter set to zarr, along with other relevant parameters specified in the BatchV2 API reference. An example of a batch request with Zarr output is available here. Each output in your request must have exactly one band, and the application/json response format is not supported.

The outputs of batch processing will be stored as a single Zarr group containing one data array for each evalscript output and multiple coordinate arrays. The output will be stored in a subfolder named after the requestId that you pass to the API in the delivery URL parameter under output (for example, delivery.s3.url for AWS S3 or delivery.gs.url for Google Cloud Storage).
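
As a sketch, once the Zarr group has been copied locally (or is reachable through an fsspec-compatible path), it can be opened with xarray; the path is a placeholder.

import xarray as xr

# Open the Zarr group written by the batch task; each evalscript output is one data array,
# stored alongside the coordinate arrays.
ds = xr.open_zarr("path/to/<requestId>")  # local copy of s3://<your-bucket>/<requestId>
print(ds)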

Ingesting Results into BYOC

Purpose

Enables automatic ingestion of processing results into a BYOC collection, allowing you to:

  • Access data with Processing API, by using the collection ID
  • Create a configuration with custom layers
  • Make OGC requests to a configuration
  • View data in EO Browser

To enable this functionality, the user needs to specify either the ID of an existing BYOC collection (collectionId) or set createCollection to true.

{
  ...
  "output": {
    ...
    "createCollection": true,
    "collectionId": "<byoc-collection-id>",
    ...
  },
  ...
}

If collectionId is provided, the existing collection will be used for data ingestion.

If createCollection is set to true and collectionId is not provided, a new BYOC collection will be created automatically and the collection bands will be set according to the request output responses definitions.

Regardless of whether the user specifies an existing collection or requests a new one, processed data will still be uploaded to the user's object storage bucket (S3 or Google Cloud Storage), where they will be available for download and analysis.

When creating a new batch collection, one has to be careful to:

  • Make sure that cogOutput=true and that the output format is image/tiff
  • If an existing BYOC collection is used, make sure that identifier and sampleType from the output definition(s) match the name and the type of the BYOC band(s). Single band and multi-band outputs are supported.
  • If a multi-band output is used in the request, the additionally generated bands are named with a numerical suffix in ascending order (for example, 2, ..., 99). For example, if output: { id: "result", bands: 3 } is used in the evalscript setup function, the produced BYOC bands will be named result for band 1, result2 for band 2 and result3 for band 3. Make sure that no other output has any of these automatically generated names, as this will throw an error during the analysis phase. For instance, output: [{ id: "result", bands: 3 }, { id: "result2", bands: 1 }] will throw an exception because the auto-generated name result2 collides with the explicit output result2.
  • Keep sampleType in mind, as the values the evalscript returns when creating a collection will be the values available when making a request to access it.

Mandatory AWS S3 bucket settings

Regardless of the credentials provided in the request, you still need to set an AWS S3 bucket policy to allow services to access the data. For detailed instructions on how to configure your S3 bucket policy, please refer to the BYOC bucket settings documentation.

Using AWS S3 Delivery Buckets from Other Regions

An AWS S3 bucket from an arbitrary region can be used for data delivery. If the S3 bucket region differs from the system region where the request is sent to, the bucket region also needs to be defined in the request:

{
  ...
  "output": {
    ...
    "delivery": {
      "s3": {
        "url": "s3://<your-bucket>/<requestId>",
        "region": "<bucket-region>",
        ...
      }
    },
    ...
  },
  ...
}

In this case an additional cost of 0.03 PU per MB of transferred data will be added to the total processing cost. Ingesting results into BYOC is not possible when the system region differs from the delivery bucket region.

Feature Manifest

Purpose

  • Provides a detailed overview of features scheduled for processing during the PROCESSING step.
  • Enables users to verify feature information and corresponding output paths prior to processing.

Key information

  • File Type: GeoPackage
  • File Name: featureManifest-<requestId>.gpkg
  • Location: Root folder of the specified output delivery path
  • Structure:
    • May contain multiple feature tables, one per distinct CRS used by the features.
    • Table names follow the format feature_<crs-id> (for example, feature_4326).

During task analysis, the system uploads a file named featureManifest-<requestId>.gpkg to the user's bucket. This GeoPackage contains basic information about the features that will be processed during the PROCESSING step, so that users can check those features and their corresponding output paths before processing.

If the output type is set to raster, the output paths will be the paths to the GeoTIFF files. If the output type is zarr, the output paths will just be the root of the output folder.

The database may contain multiple feature tables, one for each distinct CRS used by the features. The tables are named feature_<crs-id>, for example feature_4326.

The schema of feature tables inside the database is currently the following:

Name | Type | Description
--- | --- | ---
fid | INTEGER | Auto-incrementing ID
outputId | TEXT | Output identifier defined in the processRequest
identifier | TEXT | ID of the feature
path | TEXT | The object storage path URI where the output of this feature will be uploaded
width | INTEGER | Width of the feature in pixels
height | INTEGER | Height of the feature in pixels
geometry | GEOMETRY | Feature geometry representation in GeoPackage WKB format
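
Because a GeoPackage is an SQLite database, the manifest can also be inspected with Python's standard sqlite3 module. A minimal sketch, assuming your features use EPSG:4326 and therefore end up in a feature_4326 table:

import sqlite3

# List which output goes where before starting processing.
con = sqlite3.connect("featureManifest-<requestId>.gpkg")
rows = con.execute(
    "SELECT identifier, outputId, path, width, height FROM feature_4326"
).fetchall()
for identifier, output_id, path, width, height in rows:
    print(f"{identifier} ({output_id}): {width}x{height} px -> {path}")
con.close()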

Execution Database

Purpose

The Execution Database serves as a monitoring tool for tracking the progress of feature execution within a specific task. It provides users with insight into the status of each feature being processed.

Key Information

  • File Type: SQLite
  • File Name: execution-<requestId>.sqlite
  • Location: Root folder of specified output delivery path
  • Structure:
    • Contains a single table called features.

You can monitor the execution of your features for a specific task by checking the SQLite database that is uploaded to your bucket. The database contains the name and status of each feature. The database is updated periodically during the execution of the task.

The database can be found in your bucket in the root output folder and is named execution-<requestId>.sqlite.

The schema of the features table is currently the following:

Name | Type | Description
--- | --- | ---
id | INTEGER | Numerical ID of the feature
name | TEXT | Textual ID of the feature
status | TEXT | Status of the feature
error | TEXT | Error message in case processing has failed
delivered | BOOLEAN | True if output delivered to delivery bucket, otherwise False

The status of the feature can be one of the following:

  • PENDING: The feature is waiting to be processed.
  • DONE: Feature was successfully processed.
    Caveat: If there was no data to process for this feature, the feature will still be marked with status DONE but with a 'No data' message in the error column.
  • FATAL: The feature has failed the maximum number of times and will not be retried. The error column details the issue.
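
A minimal sketch of checking progress with Python's standard sqlite3 module, after downloading the database from your bucket:

import sqlite3
from collections import Counter

# Summarize feature statuses and list any fatal failures.
con = sqlite3.connect("execution-<requestId>.sqlite")
status_counts = Counter(status for (status,) in con.execute("SELECT status FROM features"))
print(dict(status_counts))  # e.g. {'DONE': 950, 'PENDING': 48, 'FATAL': 2}

for name, error in con.execute("SELECT name, error FROM features WHERE status = 'FATAL'"):
    print(f"{name}: {error}")
con.close()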

AWS Bucket Access

The BatchV2 API requires access to your AWS bucket in order to deliver the processing results and the auxiliary files (the request JSON, the feature manifest, and the execution database).

The IAM user or IAM role (depending on which of the access methods described below is used) must have permissions to read and/or write to the corresponding S3 bucket.

There are two ways of granting access to your bucket:

  1. AWS IAM Assume Role Workflow
  2. AWS Access Key & Secret Key Workflow

AWS IAM Assume Role Workflow

To let the service access the bucket, you can provide the ARN of an IAM role that has access to it. This method is recommended, as it is more secure and allows more fine-grained control over access permissions.

You can do this by creating a new IAM role in your AWS account with the necessary permissions to access your bucket and adding our IAM user as a trusted entity that can perform the sts:AssumeRole action.

Step-by-step guide on how to set up your IAM role and policies:

  1. Create an IAM Policy for limited access to your bucket
  • First, we will create a policy that grants access to your bucket. This policy will later be attached to the IAM role.
  • Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  • In the navigation pane, choose Policies and then choose Create policy.
  • Open the JSON tab.
  • Enter a policy that grants GetObject, PutObject, and ListBucket permissions to your bucket. Here is an example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>",
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
  • Replace <your-bucket-name> with the name of your S3 bucket and click Next.
  • On the Review and create page, enter a Policy name and optionally fill in a Description and tags for the policy, and then click Create policy.
  2. Create an IAM Role
  • In the navigation pane, choose Roles and then choose Create role.
  • Choose AWS account for the trusted entity type and then choose Another AWS account for the role type.
  • For Account ID, enter 614251495211 (this is the AWS account ID of the Sentinel Hub service).
  • Leave the Require external ID and Require MFA boxes unchecked. We will come back to fine-tuning the trust relationship later.
  • Click Next.
  • In the Permissions policies page, select the policy you just created and click Next.
  • On the review page, enter a Role name and optionally fill in a Description and tags for the role, and then click Create role.
  3. Adjusting the Trust Relationship
  • If you wish to further limit access to the role, you can modify the trust relationship. If not, you can skip this step.
  • After the role is created, it will appear in the list of roles in the IAM console.
  • Choose the role that you just created.
  • Navigate to the Trust relationships tab and then select Edit trust policy.
  • For an extra layer of security, you can specify the sts:ExternalId parameter. If you choose to use this, set its value to your domain account ID, which can be found in the User settings page in the Dashboard.
  • If your IAM role is shared among several principals and you want to distinguish their activities, you can set the sts:RoleSessionName in the trust policy of each principal. For the Sentinel Hub principal, set its value to sentinelhub.
  • Here's an example of what the JSON might look like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-SH-domain-account-id>"
        },
        "StringLike": {
          "sts:RoleSessionName": "sentinelhub"
        }
      }
    }
  ]
}
  • Replace <your-SH-domain-account-id> with your domain account ID.
  • Click Update policy.

Now, you can use the ARN of this IAM role in your Batch API requests by simply providing the iamRoleARN alongside the URL of your bucket object:

s3 = {
    "url": "s3://<your-bucket>/<path>",
    "iamRoleARN": "<your-IAM-role-ARN>",
}

AWS Access Key & Secret Key Workflow

The other option is to provide accessKey and secretAccessKey pairs in your request.

s3 = {
    "url": "s3://<your-bucket>/<path>",
    "accessKey": "<your-bucket-access-key>",
    "secretAccessKey": "<your-bucket-secret-access-key>"
}

Access key and secret must be linked to an IAM user that has permissions to read and/or write to the corresponding S3 bucket.

To learn how to configure an access key and access key secret on AWS S3, see the Programmatic access section here.