Serverless Applications
Serverless architecture is a way to build and run applications and services without managing infrastructure. Your application still runs on servers, but all server management is handled by a cloud provider such as AWS, Azure, or GCP.
What is AWS Lambda?
AWS Lambda is a serverless computing service that runs your code without requiring you to provision or manage servers. You pay only for the compute time you use and are not charged when your application is idle. It was originally intended for lightweight workloads, but it can now also handle larger applications: it scales automatically and supports thousands of concurrent requests. A simple hello-world Lambda application looks as follows:
```python
import json

def lambda_handler(event, context):
    # TODO: logic of the function
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```
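Before deploying, the handler can be exercised locally by calling it with a sample event and a `None` context. This is a minimal smoke-test sketch, not part of the repository:

```python
import json

def lambda_handler(event, context):
    # Same hello-world handler as above
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

# Lambda passes a dict-like event and a context object; for a local
# smoke test, an empty dict and None are enough.
response = lambda_handler({}, None)
print(response['statusCode'])        # 200
print(json.loads(response['body']))  # Hello from Lambda!
```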
What is new with AWS Lambda?
Since December 2020, it has been possible to package and deploy Lambda functions as container images of up to 10 GB in size, with up to six vCPUs. This means that large machine learning models can be deployed in a serverless fashion and invoked in parallel on AWS Lambda. In this repository, we deploy a BERT question-answering API in a serverless AWS Lambda environment, using the Transformers library by Hugging Face, the Serverless Framework, AWS Lambda, and Amazon ECR.
What is the AWS API Gateway?
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. For Lambda, API Gateway essentially exposes the function to the internet. In our machine learning examples, Lambda is the inference server, and API Gateway is where the data is passed to the Lambda function. The following figure shows how API Gateway interacts with the Python function that runs in Lambda.
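With the Lambda proxy integration, API Gateway delivers the raw POST payload to the handler as a JSON string under `event['body']`. Here is a sketch of how an inference handler might unpack it; the `question` field name is our own assumption for illustration, not a fixed API:

```python
import json

def lambda_handler(event, context):
    # API Gateway (proxy integration) passes the raw POST body as a
    # JSON string in event['body'].
    payload = json.loads(event.get('body') or '{}')
    question = payload.get('question', '')
    # A real inference server would run the model here; we echo instead.
    result = {'answer': f'received: {question}'}
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(result),
    }

# Simulate the event API Gateway would construct from a POST request
event = {'body': json.dumps({'question': 'Who developed BERT?'})}
response = lambda_handler(event, None)
```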
What is AWS DynamoDB?
Amazon DynamoDB is a fully managed, serverless, NoSQL database designed to run high-performance applications at any scale. If we want to save some data during the exchange of data to and from the API Gateway, we can use DynamoDB. In our machine learning examples, we will store the logs of our inference server. The primary information we are interested in is the data sent to the ML inference server and the output of the ML model. We can use this to monitor and evaluate the model's performance, and the database can be connected to any visualization tool to view that performance in real time.
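A sketch of such a logging step follows. The table schema (an `id` partition key) and the table name are assumptions; with boto3 the `table` argument would be `boto3.resource('dynamodb').Table('inference-logs')`, but any object with a compatible `put_item` method works, which also makes the function easy to test locally:

```python
import time
import uuid

def log_inference(table, request_payload, model_output):
    # Store one inference record: what was sent to the model and what
    # it returned, plus an id and timestamp for later analysis.
    item = {
        'id': str(uuid.uuid4()),       # partition key (assumed schema)
        'timestamp': int(time.time()),
        'request': request_payload,
        'response': model_output,
    }
    table.put_item(Item=item)
    return item
```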
Project architecture
The project's serverless backend uses AWS Lambda, DynamoDB, API Gateway, and ECR as the container registry. The Lambda function is the inference server, wrapped in a Docker image and uploaded to AWS ECR. API Gateway sends POST request payloads to the Lambda function. DynamoDB stores the data sent to the inference server for monitoring purposes. Lambda talks to DynamoDB through an IAM role that only allows write requests to the database.
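The write-only permission mentioned above can be expressed as a standard IAM policy document. A sketch, with a placeholder account number and table name:

```python
import json

# Least-privilege policy: the Lambda execution role may only put items
# into the logging table. The account number and table name below are
# placeholders, not values from this repository.
write_only_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': ['dynamodb:PutItem'],
        'Resource': 'arn:aws:dynamodb:us-east-1:123456789012:table/inference-logs',
    }],
}
print(json.dumps(write_only_policy, indent=2))
```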
How to use
The infrastructure is deployed using the Serverless Framework, an easy-to-use and secure framework for building applications on AWS Lambda and comparable serverless offerings from other cloud providers. Install it with npm if you don't have it already. If npm is not installed on your machine, follow the installation instructions here.
If you want to start a project from scratch, templates can be generated by running `serverless` in the terminal and following the prompts. Here is an example of a serverless configuration file that deploys a Lambda function with an API Gateway endpoint:
```yaml
service: serverless-bert-qa-lambda-docker

provider:
  name: aws           # cloud provider
  region: us-east-1   # AWS region
  memorySize: 5120    # optional, in MB, default is 1024
  timeout: 30         # optional, in seconds, default is 6

functions:
  questionanswering:
    image: ${ACCOUNT_NUMBER}.dkr.ecr.us-east-1.amazonaws.com/bert-lambda:v1 # ECR image URI
    events:
      - http:
          path: qa      # HTTP path
          method: post  # HTTP method
```
Then deploy the stack from the project directory:

```shell
serverless deploy
```
In the following few pages, we will show examples of using a serverless architecture to deploy some commonly used NLP tasks, with state-of-the-art models from the Transformers library.