High-Performance, Cost-Free Servers

Crafting a 620KB Server Image

It should be straightforward for developers to write and deploy a server to a cluster. However, that's only possible if the right infrastructure is in place, and infrastructure comes at a cost.

Clusters can be expensive to run, with costs starting at $72 per month on services such as EKS and GKE. ECS could be an option if you need container deployment; it is simple and adds no extra cost of its own. However, for a low-traffic service, running at least two instances per service plus a load balancer is often unnecessary and wasteful.

In this case, a serverless solution like AWS Lambda is a good fit, so let's choose AWS Lambda. There is a gray area with a Lambda backend, though: if you write an application using a full-stack framework like Remix or Next, that code ends up running on Lambda anyway.

The biggest problem with Lambda is performance: the Node.js runtime is heavy. The assets needed for rendering, including static pages, can be served quickly from CloudFront, but the backend cannot. So if you hit a cold start, you have to wait helplessly.

Since the backend and frontend Lambdas are bundled and deployed separately, invoking the frontend Lambda does nothing for the backend's cold start. As a result, users can end up waiting far too long even for a simple login. Even for a small service, that is not an experience you want to give users.

Ultimately, minimizing cold start time is crucial. However, languages such as JavaScript, Python, and Kotlin drag their own runtimes along, making them less suitable for this purpose; Rust and Go are effectively the only options. Their cold start times are roughly similar, but Rust is slightly faster, so we'll go with Rust.

Lambda

cargo-lambda is a convenient tool for deploying simple Rust functions as Lambdas. It builds a Rust binary for Lambda's Linux environment and uploads it, which makes for a powerful combination when you need something quickly. However, we want database access. diesel, the widely used ORM in the Rust ecosystem, requires native dependencies such as libpq or libmysqlclient, and these are not present in the Lambda environment. So we need to link these dependencies statically.

At this point, deploying with Docker is much simpler. Furthermore, it is better to write the server to be as Lambda-agnostic as possible, since the moment will come when running an actual server becomes cheaper as the service grows. So let's choose Docker deployment.

Let's take a look at what cargo-lambda does. cargo lambda new generates the following code:

use lambda_http::{run, service_fn, Body, Error, Request, RequestExt, Response};
 
async fn function_handler(_event: Request) -> Result<Response<Body>, Error> {
    let resp = Response::builder()
        .status(200)
        .header("content-type", "text/html")
        .body("Hello AWS Lambda HTTP request".into())
        .map_err(Box::new)?;
    Ok(resp)
}
 
#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}

Lambda is a service that processes events within the AWS ecosystem. The type of event Lambda receives varies depending on the service that triggers it. We assume API Gateway in our case; it could also be an Application Load Balancer or a Function URL. These are the three sources from which Lambda can receive HTTP requests. We do not need to worry about the shape of those events, since lambda-http transforms them into a standard http::Request.
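
For example, fleshing out function_handler to read a query parameter could look like the following. This is only a minimal sketch: the name parameter and the greeting are invented for illustration, and query_string_parameters comes from the RequestExt trait already in the imports.

use lambda_http::{run, service_fn, Body, Error, Request, RequestExt, Response};
 
// A hypothetical handler: greet the caller by the optional `name` query parameter.
async fn function_handler(event: Request) -> Result<Response<Body>, Error> {
    // lambda-http has already normalized the API Gateway / ALB / Function URL event,
    // so the query string is available through the RequestExt trait.
    let name = event
        .query_string_parameters()
        .first("name")
        .unwrap_or("world")
        .to_owned();
 
    let resp = Response::builder()
        .status(200)
        .header("content-type", "text/plain")
        .body(format!("Hello, {name}!").into())
        .map_err(Box::new)?;
    Ok(resp)
}
 
#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(function_handler)).await
}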

Once the necessary logic has been implemented in function_handler, we can deploy the Lambda with the cargo lambda deploy command. It couldn't be simpler. However, we want a full web server, and deploying one as a single function is known as the Lambda-lith pattern, which is generally discouraged. You can split it up if you want, but for that path I'd choose k8s instead.

Axum

Let's choose axum as the web framework. It's developed by the tokio team and has a very lovely developer experience. However, it hasn't been around for very long, so there are relatively few examples; keep that in mind. If that's inconvenient, you can also pick the more traditional actix-web or rocket. An axum server is typically written as follows:

use axum::{routing::get, Router};
 
#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(|| async { "Hello, World!" }));
 
    axum::Server::bind(&"0.0.0.0:8000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}

It's a typical server that listens on port 8000 and handles HTTP requests. As a Lambda, however, we receive events rather than raw HTTP requests, so this is where the aforementioned conversion is needed. Implementing that logic as middleware makes sense, and since someone has already implemented it, let's be lazy and use it.

use axum::{routing::get, Router};
 
#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(|| async { "Hello, World!" }));
 
    let app = tower::ServiceBuilder::new()
        .layer(axum_aws_lambda::LambdaLayer::default())
        .service(app);
 
    lambda_http::run(app).await.unwrap();
}

Use the LambdaLayer from axum_aws_lambda. This axum server now works great as a Lambda, but it can no longer be used as a general web server. We want interoperability, and we need to develop locally anyway, so it's better to implement it as follows.

use axum::{routing::get, Router};
 
#[tokio::main]
async fn main() {
    let app = Router::new().route("/", get(|| async { "Hello, World!" }));
 
    #[cfg(debug_assertions)]
    {
        axum::Server::bind(&"0.0.0.0:8000".parse().unwrap())
            .serve(app.into_make_service())
            .await
            .unwrap();
    }
 
    #[cfg(not(debug_assertions))]
    {
        let app = tower::ServiceBuilder::new()
            .layer(axum_aws_lambda::LambdaLayer::default())
            .service(app);
 
        lambda_http::run(app).await.unwrap();
    }
}

In debug builds it works as a general web server, and in release builds it works as a Lambda. If you want to get rid of the Lambda dependency in the future, you can simply remove the branching.

Optimize

Now that we have written the server, let's build it. Lambda fetches the image from ECR and starts it whenever a request hits a fresh instance, so the cold start time roughly tracks the size of the image. Therefore, we also need to control the size of the binary that goes into the image.

There is a well-known document called min-sized-rust that covers how to reduce the size of a Rust binary, and it's quite interesting. I often use the following settings.

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

opt-level 3 is the optimization level that focuses on speed, while z focuses on size; the default for release builds is 3. In a situation like this where size means speed, I would choose z, though I don't think there would be a noticeable difference. The resulting binary is 1.3MB, which is great.

Containerize

Of course, this binary does not yet include the native dependencies. As mentioned, static linking is required for it to function standalone. Someone has already implemented this too; check out the helpful comments. It builds openssl and libpq and turns on the flags for static linking. Naturally, we use musl here.

FROM registry.gitlab.com/rust_musl_docker/image:stable-latest AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl

Now, if we add diesel, all the necessary dependencies will be included in the binary. Note that there is a performance drop when using musl, so keep that in mind. But as mentioned, on Lambda size is speed, so we have to live with it.
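
For reference, the kind of diesel code we would be adding looks roughly like this. It's only a sketch under assumptions: the users table, its columns, and the DATABASE_URL environment variable are invented for illustration, and it assumes diesel 2.x with the postgres feature.

use diesel::pg::PgConnection;
use diesel::prelude::*;
 
// Hypothetical schema: a `users` table with `id` and `name` columns.
diesel::table! {
    users (id) {
        id -> Int4,
        name -> Text,
    }
}
 
// Load every user name; the connection string is assumed to come from the environment.
fn list_user_names() -> QueryResult<Vec<String>> {
    let database_url = std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    let mut conn = PgConnection::establish(&database_url).expect("failed to connect to Postgres");
    users::table.select(users::name).load::<String>(&mut conn)
}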

Now let's put it in a lightweight image. A safe choice is alpine, but even that is a luxury for our ephemeral Lambda. Let's choose scratch.

FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/rust-on-lambda /run
EXPOSE 8000
CMD ["./run"]

The image now comes out to 0.85MB. Considering that a typical web server image can easily reach several hundred MB, this is a striking result.

With upx, you can squeeze it even further: 620KB when compressed. However, the decompression overhead upx introduces will most likely increase the cold start time, and since the binary is already small, this optimization isn't necessary. Still, it's always fun to experiment.

Anyway, it's now safe to say that there is almost no cold start caused by the size of the image. It takes about 300ms to get a response on a cold start; since starting the server locally takes about 200ms, that's acceptable. After that, responses consistently take about 40ms.

Note that the scratch image contains nothing but our binary. Anything else you need, you have to put in yourself: for example, CA certificates are required for HTTPS, and timezone data may also be needed. The complete Dockerfile looks something like this. If that feels like too much trouble, you can just use Alpine and install those directly via apk.

FROM registry.gitlab.com/rust_musl_docker/image:stable-latest AS builder
WORKDIR /app
RUN DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get install -y ca-certificates tzdata upx
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl
RUN upx --lzma --best /app/target/x86_64-unknown-linux-musl/release/rust-on-lambda
 
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/rust-on-lambda /run
EXPOSE 8000
CMD ["./run"]

Conclusion

From now on, you just have to write the actual server. A simple CI pipeline looks like this. You could also deploy and manage it with Terraform, but I personally think Lambda and Terraform don't get along well. You can attach CloudFront to the Function URL or to API Gateway. As I have emphasized, if the cost crossover point arrives, you can move it to a real server at any time.

Of course, there are other pieces of infrastructure to consider when talking about cost. What about databases and caches? If you go with AWS managed services, you will need at least $20/month for each. I personally think RDS is worth paying for even on a small service, but supabase or planetscale might be a solution; they offer surprisingly generous free tiers.

In a similar vein, there is a service called upstash that handles caching, but there's no need to make a big fuss about it. I recommend DynamoDB to start with: it's easy to swap out later if you imitate the commonly used Redis commands. Together with Vercel on the frontend side, I think this is the simplest stack nowadays for a scale-to-zero experience.
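
To make that concrete, here is a rough sketch of hiding DynamoDB behind Redis-style get/set calls with the aws-sdk-dynamodb crate. The cache table name and the pk/value attribute names are assumptions made up for this example.

use aws_sdk_dynamodb::{types::AttributeValue, Client};
 
// A Redis-like GET/SET facade over DynamoDB.
// Construct it with e.g. `Cache { client: Client::new(&aws_config::load_from_env().await) }`.
struct Cache {
    client: Client,
}
 
impl Cache {
    // SET key value
    async fn set(&self, key: &str, value: &str) -> Result<(), aws_sdk_dynamodb::Error> {
        self.client
            .put_item()
            .table_name("cache")
            .item("pk", AttributeValue::S(key.to_owned()))
            .item("value", AttributeValue::S(value.to_owned()))
            .send()
            .await?;
        Ok(())
    }
 
    // GET key
    async fn get(&self, key: &str) -> Result<Option<String>, aws_sdk_dynamodb::Error> {
        let output = self
            .client
            .get_item()
            .table_name("cache")
            .key("pk", AttributeValue::S(key.to_owned()))
            .send()
            .await?;
        Ok(output
            .item()
            .and_then(|item| item.get("value"))
            .and_then(|value| value.as_s().ok())
            .cloned())
    }
}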

There are limitations as well. For example, Lambda cannot fully exploit Rust's powerful concurrency because each instance handles only one request at a time, and for the same reason a connection pool is of little use in this regime. It's a pity, but what can you do? That's just Lambda; it's neither Rust's nor Lambda's fault. Getting this infrastructure almost for free is a blessing of the cloud era.


Source code