Designing .NET Web APIs for Performance and Scale

Performance problems rarely come from the framework.

In production systems, slow APIs are usually caused by returning too much data, doing too much work synchronously, or asking the database to do things it was never designed to do efficiently.

After working on several high-traffic .NET applications, I’ve found that Web API performance boils down to a handful of fundamentals. Not clever tricks. Not premature optimization. Just good design choices applied consistently.

Let’s walk through what actually matters.

1. Optimize Database Queries

If your API is slow, start with the database. Before reaching for caching or scaling, make sure your queries are doing exactly what they need to — and nothing more.

Practical Database Optimization Techniques

  • Keep queries simple: avoid unnecessary complexity and joins you don't truly need
  • Return only the columns you need — never SELECT *
  • Add indexes based on how queries are actually used

Indexes are powerful, but they’re not free. Over-indexing write-heavy tables can quickly become its own performance problem.
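In EF Core, for instance, indexes are declared in the model configuration. A minimal sketch, assuming a User entity with an Email column that your queries actually filter on:

```csharp
using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Speeds up lookups by Email, but every INSERT and UPDATE on Users
        // must now maintain this index too: index only what queries actually use.
        modelBuilder.Entity<User>()
            .HasIndex(u => u.Email);
    }
}
```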

Example: Returning Only What You Need

SELECT Id, FirstName, LastName, Email
FROM Users
WHERE IsActive = 1
ORDER BY LastName

Compare that to returning entire rows with dozens of unused columns. Less data means:

  • Faster reads
  • Less memory usage
  • Faster serialization

Use Execution Plans (SQL Server)

When working with Microsoft SQL Server, one of the most effective tools is the Execution Plan in SQL Server Management Studio.

Execution plans help you identify:

  • Table scans
  • Missing or unused indexes
  • Expensive operators in a query

Before adding an index, fix the query. Often a small query rewrite delivers a bigger performance gain than any index ever will.
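For example, a predicate that wraps an indexed column in a function typically can't use the index at all. A sketch in EF Core terms, assuming a hypothetical CreatedAt column:

```csharp
// Non-sargable: u.CreatedAt.Year translates to DATEPART(year, CreatedAt),
// so SQL Server must scan every row even if CreatedAt is indexed.
var slow = await _context.Users
    .Where(u => u.CreatedAt.Year == 2024)
    .ToListAsync();

// Sargable rewrite: a plain range predicate over the raw column
// lets SQL Server seek directly into an index on CreatedAt.
var fast = await _context.Users
    .Where(u => u.CreatedAt >= new DateTime(2024, 1, 1)
             && u.CreatedAt < new DateTime(2025, 1, 1))
    .ToListAsync();
```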

2. Minimize Response Data

The fastest API is the one that sends the least data. One of the most common mistakes in Web APIs is returning far more data than the client actually needs.

Paging Is Not Optional

Paging should be the default — not an afterthought.

In one application I worked on, a single API endpoint was expected to return up to 2,000 records. Even with a well-optimized query and proper indexing, performance was still an issue, because performance doesn’t stop at the database.

Even if the query is fast, the API still has to:

  • Materialize thousands of objects
  • Serialize them
  • Send them over the network

At that point, you’re no longer optimizing SQL — you’re fighting physics.

Example: Paged API Endpoint

Benefits of paging:

  • Protects your API
  • Protects your database
  • Protects your clients
[HttpGet]
public async Task<IActionResult> GetUsers(
    int page = 1,
    int pageSize = 50)
{
    var users = await _context.Users
        .Where(u => u.IsActive)
        .OrderBy(u => u.LastName)
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(u => new UserDto(
            u.Id,
            u.FirstName + " " + u.LastName,
            u.Email))
        .ToListAsync();

    return Ok(users);
}

Return Only What You Need

  • Use DTOs instead of returning full entities (lightweight, immutable records instead of full classes)
  • Filter and sort at the database level
  • Avoid loading navigation properties you don’t need

Doing less work will always outperform doing more work faster.

Example: Using DTOs instead of Entities

DTOs allow you to:

  • Control payload size
  • Avoid lazy-loading traps
  • Decouple your API from your data model
public record UserDto(
    int Id,
    string FullName,
    string Email
);

This simple practice prevents an entire category of performance issues.

Example: API Endpoint That Projects Directly to a DTO

Database Entity

  • Each property corresponds to a column in the User table in the database.
  • Id is typically the primary key.
  • The strings (FirstName, LastName, Email) map directly to VARCHAR / NVARCHAR columns.
  • = default! uses the C# 8+ null-forgiving operator to suppress nullable warnings, assuming the database column is non-nullable.
  • Roles might map to a many-to-many relationship (UserRoles join table).
  • LoginHistory might map to a one-to-many relationship.
  • These properties allow EF Core to automatically load related data if requested.
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; } = default!;
    public string LastName { get; set; } = default!;
    public string Email { get; set; } = default!;

    // Navigation properties you do NOT want to load
    public ICollection<Role> Roles { get; set; } = new List<Role>();
    public ICollection<LoginHistory> LoginHistory { get; set; } = new List<LoginHistory>();
}

Data Transfer Object (DTO)

DTO (Data Transfer Object) is a class or record that defines exactly what data your API returns. Unlike entities, DTOs:

  • Do not map directly to the database
  • Contain only the fields that clients need
  • Are typically read-only
  • Avoid navigation properties unless explicitly needed

This helps:

  • Minimize payload size
  • Avoid accidental lazy-loading of related entities
  • Reduce coupling between database schema and API
public record UserDto(
    int Id,
    string FullName,
    string Email
);

❌ What to Avoid (Entity → DTO After Loading)

This looks harmless but causes over-fetching and possible lazy-loading:

var users = await _dbContext.Users
    .ToListAsync();

var result = users.Select(u => new UserDto(
    u.Id,
    $"{u.FirstName} {u.LastName}",
    u.Email
));

Issues with this approach:

  • Materializes full entities
  • Loads columns you don’t need
  • Risks navigation property access
  • More memory and CPU than necessary

✅ Correct Approach: Project Directly to the DTO

Benefits of this approach:

  • Only required columns are selected
  • No entity tracking overhead
  • No navigation properties loaded
  • Smaller allocations and faster serialization

EF Core translates this into a tight SQL projection, not a SELECT *.

[HttpGet]
public async Task<IReadOnlyList<UserDto>> GetUsers()
{
    return await _dbContext.Users
        .AsNoTracking()
        .OrderBy(u => u.LastName)
        .Select(u => new UserDto(
            u.Id,
            u.FirstName + " " + u.LastName,
            u.Email
        ))
        .ToListAsync();
}

3. Cache What You Can (But Be Careful)

Caching is one of the most powerful performance tools — but only when used intentionally.

Server-Side Caching Options

  • In-memory caching: stored in the application's own memory (e.g., IMemoryCache)
  • Distributed caching: stored in an external store shared by all instances (e.g., Redis, Memcached)

If your API runs on multiple instances, distributed caching is usually the safer option, because every node sees the same cached data.

Example: In-Memory Caching

This method retrieves active users from the database, projects them into lightweight UserDto objects, and caches the result for five minutes to avoid hitting the database on every request.

public async Task<IEnumerable<UserDto>> GetActiveUsersAsync()
{
    return await _cache.GetOrCreateAsync("active_users", async entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);

        return await _context.Users
            .Where(u => u.IsActive)
            .Select(u => new UserDto(
                u.Id,
                u.FirstName + " " + u.LastName,
                u.Email))
            .ToListAsync();
    });
}

Other Caching Layers

  • Front-end caching using HTTP cache headers
  • CDNs for static assets (Azure CDN, AWS CloudFront)

These layers reduce load before a request ever reaches your API.
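As a sketch of the first option: in ASP.NET Core, cache headers can be set declaratively with the [ResponseCache] attribute, paired with the response-caching middleware (the endpoint and _countries field here are hypothetical):

```csharp
// Program.cs
builder.Services.AddResponseCaching();
var app = builder.Build();
app.UseResponseCaching();

// Controller: the attribute emits "Cache-Control: public, max-age=60",
// so clients, proxies, and the middleware can reuse the response.
[HttpGet("countries")]
[ResponseCache(Duration = 60, Location = ResponseCacheLocation.Any)]
public IActionResult GetCountries() => Ok(_countries);
```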

4. Cache Invalidation (The Trade-Off)

Caching is easy. Cache invalidation can be more complex.

Once data changes, you need to make sure stale data doesn’t stick around.

Common approaches include:

  • Time-based expiration: let the cache invalidate itself after a fixed period.
    The challenge is choosing the right expiration time.
  • Programmatic invalidation: remove cache entries when updates occur.
    The challenge is identifying every scenario where data should be invalidated.

There’s no universal solution here — only tradeoffs.

Example: Programmatic Invalidation

This works well — but only if you identify every scenario where the cache should be invalidated.

_cache.Remove("active_users");
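That single line only works if every write path that affects active users also executes it. A minimal sketch, assuming the EF Core context and cache key from the earlier examples:

```csharp
public async Task DeactivateUserAsync(int userId)
{
    var user = await _context.Users.FindAsync(userId);
    if (user is null) return;

    user.IsActive = false;
    await _context.SaveChangesAsync();

    // The cached list of active users is now stale: drop it so the
    // next read rebuilds it from the database.
    _cache.Remove("active_users");
}
```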

5. Use Asynchronous Processing for Long-Running Work

Not every request needs to finish its work before returning a response.

Asynchronous processing allows your API to:

  1. Receive a request
  2. Acknowledge it immediately
  3. Process the work in the background

This keeps your API responsive and prevents clients from waiting on long-running tasks.

Common Approaches

  • Background jobs (Hangfire, Quartz.NET)
  • Message queues (Azure Service Bus, RabbitMQ, Apache Kafka)
  • Serverless workers (Azure Functions)

APIs don’t always need to do the work — sometimes they just need to start it.

Example: Background Job with Hangfire

Hangfire is ideal when you want reliable background processing without standing up new infrastructure, since it can persist jobs in the database you already have.

  • The API responds immediately (202 Accepted)
  • The request thread is released
  • The work runs reliably in the background
  • Jobs survive app restarts

API Controller — Accept and Return Immediately

[HttpPost("reports")]
public IActionResult GenerateReport([FromBody] ReportRequest request)
{
    BackgroundJob.Enqueue<IReportService>(
        service => service.GenerateReportAsync(request));

    return Accepted("Report generation has started.");
}

Background Job Service

public interface IReportService
{
    Task GenerateReportAsync(ReportRequest request);
}

public class ReportService : IReportService
{
    public async Task GenerateReportAsync(ReportRequest request)
    {
        // Simulate long-running work
        await Task.Delay(TimeSpan.FromSeconds(10));

        // Generate report, store file, send notification, etc.
    }
}
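For completeness, Hangfire has to be wired up at startup with a job storage. A sketch assuming the Hangfire.AspNetCore and Hangfire.SqlServer packages and a connection string named "Default":

```csharp
// Program.cs
builder.Services.AddHangfire(config => config
    .UseSqlServerStorage(builder.Configuration.GetConnectionString("Default")));

// Hosts the background job processing server inside this application,
// so enqueued jobs are picked up and executed here.
builder.Services.AddHangfireServer();
```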

Example: Message Queue (Azure Service Bus / RabbitMQ)

Why This Scales Better

  • API and processing are fully decoupled
  • Consumers can scale independently
  • Natural retry and failure handling

API — Publish a Message

[HttpPost("orders")]
public async Task<IActionResult> CreateOrder([FromBody] OrderRequest request)
{
    await _messageBus.PublishAsync(new OrderCreatedMessage
    {
        OrderId = request.OrderId,
        CreatedAt = DateTime.UtcNow
    });

    return Accepted("Order received for processing.");
}

Message Consumer — Separate Worker Service

public class OrderCreatedHandler
{
    public async Task HandleAsync(OrderCreatedMessage message)
    {
        // Long-running processing
        await ProcessOrderAsync(message.OrderId);
    }
}

Example: Azure Functions as a Background Worker

This is perfect when you want zero infrastructure management.

Why This Is Powerful

  • Scales automatically
  • No background services to manage
  • Excellent for burst workloads

API — Send Message to Service Bus

[HttpPost("emails")]
public async Task<IActionResult> SendEmail([FromBody] EmailRequest request)
{
    await _serviceBusSender.SendMessageAsync(
        new ServiceBusMessage(JsonSerializer.Serialize(request)));

    return Accepted("Email queued for delivery.");
}

Azure Function — Process the Message

public class EmailProcessor
{
    [FunctionName("EmailProcessor")]
    public async Task Run(
        [ServiceBusTrigger("email-queue")] EmailRequest request)
    {
        await SendEmailAsync(request);
    }
}

Example: What Not to Do

Why?

  • Work can be killed if the app restarts
  • No retries
  • No observability
  • Not suitable for production APIs
Task.Run(() => DoWorkAsync());

6. Scale When Code Isn’t Enough

Eventually, you’ll hit a point where software optimizations alone aren’t sufficient.

That’s when scaling becomes necessary.

Load Balancing and Auto-Scaling

  • Run multiple API instances
  • Use a load balancer to distribute traffic
  • Enable auto-scaling to handle traffic spikes

Modern cloud platforms make this much easier than it used to be. Scale up during peak traffic, scale down when demand drops, and only pay for what you use.

Scaling won’t fix inefficient code — but it can significantly improve reliability and throughput once the fundamentals are in place.

Make your API Stateless (Required for Scaling)

❌ Example: What Breaks Scaling

Static or in-memory state ties requests to a single instance.

public class OrderController : ControllerBase
{
    private static List<Order> _orders = new();
}

✅ Use External State Instead

Externalize state to:

  • Databases
  • Distributed cache (Redis)
  • Message queues
public class OrderController : ControllerBase
{
    private readonly IDistributedCache _cache;

    public OrderController(IDistributedCache cache)
    {
        _cache = cache;
    }
}

Example: Distributed Cache Configuration (Scale Friendly)

When running multiple instances, in-memory cache is no longer enough.

Redis Configuration Example (.NET)

builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "ApiCache:";
});

Usage Remains the Same

await _cache.SetStringAsync(
    "active_users",
    JsonSerializer.Serialize(users),
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
    });

Multiple API instances now share the same cache.
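The read side follows the usual cache-aside pattern: check Redis first, fall back to the database, then repopulate the cache. A sketch reusing the "active_users" key and the UserDto record from earlier:

```csharp
public async Task<IReadOnlyList<UserDto>> GetActiveUsersAsync()
{
    // Fast path: a cache hit on any instance avoids the database entirely.
    var cached = await _cache.GetStringAsync("active_users");
    if (cached is not null)
        return JsonSerializer.Deserialize<List<UserDto>>(cached)!;

    var users = await _context.Users
        .AsNoTracking()
        .Where(u => u.IsActive)
        .Select(u => new UserDto(u.Id, u.FirstName + " " + u.LastName, u.Email))
        .ToListAsync();

    // Repopulate so other instances benefit from this query too.
    await _cache.SetStringAsync(
        "active_users",
        JsonSerializer.Serialize(users),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });

    return users;
}
```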

Health Checks (Critical for Load Balancers)

Load balancers need a way to know if an instance is healthy.

builder.Services.AddHealthChecks()
    .AddSqlServer(builder.Configuration.GetConnectionString("Default"));

Map the Endpoint

Load balancers can now:

  • Route traffic only to healthy instances
  • Automatically remove failing nodes
app.MapHealthChecks("/health");

Example: Scale-Friendly Configuration (No Sticky Sessions)

Avoid relying on session affinity (“sticky sessions”).

❌ Instead of this:

services.AddSession();

✅ Use Stateless Alternatives Instead

  • JWT authentication
  • Distributed session stores (Redis)
  • Claims-based identity
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer();

This allows any instance to handle any request.

Example: Idempotency for Scaled APIs

With load balancing, retries happen.

Your API must handle duplicate requests safely.

  • Load balancers retry
  • Clients retry
  • Duplicate processing = bugs
[HttpPost("payments")]
public async Task<IActionResult> CreatePayment(
    [FromHeader(Name = "Idempotency-Key")] string key,
    PaymentRequest request)
{
    if (await _store.ExistsAsync(key))
        return Ok(await _store.GetResultAsync(key));

    var result = await ProcessPaymentAsync(request);

    await _store.SaveAsync(key, result);

    return Ok(result);
}

Example: Auto-Scaling Triggers (Conceptual)

Auto-scaling is usually triggered by:

  • CPU usage
  • Memory usage
  • Request count
  • Queue length

Your job as an API developer is to:

  • Keep requests fast
  • Avoid blocking threads
  • Push work to background processors

The platform handles the rest.
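On Azure, for example, those triggers map to autoscale rules. A sketch using the Azure CLI, with placeholder resource names:

```shell
# Create an autoscale setting for an App Service plan (names are hypothetical).
az monitor autoscale create \
  --resource-group my-rg \
  --resource my-plan \
  --resource-type Microsoft.Web/serverfarms \
  --name api-autoscale \
  --min-count 2 --max-count 10 --count 2

# Scale out by one instance when average CPU exceeds 70% over 5 minutes.
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name api-autoscale \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1
```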

Web API Performance 101

When it comes down to it, Web API performance is about doing less work and doing it intentionally.

  • Optimize database queries
  • Minimize response data
  • Use paging
  • Cache intelligently
  • Offload long-running work
  • Scale when necessary

Performance isn’t about clever tricks or framework internals. It’s about understanding where the real costs are — and designing your APIs accordingly.