Designing .NET Web APIs for Performance and Scale

Performance problems rarely come from the framework.

In production systems, slow APIs are usually caused by returning too much data, doing too much work synchronously, or asking the database to do things it was never designed to do efficiently.

After working on several high-traffic .NET applications, I’ve found that Web API performance boils down to a handful of fundamentals. Not clever tricks. Not premature optimization. Just good design choices applied consistently.

Let’s walk through what actually matters.

1. Optimize Database Queries

If your API is slow, start with the database. Before reaching for caching or scaling, make sure your queries are doing exactly what they need to — and nothing more.

Practical Database Optimization Techniques

  • Keep queries simple: avoid unnecessary complexity and joins you don't truly need
  • Return only the columns you need — never SELECT *
  • Add indexes based on how queries are actually used

Indexes are powerful, but they’re not free. Over-indexing write-heavy tables can quickly become its own performance problem.
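In EF Core, for instance, indexes are declared in the model configuration. A minimal sketch, assuming a User entity with an Email column that your queries actually filter on:

```csharp
using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Speeds up lookups by Email, but every INSERT and UPDATE on Users
        // must now maintain this index too: index only what queries actually use.
        modelBuilder.Entity<User>()
            .HasIndex(u => u.Email);
    }
}
```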

Example: Returning Only What You Need

SELECT Id, FirstName, LastName, Email
FROM Users
WHERE IsActive = 1
ORDER BY LastName

Compare that to returning entire rows with dozens of unused columns. Less data means:

  • Faster reads
  • Less memory usage
  • Faster serialization

Use Execution Plans (SQL Server)

When working with Microsoft SQL Server, one of the most effective tools is the Execution Plan in SQL Server Management Studio.

Execution plans help you identify:

  • Table scans
  • Missing or unused indexes
  • Expensive operators in a query

Before adding an index, fix the query. Often a small query rewrite delivers a bigger performance gain than any index ever will.
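For example, a predicate that wraps an indexed column in a function typically can't use the index at all. A sketch in EF Core terms, assuming a hypothetical CreatedAt column:

```csharp
// Non-sargable: u.CreatedAt.Year translates to DATEPART(year, CreatedAt),
// so SQL Server must scan every row even if CreatedAt is indexed.
var slow = await _context.Users
    .Where(u => u.CreatedAt.Year == 2024)
    .ToListAsync();

// Sargable rewrite: a plain range predicate over the raw column
// lets SQL Server seek directly into an index on CreatedAt.
var fast = await _context.Users
    .Where(u => u.CreatedAt >= new DateTime(2024, 1, 1)
             && u.CreatedAt < new DateTime(2025, 1, 1))
    .ToListAsync();
```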

2. Minimize Response Data

The fastest API is the one that sends the least data. One of the most common mistakes in Web APIs is returning far more data than the client actually needs.

Paging Is Not Optional

Paging should be the default — not an afterthought.

In one application I worked on, a single API endpoint was expected to return up to 2,000 records. Even with a well-optimized query and proper indexing, performance was still an issue, because performance doesn’t stop at the database.

Even if the query is fast, the API still has to:

  • Materialize thousands of objects
  • Serialize them
  • Send them over the network

At that point, you’re no longer optimizing SQL — you’re fighting physics.

Example: Paged API Endpoint

Benefits of paging:

  • Protects your API
  • Protects your database
  • Protects your clients
[HttpGet]
public async Task<IActionResult> GetUsers(
    int page = 1,
    int pageSize = 50)
{
    var users = await _context.Users
        .Where(u => u.IsActive)
        .OrderBy(u => u.LastName)
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(u => new UserDto(
            u.Id,
            u.FirstName + " " + u.LastName,
            u.Email))
        .ToListAsync();

    return Ok(users);
}

Return Only What You Need

  • Use DTOs instead of returning full entities (lightweight, immutable records instead of full classes)
  • Filter and sort at the database level
  • Avoid loading navigation properties you don’t need

Doing less work will always outperform doing more work faster.

Example: Using DTOs instead of Entities

DTOs allow you to:

  • Control payload size
  • Avoid lazy-loading traps
  • Decouple your API from your data model
public record UserDto(
    int Id,
    string FullName,
    string Email
);

This simple practice prevents an entire category of performance issues.

Example: API Endpoint That Projects Directly to a DTO

Database Entity

  • Each property corresponds to a column in the User table in the database.
  • Id is typically the primary key.
  • The strings (FirstName, LastName, Email) map directly to VARCHAR / NVARCHAR columns.
  • = default! uses the C# 8+ null-forgiving operator to suppress nullable warnings, assuming the database column is non-nullable.
  • Roles might map to a many-to-many relationship (UserRoles join table).
  • LoginHistory might map to a one-to-many relationship.
  • These properties allow EF Core to automatically load related data if requested.
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; } = default!;
    public string LastName { get; set; } = default!;
    public string Email { get; set; } = default!;

    // Navigation properties you do NOT want to load
    public ICollection<Role> Roles { get; set; } = new List<Role>();
    public ICollection<LoginHistory> LoginHistory { get; set; } = new List<LoginHistory>();
}

Data Transfer Object (DTO)

DTO (Data Transfer Object) is a class or record that defines exactly what data your API returns. Unlike entities, DTOs:

  • Do not map directly to the database
  • Contain only the fields that clients need
  • Are typically read-only
  • Avoid navigation properties unless explicitly needed

This helps:

  • Minimize payload size
  • Avoid accidental lazy-loading of related entities
  • Reduce coupling between database schema and API
public record UserDto(
    int Id,
    string FullName,
    string Email
);

❌ What to Avoid (Entity → DTO After Loading)

This looks harmless but causes over-fetching and possible lazy-loading:

var users = await _dbContext.Users
    .ToListAsync();

var result = users.Select(u => new UserDto(
    u.Id,
    $"{u.FirstName} {u.LastName}",
    u.Email
));

Issues with this approach:

  • Materializes full entities
  • Loads columns you don’t need
  • Risks navigation property access
  • More memory and CPU than necessary

✅ Correct Approach: Project Directly to the DTO

Benefits of this approach:

  • Only required columns are selected
  • No entity tracking overhead
  • No navigation properties loaded
  • Smaller allocations and faster serialization

EF Core translates this into a tight SQL projection, not a SELECT *.

[HttpGet]
public async Task<IReadOnlyList<UserDto>> GetUsers()
{
    return await _dbContext.Users
        .AsNoTracking()
        .OrderBy(u => u.LastName)
        .Select(u => new UserDto(
            u.Id,
            u.FirstName + " " + u.LastName,
            u.Email
        ))
        .ToListAsync();
}

3. Cache What You Can (But Be Careful)

Caching is one of the most powerful performance tools — but only when used intentionally.

Server-Side Caching Options

  • In-memory caching: stored in the application's own memory (e.g., IMemoryCache)
  • Distributed caching: stored in an external store shared by all instances (e.g., Redis, Memcached)

If your API runs on multiple instances, distributed caching is usually the safer option, because every node sees the same cached data.

Example: In-Memory Caching

This method retrieves active users from the database, projects them into lightweight UserDto objects, and caches the result for five minutes to avoid hitting the database on every request.

public async Task<IEnumerable<UserDto>> GetActiveUsersAsync()
{
    return await _cache.GetOrCreateAsync("active_users", async entry =>
    {
        entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);

        return await _context.Users
            .Where(u => u.IsActive)
            .Select(u => new UserDto(
                u.Id,
                u.FirstName + " " + u.LastName,
                u.Email))
            .ToListAsync();
    });
}

Other Caching Layers

  • Front-end caching using HTTP cache headers
  • CDNs for static assets (Azure CDN, AWS CloudFront)

These layers reduce load before a request ever reaches your API.
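As a sketch of the first option: in ASP.NET Core, cache headers can be set declaratively with the [ResponseCache] attribute, paired with the response-caching middleware (the endpoint and _countries field here are hypothetical):

```csharp
// Program.cs
builder.Services.AddResponseCaching();
var app = builder.Build();
app.UseResponseCaching();

// Controller: the attribute emits "Cache-Control: public, max-age=60",
// so clients, proxies, and the middleware can reuse the response.
[HttpGet("countries")]
[ResponseCache(Duration = 60, Location = ResponseCacheLocation.Any)]
public IActionResult GetCountries() => Ok(_countries);
```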

4. Cache Invalidation (The Trade-Off)

Caching is easy. Cache invalidation can be more complex.

Once data changes, you need to make sure stale data doesn’t stick around.

Common approaches include:

  • Time-based expiration: let the cache invalidate itself after a fixed period.
    The challenge is choosing the right expiration time.
  • Programmatic invalidation: remove cache entries when updates occur.
    The challenge is identifying every scenario where data should be invalidated.

There’s no universal solution here — only tradeoffs.

Example: Programmatic Invalidation

This works well — but only if you identify every scenario where the cache should be invalidated.

_cache.Remove("active_users");
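That single line only works if every write path that affects active users also executes it. A minimal sketch, assuming the EF Core context and cache key from the earlier examples:

```csharp
public async Task DeactivateUserAsync(int userId)
{
    var user = await _context.Users.FindAsync(userId);
    if (user is null) return;

    user.IsActive = false;
    await _context.SaveChangesAsync();

    // The cached list of active users is now stale: drop it so the
    // next read rebuilds it from the database.
    _cache.Remove("active_users");
}
```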

5. Use Asynchronous Processing for Long-Running Work

Not every request needs to finish its work before returning a response.

Asynchronous processing allows your API to:

  1. Receive a request
  2. Acknowledge it immediately
  3. Process the work in the background

This keeps your API responsive and prevents clients from waiting on long-running tasks.

Common Approaches

  • Background jobs (Hangfire, Quartz.NET)
  • Message queues (Azure Service Bus, RabbitMQ, Apache Kafka)
  • Serverless workers (Azure Functions)

APIs don’t always need to do the work — sometimes they just need to start it.

Example: Background Job with Hangfire

Hangfire is ideal when you want reliable background processing without standing up new infrastructure, since it can persist jobs in the database you already have.

  • The API responds immediately (202 Accepted)
  • The request thread is released
  • The work runs reliably in the background
  • Jobs survive app restarts

API Controller — Accept and Return Immediately

[HttpPost("reports")]
public IActionResult GenerateReport([FromBody] ReportRequest request)
{
    BackgroundJob.Enqueue<IReportService>(
        service => service.GenerateReportAsync(request));

    return Accepted("Report generation has started.");
}

Background Job Service

public interface IReportService
{
    Task GenerateReportAsync(ReportRequest request);
}

public class ReportService : IReportService
{
    public async Task GenerateReportAsync(ReportRequest request)
    {
        // Simulate long-running work
        await Task.Delay(TimeSpan.FromSeconds(10));

        // Generate report, store file, send notification, etc.
    }
}
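For completeness, Hangfire has to be wired up at startup with a job storage. A sketch assuming the Hangfire.AspNetCore and Hangfire.SqlServer packages and a connection string named "Default":

```csharp
// Program.cs
builder.Services.AddHangfire(config => config
    .UseSqlServerStorage(builder.Configuration.GetConnectionString("Default")));

// Hosts the background job processing server inside this application,
// so enqueued jobs are picked up and executed here.
builder.Services.AddHangfireServer();
```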

Example: Message Queue (Azure Service Bus / RabbitMQ)

Why This Scales Better

  • API and processing are fully decoupled
  • Consumers can scale independently
  • Natural retry and failure handling

API — Publish a Message

[HttpPost("orders")]
public async Task<IActionResult> CreateOrder([FromBody] OrderRequest request)
{
    await _messageBus.PublishAsync(new OrderCreatedMessage
    {
        OrderId = request.OrderId,
        CreatedAt = DateTime.UtcNow
    });

    return Accepted("Order received for processing.");
}

Message Consumer — Separate Worker Service

public class OrderCreatedHandler
{
    public async Task HandleAsync(OrderCreatedMessage message)
    {
        // Long-running processing
        await ProcessOrderAsync(message.OrderId);
    }
}

Example: Azure Functions as a Background Worker

This is perfect when you want zero infrastructure management.

Why This Is Powerful

  • Scales automatically
  • No background services to manage
  • Excellent for burst workloads

API — Send Message to Service Bus

[HttpPost("emails")]
public async Task<IActionResult> SendEmail([FromBody] EmailRequest request)
{
    await _serviceBusSender.SendMessageAsync(
        new ServiceBusMessage(JsonSerializer.Serialize(request)));

    return Accepted("Email queued for delivery.");
}

Azure Function — Process the Message

public class EmailProcessor
{
    [FunctionName("EmailProcessor")]
    public async Task Run(
        [ServiceBusTrigger("email-queue")] EmailRequest request)
    {
        await SendEmailAsync(request);
    }
}

Example: What Not to Do

Why?

  • Work can be killed if the app restarts
  • No retries
  • No observability
  • Not suitable for production APIs
Task.Run(() => DoWorkAsync());

6. Scale When Code Isn’t Enough

Eventually, you’ll hit a point where software optimizations alone aren’t sufficient.

That’s when scaling becomes necessary.

Load Balancing and Auto-Scaling

  • Run multiple API instances
  • Use a load balancer to distribute traffic
  • Enable auto-scaling to handle traffic spikes

Modern cloud platforms make this much easier than it used to be. Scale up during peak traffic, scale down when demand drops, and only pay for what you use.

Scaling won’t fix inefficient code — but it can significantly improve reliability and throughput once the fundamentals are in place.

Make your API Stateless (Required for Scaling)

❌ Example: What Breaks Scaling

Static or in-memory state ties requests to a single instance.

public class OrderController : ControllerBase
{
    private static List<Order> _orders = new();
}

✅ Use External State Instead

Externalize state to:

  • Databases
  • Distributed cache (Redis)
  • Message queues
public class OrderController : ControllerBase
{
    private readonly IDistributedCache _cache;

    public OrderController(IDistributedCache cache)
    {
        _cache = cache;
    }
}

Example: Distributed Cache Configuration (Scale Friendly)

When running multiple instances, in-memory cache is no longer enough.

Redis Configuration Example (.NET)

builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "ApiCache:";
});

Usage Remains the Same

await _cache.SetStringAsync(
    "active_users",
    JsonSerializer.Serialize(users),
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
    });

Multiple API instances now share the same cache.
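The read side follows the usual cache-aside pattern: check Redis first, fall back to the database, then repopulate the cache. A sketch reusing the "active_users" key and the UserDto record from earlier:

```csharp
public async Task<IReadOnlyList<UserDto>> GetActiveUsersAsync()
{
    // Fast path: a cache hit on any instance avoids the database entirely.
    var cached = await _cache.GetStringAsync("active_users");
    if (cached is not null)
        return JsonSerializer.Deserialize<List<UserDto>>(cached)!;

    var users = await _context.Users
        .AsNoTracking()
        .Where(u => u.IsActive)
        .Select(u => new UserDto(u.Id, u.FirstName + " " + u.LastName, u.Email))
        .ToListAsync();

    // Repopulate so other instances benefit from this query too.
    await _cache.SetStringAsync(
        "active_users",
        JsonSerializer.Serialize(users),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
        });

    return users;
}
```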

Health Checks (Critical for Load Balancers)

Load balancers need a way to know if an instance is healthy.

builder.Services.AddHealthChecks()
    .AddSqlServer(builder.Configuration.GetConnectionString("Default"));

Map the Endpoint

Load balancers can now:

  • Route traffic only to healthy instances
  • Automatically remove failing nodes
app.MapHealthChecks("/health");

Example: Scale-Friendly Configuration (No Sticky Sessions)

Avoid relying on session affinity (“sticky sessions”).

❌ Instead of this:

services.AddSession();

✅ Use Stateless Alternatives Instead

  • JWT authentication
  • Distributed session stores (Redis)
  • Claims-based identity
builder.Services.AddAuthentication("Bearer")
    .AddJwtBearer();

This allows any instance to handle any request.

Example: Idempotency for Scaled APIs

With load balancing, retries happen.

Your API must handle duplicate requests safely.

  • Load balancers retry
  • Clients retry
  • Duplicate processing = bugs
[HttpPost("payments")]
public async Task<IActionResult> CreatePayment(
    [FromHeader(Name = "Idempotency-Key")] string key,
    PaymentRequest request)
{
    if (await _store.ExistsAsync(key))
        return Ok(await _store.GetResultAsync(key));

    var result = await ProcessPaymentAsync(request);

    await _store.SaveAsync(key, result);

    return Ok(result);
}

Example: Auto-Scaling Triggers (Conceptual)

Auto-scaling is usually triggered by:

  • CPU usage
  • Memory usage
  • Request count
  • Queue length

Your job as an API developer is to:

  • Keep requests fast
  • Avoid blocking threads
  • Push work to background processors

The platform handles the rest.
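On Azure, for example, those triggers map to autoscale rules. A sketch using the Azure CLI, with placeholder resource names:

```shell
# Create an autoscale setting for an App Service plan (names are hypothetical).
az monitor autoscale create \
  --resource-group my-rg \
  --resource my-plan \
  --resource-type Microsoft.Web/serverfarms \
  --name api-autoscale \
  --min-count 2 --max-count 10 --count 2

# Scale out by one instance when average CPU exceeds 70% over 5 minutes.
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name api-autoscale \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1
```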

Web API Performance 101

When it comes down to it, Web API performance is about doing less work and doing it intentionally.

  • Optimize database queries
  • Minimize response data
  • Use paging
  • Cache intelligently
  • Offload long-running work
  • Scale when necessary

Performance isn’t about clever tricks or framework internals. It’s about understanding where the real costs are — and designing your APIs accordingly.