Understanding Projection

A deep dive into how customPicker works internally, its performance characteristics, and the design decisions behind it.

What is Projection?

Projection is the process of selecting specific fields from a data structure while excluding all others. It is a fundamental data-transformation operation that serves several purposes:

Security - Remove sensitive fields before sending data to clients
Performance - Transfer less data over the network
Privacy - Expose only what users are allowed to see
API Design - Define clear data contracts

The Problem

Imagine you have a user object from your database:

const dbUser = {
  id: 1,
  name: 'John Doe',
  email: 'john@example.com',
  password: '$2a$10$...', // ❌ Sensitive
  passwordHash: 'xyz', // ❌ Sensitive
  sessionToken: 'abc123', // ❌ Sensitive
  internalId: 'USR-XYZ-001', // ❌ Internal
  createdAt: '2024-01-01',
  lastLogin: '2024-01-15',
};

You want to send only safe fields to the client:

const apiUser = {
  id: 1,
  name: 'John Doe',
  email: 'john@example.com',
  createdAt: '2024-01-01',
};

Manual Solutions (Before customPicker)

Option 1: Manual Object Construction

const apiUser = {
  id: dbUser.id,
  name: dbUser.name,
  email: dbUser.email,
  createdAt: dbUser.createdAt,
};

✅ Simple
❌ Repetitive
❌ Error-prone (easy to forget fields)
❌ Hard to maintain

Option 2: Destructuring

const { password, passwordHash, sessionToken, internalId, ...apiUser } = dbUser;

✅ Concise
❌ Excludes instead of includes (risky - new fields auto-exposed)
❌ No type safety
❌ Awkward with nested objects (requires nested destructuring and rebuilding)

Option 3: Lodash pick

import _ from 'lodash';
const apiUser = _.pick(dbUser, ['id', 'name', 'email', 'createdAt']);

✅ Declarative
❌ No type safety
❌ No validation
❌ Dot paths only; no way to pick a field from every array element
❌ No caching

The customPicker Solution

import { customPicker } from '@noony-serverless/type-builder';

const apiUser = customPicker(dbUser, ['id', 'name', 'email', 'createdAt']);

✅ Declarative
✅ Type-safe
✅ Optional validation
✅ Handles nested objects and arrays
✅ Automatic schema caching
✅ ~300,000 ops/sec for simple cached projections

Architecture Overview

High-Level Flow

Input Data + Projection Paths
         ↓
    Path Parser
         ↓
    Path Tree Builder
         ↓
    Schema Builder (Zod)
         ↓
    Schema Cache (LRU)
         ↓
    Zod Validation/Projection
         ↓
    Projected Output

Core Components

1. Path Parser

Converts string paths into structured segments:

'user.address.city'
  ↓
[
  { key: 'user', isArray: false },
  { key: 'address', isArray: false },
  { key: 'city', isArray: false }
]

'items[].id'
  ↓
[
  { key: 'items', isArray: true },
  { key: 'id', isArray: false }
]

Key Functions:

parsePath() - Parse single path into segments
buildPathTree() - Build nested tree from multiple paths
normalizePaths() - Deduplicate and sort paths for caching
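To make the parsing step concrete, here is a minimal sketch of a parsePath()-style function. It assumes paths are dot-separated and that a trailing `[]` marks an array; treating an empty segment as an error is our own assumption, and the library's actual implementation may differ.

```typescript
// Hypothetical sketch of a path parser; not the library's actual parsePath().
interface PathSegment {
  key: string;
  isArray: boolean; // true when the segment ends with the "[]" suffix
}

function parsePath(path: string): PathSegment[] {
  return path.split('.').map((part) => {
    const isArray = part.endsWith('[]');
    const key = isArray ? part.slice(0, -2) : part;
    // Assumption: an empty segment (e.g. "a..b" or "[]") is treated as an error
    if (key.length === 0) {
      throw new Error(`Invalid path "${path}": empty segment`);
    }
    return { key, isArray };
  });
}

console.log(JSON.stringify(parsePath('items[].id')));
// [{"key":"items","isArray":true},{"key":"id","isArray":false}]
```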

2. Schema Builder

Converts path tree into Zod schema:

buildProjectionSchema(['name', 'email', 'address.city'])
  ↓
z.object({
  name: z.any().optional(),
  email: z.any().optional(),
  address: z.object({
    city: z.any().optional()
  }).optional()
})

Why Zod?

✅ Runtime validation
✅ Strip unknown fields automatically
✅ Composable schemas
✅ Great TypeScript integration
✅ Battle-tested library

3. Schema Cache

LRU cache for built schemas:

// First call: builds schema (~10μs)
customPicker(user, ['id', 'name', 'email']);

// Second call: uses cached schema (~3μs)
customPicker(user2, ['id', 'name', 'email']);

Cache Strategy:

LRU eviction - Least recently used schemas are evicted
Max size: 1000 - Prevents unbounded memory growth
Cache key - Normalized, sorted path string ('email|id|name')
Performance gain - cache hits take ~70% less time (~10μs → ~3μs)

How Path Parsing Works

Simple Paths

parsePath('name');
// [{ key: 'name', isArray: false }]

Nested Paths

parsePath('user.address.city');
// [
//   { key: 'user', isArray: false },
//   { key: 'address', isArray: false },
//   { key: 'city', isArray: false }
// ]

Array Paths

The [] suffix indicates an array field:

parsePath('items[]');
// [{ key: 'items', isArray: true }]

parsePath('items[].id');
// [
//   { key: 'items', isArray: true },
//   { key: 'id', isArray: false }
// ]

Deep Nested Arrays

parsePath('comments[].replies[].author.name');
// [
//   { key: 'comments', isArray: true },
//   { key: 'replies', isArray: true },
//   { key: 'author', isArray: false },
//   { key: 'name', isArray: false }
// ]

Path Tree Structure

Multiple paths are merged into a tree:

buildPathTree([
  'user.name',
  'user.email',
  'items[].id',
  'items[].name'
])

// Returns:
{
  user: {
    name: true,
    email: true
  },
  'items[]': {
    id: true,
    name: true
  }
}

Leaf nodes (true) indicate terminal fields. Branch nodes (objects) indicate nesting.
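The merge can be sketched as a loop that walks each path and creates branch nodes on demand. This is a hypothetical implementation, not the library's code; in particular, how it resolves an overlap such as 'user' plus 'user.name' (here the whole-field selection wins) is an assumption the text does not specify.

```typescript
// Leaf nodes are `true`; branch nodes are nested trees. Array keys keep their "[]".
type PathTree = { [key: string]: true | PathTree };

// Hypothetical sketch: merge dot-separated paths into one tree.
function buildPathTree(paths: readonly string[]): PathTree {
  const root: PathTree = {};
  for (const path of paths) {
    const parts = path.split('.');
    let node: PathTree = root;
    for (let i = 0; i < parts.length; i++) {
      const part = parts[i];
      if (i === parts.length - 1) {
        // Terminal field: select the whole value (overrides any deeper selection)
        node[part] = true;
        break;
      }
      const child: true | PathTree | undefined = node[part];
      if (child === true) break; // whole value already selected; deeper path is redundant
      if (child === undefined) node[part] = {};
      node = node[part] as PathTree;
    }
  }
  return root;
}

console.log(JSON.stringify(buildPathTree(['user.name', 'user.email', 'items[].id', 'items[].name'])));
// {"user":{"name":true,"email":true},"items[]":{"id":true,"name":true}}
```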

How Schema Building Works

Leaf Fields (Simple)

buildProjectionSchema(['name', 'email']);

// Generates:
z.object({
  name: z.any().optional(),
  email: z.any().optional(),
});

Why .optional()?

Allows missing fields (non-strict mode)
Can be made required with { strict: true }

Why z.any()?

We don't know the actual types from paths alone
Users can provide Zod schemas for type validation

Nested Objects

buildProjectionSchema(['user.name', 'user.email']);

// Generates:
z.object({
  user: z
    .object({
      name: z.any().optional(),
      email: z.any().optional(),
    })
    .optional(),
});

Arrays

buildProjectionSchema(['items[]']);

// Generates:
z.object({
  items: z.array(z.any()).optional(),
});

Array of Objects

buildProjectionSchema(['items[].id', 'items[].name']);

// Generates:
z.object({
  items: z
    .array(
      z.object({
        id: z.any().optional(),
        name: z.any().optional(),
      })
    )
    .optional(),
});

Deep Nesting

buildProjectionSchema(['comments[].text', 'comments[].author.name', 'comments[].replies[].text']);

// Generates:
z.object({
  comments: z
    .array(
      z.object({
        text: z.any().optional(),
        author: z
          .object({
            name: z.any().optional(),
          })
          .optional(),
        replies: z
          .array(
            z.object({
              text: z.any().optional(),
            })
          )
          .optional(),
      })
    )
    .optional(),
});
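All of these generated schemas behave the same way: keep the listed keys, recurse into nested objects, and map over arrays. The sketch below reproduces that stripping behavior on a path tree in plain TypeScript, without Zod, so the semantics are easy to inspect. applyTree is a hypothetical helper; unlike a real Zod schema it performs no type checks, so a non-object value passes through unchanged instead of producing a validation error.

```typescript
type PathTree = { [key: string]: true | PathTree };

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical, Zod-free sketch of what the generated projection schema does:
// copy only the keys named in `tree`, recursing into objects and array items.
// No type checking: non-object values pass through unchanged.
function applyTree(value: unknown, tree: PathTree): unknown {
  if (!isPlainObject(value)) return value;
  const out: Record<string, unknown> = {};
  for (const [rawKey, sub] of Object.entries(tree)) {
    const isArray = rawKey.endsWith('[]');
    const key = isArray ? rawKey.slice(0, -2) : rawKey;
    // Like .optional(): a missing field is omitted rather than reported
    if (!Object.prototype.hasOwnProperty.call(value, key)) continue;
    const field = value[key];
    if (sub === true) {
      out[key] = field; // leaf: keep the whole value
    } else if (isArray && Array.isArray(field)) {
      out[key] = field.map((item) => applyTree(item, sub)); // project every element
    } else {
      out[key] = applyTree(field, sub); // nested object
    }
  }
  return out;
}

const tree: PathTree = {
  'comments[]': { text: true, author: { name: true }, 'replies[]': { text: true } },
};

const post = {
  id: 7,
  comments: [
    {
      text: 'Hi',
      spam: true,
      author: { name: 'Ann', email: 'ann@example.com' },
      replies: [{ text: 'Yo', likes: 3 }],
    },
  ],
};

console.log(JSON.stringify(applyTree(post, tree)));
// {"comments":[{"text":"Hi","author":{"name":"Ann"},"replies":[{"text":"Yo"}]}]}
```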

Caching Strategy

Cache Key Generation

Paths are normalized before caching:

// These all generate the same cache key:
['name', 'email', 'id'];
['email', 'id', 'name'];
['id', 'name', 'email'];

// Normalized: 'email|id|name' (sorted, deduplicated, joined)

Why normalize?

Order doesn't matter for projection
Maximizes cache hits
Prevents duplicate schemas
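A cache key along these lines takes only a few lines of code. toCacheKey is a hypothetical name standing in for the normalization step; the sketch assumes no path contains the `|` separator, otherwise two different path lists could collide on the same key.

```typescript
// Hypothetical sketch of the normalization step behind the cache key.
// Assumes no path contains "|", which would make distinct path lists collide.
function toCacheKey(paths: readonly string[]): string {
  return Array.from(new Set(paths)) // deduplicate
    .sort() // order-insensitive
    .join('|');
}

console.log(toCacheKey(['name', 'email', 'id'])); // email|id|name
console.log(toCacheKey(['id', 'name', 'email', 'id'])); // email|id|name
```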

LRU Eviction

When cache size exceeds maxSize (default: 1000):

Least recently accessed schema is evicted
New schema is cached
Access order is updated

Example:

const cache = new SchemaCache(3); // Max 3 schemas

cache.set('a', schemaA); // Cache: [a]
cache.set('b', schemaB); // Cache: [a, b]
cache.set('c', schemaC); // Cache: [a, b, c]

cache.get('a'); // Access 'a' → Cache: [b, c, a]
cache.set('d', schemaD); // Evict 'b' → Cache: [c, a, d]
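One common way to get this behavior in TypeScript is to lean on Map, which iterates keys in insertion order: re-inserting a key on access moves it to the end, so the first key is always the least recently used. LruCache below is an illustrative sketch of that technique, not the library's SchemaCache; it replays the example above with strings standing in for schemas.

```typescript
// Illustrative LRU cache built on Map's insertion-order iteration.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key moves to the most-recently-used end
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // no-op if absent; refreshes position if present
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // The first key in iteration order is the least recently used
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  keys(): string[] {
    return Array.from(this.entries.keys()); // least to most recently used
  }
}

const cache = new LruCache<string>(3);
cache.set('a', 'schemaA'); // [a]
cache.set('b', 'schemaB'); // [a, b]
cache.set('c', 'schemaC'); // [a, b, c]
cache.get('a'); // [b, c, a]
cache.set('d', 'schemaD'); // evicts b → [c, a, d]
console.log(JSON.stringify(cache.keys())); // ["c","a","d"]
```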

Cache Performance

Cache Hit (~3μs):

const toDTO = createPicker(['id', 'name', 'email']);

// First call: builds schema + caches
const user1 = toDTO(dbUser1); // ~10μs

// Subsequent calls: uses cache
const user2 = toDTO(dbUser2); // ~3μs ← ~70% less time!
const user3 = toDTO(dbUser3); // ~3μs

Cache Miss (~10μs):

customPicker(user, ['different', 'fields']); // ~10μs (builds schema)

Performance Characteristics

Benchmark Results

Simple projection (cached):      ~300,000 ops/sec  (~3.3μs)
Nested projection (cached):      ~200,000 ops/sec  (~5μs)
Array projection (cached):       ~150,000 ops/sec  (~6.7μs)
First call (builds schema):      ~100,000 ops/sec  (~10μs)

Lodash pick:                     ~500,000 ops/sec  (~2μs)    ← No validation
Manual construction:             ~5,000,000 ops/sec (~0.2μs)  ← No safety

Why Not Faster?

Tradeoffs:

✅ Type safety
✅ Validation
✅ Nested path support
✅ Array handling
⚠️ Zod parsing overhead

Is 300k ops/sec slow? No! That's 3.3 microseconds per operation.

For context:

Database query: ~1-10ms (1000-10000μs)
HTTP request: ~10-100ms (10000-100000μs)
Projection: ~3μs
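The per-operation figures follow from simple arithmetic: microseconds per operation = 1,000,000 / (ops/sec), so 300,000 ops/sec is about 3.3μs. To check numbers like these on your own data, a rough timing harness such as the one below is usually enough. benchmark and opsToMicros are hypothetical helpers, and results vary with hardware, runtime, and JIT warm-up, so treat them as ballpark figures rather than a substitute for a dedicated benchmarking tool.

```typescript
// Hypothetical micro-benchmark helpers; numbers are rough and machine-dependent.
function opsToMicros(opsPerSec: number): number {
  return 1_000_000 / opsPerSec; // μs per operation
}

function benchmark(fn: () => unknown, iterations = 100_000) {
  let lastResult: unknown;
  // Warm-up so the JIT has a chance to optimize fn before timing
  for (let i = 0; i < Math.min(iterations, 1_000); i++) lastResult = fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) lastResult = fn();
  // Guard against a zero reading from a coarse timer
  const elapsedMs = Math.max(performance.now() - start, 1e-6);
  const microsPerOp = (elapsedMs * 1_000) / iterations;
  // Returning lastResult keeps the work observable
  return { opsPerSec: 1_000_000 / microsPerOp, microsPerOp, lastResult };
}

const dbUser = { id: 1, name: 'John Doe', email: 'john@example.com', password: '$2a$10$...' };
const manual = benchmark(() => ({ id: dbUser.id, name: dbUser.name, email: dbUser.email }));
console.log(`manual: ~${Math.round(manual.opsPerSec)} ops/sec, ${manual.microsPerOp.toFixed(3)}μs/op`);
console.log(`300k ops/sec = ${opsToMicros(300_000).toFixed(1)}μs/op`); // 300k ops/sec = 3.3μs/op
```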

Optimization Tips

1. Use createPicker for repeated projections:

// ✅ GOOD: Pre-cache schema
const toDTO = createPicker(['id', 'name']);
users.map(toDTO);

// ❌ SLOWER: Re-normalizes paths and looks up the cache on every call
users.map((u) => customPicker(u, ['id', 'name']));

2. Disable validation if not needed:

// ✅ FAST: Skip validation (~2x faster)
customPicker(data, paths, { validate: false });

// ⚠️ SLOWER: Full validation
customPicker(data, paths, { validate: true });

3. Keep projections simple:

// ✅ FAST: Shallow projection
['id', 'name', 'email'];

// ⚠️ SLOWER: Deep nesting
['comments[].replies[].author.profile.settings.theme'];
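The reason createPicker helps is that path work happens once, at creation time, and the returned closure only does the copying. The sketch below illustrates that pattern with a deliberately limited picker; createFlatPicker is hypothetical and handles no nested or array paths, but it also shows how a Pick<T, K> return type gives compile-time safety.

```typescript
type DbUser = { id: number; name: string; email: string; password: string };

// Hypothetical flat-fields-only picker factory (no nested or array paths).
function createFlatPicker<T extends object, K extends keyof T>(keys: readonly K[]): (obj: T) => Pick<T, K> {
  // Normalize once, when the picker is created, not on every call
  const unique = Array.from(new Set(keys));
  return (obj: T): Pick<T, K> => {
    const out = {} as Pick<T, K>;
    for (const key of unique) {
      if (Object.prototype.hasOwnProperty.call(obj, key)) {
        out[key] = obj[key];
      }
    }
    return out;
  };
}

const toPublicUser = createFlatPicker<DbUser, 'id' | 'name'>(['id', 'name']);
const users: DbUser[] = [
  { id: 1, name: 'Ann', email: 'ann@example.com', password: 'x' },
  { id: 2, name: 'Bob', email: 'bob@example.com', password: 'y' },
];
console.log(JSON.stringify(users.map(toPublicUser))); // [{"id":1,"name":"Ann"},{"id":2,"name":"Bob"}]
```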

Design Decisions

Why Zod?

Considered Alternatives:

Manual object traversal - Fast but no validation
JSON Schema - Validation but complex API
Yup - Similar to Zod but less TypeScript support
Custom validator - More control but more code

Why Zod Won:

✅ Best TypeScript integration
✅ Composable schemas
✅ .strip() removes unknown fields automatically
✅ Battle-tested
✅ Already used in builder pattern

Why Auto-Caching?

Alternatives:

No caching - Simple but slow
Manual caching - Fast but complex API
Auto-caching - Best of both worlds

Benefits:

Users don't think about caching
~70% lower latency on cache hits
Configurable ({ cache: false })

Why Path Strings?

Alternatives:

Object notation: { user: { name: true, email: true } }
Function chains: .pick('user').pick('name')
Path strings: ['user.name', 'user.email']

Why Path Strings Won:

✅ Concise
✅ Easy to serialize (URL params, JSON)
✅ Familiar (MongoDB, GraphQL)
✅ Easy to validate/whitelist
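Because path strings are plain data, a client-supplied field list (for example ?fields=id,name) can be checked against an allow-list before it reaches the picker. parseFieldsParam and ALLOWED_FIELDS below are hypothetical; the sketch rejects the whole request on any unknown field, which is one reasonable policy among several.

```typescript
// Hypothetical allow-list of projectable paths for a public user endpoint.
const ALLOWED_FIELDS: ReadonlySet<string> = new Set(['id', 'name', 'email', 'createdAt', 'address.city']);

// Parse "id,name,email" into paths, rejecting anything not on the allow-list.
function parseFieldsParam(raw: string | null, allowed: ReadonlySet<string>): string[] {
  if (raw === null) return []; // no ?fields= param; caller falls back to a default
  const requested = raw
    .split(',')
    .map((field) => field.trim())
    .filter((field) => field.length > 0);
  const rejected = requested.filter((field) => !allowed.has(field));
  if (rejected.length > 0) {
    throw new Error(`Unknown or forbidden fields: ${rejected.join(', ')}`);
  }
  return requested;
}

const params = new URLSearchParams('fields=id,name,address.city');
console.log(JSON.stringify(parseFieldsParam(params.get('fields'), ALLOWED_FIELDS)));
// ["id","name","address.city"]
```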

Why `[]` Array Syntax?

Alternatives:

.items - Ambiguous (field or array?)
.* - Unix-style but unclear
[] - Clear and familiar (ES6, TypeScript)

Benefits:

Clear intent (items[] = array)
Composable (items[].id)
Familiar syntax

Comparison with Alternatives

vs Lodash `pick`

Feature	customPicker	Lodash pick
Simple fields	✅	✅
Nested paths	✅ `'user.name'`	✅ `'user.name'`
Arrays	✅ `'items[].id'`	❌ Manual
Type safety	✅ TypeScript	❌ None
Validation	✅ Zod	❌ None
Caching	✅ Auto	❌ None
Performance	⚡⚡⚡ 300k ops/sec	⚡⚡⚡⚡ 500k ops/sec

When to use Lodash:

Flat objects or plain dot paths (no array element selection)
Performance critical (~1.7x faster: ~500k vs. ~300k ops/sec)
No validation needed

When to use customPicker:

Nested objects
Arrays
Validation required
Type safety important

vs Manual Construction

Feature	customPicker	Manual
Conciseness	✅	❌ Verbose
Maintainability	✅	❌ Error-prone
Performance	⚡⚡⚡ 300k ops/sec	⚡⚡⚡⚡⚡ 5M ops/sec
Type safety	✅	⚠️ Partial

When to use manual:

Ultra-high performance (hot paths)
Very simple objects (2-3 fields)

When to use customPicker:

Complex objects
Maintainability matters
Validation needed

vs Zod `.pick()`

Zod has a built-in .pick() method:

const UserSchema = z.object({
  id: z.number(),
  name: z.string(),
  email: z.string(),
  password: z.string(),
});

const PublicUserSchema = UserSchema.pick({ id: true, name: true, email: true });

Limitations:

❌ Requires pre-defined schema
❌ No path syntax ('user.name')
❌ No array syntax ('items[].id')
❌ More verbose

When to use Zod .pick():

You already have Zod schemas
Schema-first approach

When to use customPicker:

Data-first approach
Dynamic field selection
Nested/array projections

Common Patterns Explained

Pattern: Reusable Projections

const toPublicUser = createPicker(['id', 'name', 'email']);

What happens:

Schema built immediately: z.object({ id, name, email })
Schema cached with key 'email|id|name'
Returned function reuses cached schema

Why it's fast:

Schema built once
All subsequent calls use cache
No re-parsing paths

Pattern: Schema-based Projection

const PublicUserSchema = z.object({
  id: z.number(),
  name: z.string().min(1),
  email: z.string().email(),
});

customPicker(user, PublicUserSchema);

What happens:

Schema used directly (no path parsing)
Full Zod validation runs
Unknown fields stripped

When to use:

Validation required
Type guarantees needed
Schema already defined

Pattern: Shape-based Projection

const publicShape = { id: 0, name: '', email: '' };
projectByShape(user, publicShape);

What happens:

Keys extracted: ['id', 'name', 'email']
Paths normalized: 'email|id|name'
Schema built and cached
Projection applied

When to use:

Reference objects available
Better IDE autocomplete
Example-driven development
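The key-extraction step can be sketched as a recursive walk over the reference object. The text above only shows flat keys; mapping nested objects to dot paths (address.city) and arrays to the [] syntax (items[].id, or tags[] for arrays of primitives) is our own assumption about how a shape could translate to paths, and shapeToPaths is a hypothetical helper rather than part of projectByShape.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical: derive projection paths from an example ("reference") object.
// Nested objects become dot paths; arrays become "[]" paths, using the first
// element as the item shape when it is an object.
function shapeToPaths(shape: Record<string, unknown>, prefix = ''): string[] {
  const paths: string[] = [];
  for (const [key, example] of Object.entries(shape)) {
    const path = prefix + key;
    if (Array.isArray(example)) {
      const item: unknown = example[0];
      if (isPlainObject(item)) {
        paths.push(...shapeToPaths(item, `${path}[].`));
      } else {
        paths.push(`${path}[]`); // array of primitives, or empty array
      }
    } else if (isPlainObject(example)) {
      paths.push(...shapeToPaths(example, `${path}.`));
    } else {
      paths.push(path); // primitive example value: a leaf field
    }
  }
  return paths;
}

const publicShape = { id: 0, name: '', address: { city: '' }, tags: [''], items: [{ id: 0, qty: 0 }] };
console.log(JSON.stringify(shapeToPaths(publicShape)));
// ["id","name","address.city","tags[]","items[].id","items[].qty"]
```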

Memory Characteristics

Per-Operation Memory

customPicker:

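Schema cached: ~200-500 bytes (once per unique projection)
Per-call overhead: ~50 bytes (object creation)
Result object: Size of projected data

Total: ~250-550 bytes for the first call of a unique projection; later calls add only the per-call overhead and the result object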

Cache Memory

Default cache (1000 schemas):

Average schema: ~300 bytes
Total cache: ~300KB

Worst case (1000 complex schemas):

Large schema: ~500 bytes
Total cache: ~500KB

Memory is not a concern for most applications.

Clearing Cache

For long-running apps with dynamic projections:

// Periodic cleanup
setInterval(() => {
  const stats = getGlobalSchemaCacheStats();
  if (stats.size > 500) {
    clearGlobalSchemaCache();
  }
}, 60000); // Every minute

Edge Cases & Limitations

1. Circular References

Not handled. Circular data structures will cause stack overflow.

const user = { id: 1, name: 'John' };
user.self = user; // ❌ Circular

customPicker(user, ['id', 'name', 'self']); // ❌ Stack overflow

Solution: Don't project circular fields.
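If you cannot rule out cycles in incoming data, a guard can detect them before projecting. hasCycle below is a hypothetical helper that tracks the objects on the current traversal path in a WeakSet, so shared (but acyclic) references are not misreported as cycles.

```typescript
// Hypothetical guard: true if `value` contains a reference cycle.
// Only objects on the current traversal path count, so shared references are fine.
function hasCycle(value: unknown, ancestors: WeakSet<object> = new WeakSet()): boolean {
  if (typeof value !== 'object' || value === null) return false;
  if (ancestors.has(value)) return true;
  ancestors.add(value);
  const children: unknown[] = Array.isArray(value) ? value : Object.values(value);
  const found = children.some((child) => hasCycle(child, ancestors));
  ancestors.delete(value); // leaving this branch
  return found;
}

const user: { id: number; name: string; self?: unknown } = { id: 1, name: 'John' };
user.self = user;

const shared = { city: 'Paris' };
console.log(hasCycle(user)); // true
console.log(hasCycle({ home: shared, work: shared })); // false
```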

2. Very Deep Nesting (>10 levels)

Performance degrades with extreme nesting:

['a.b.c.d.e.f.g.h.i.j.k.l.m.n.o']; // ⚠️ Slow

Solution: Flatten data structures when possible.

3. Dynamic Keys

Object keys must be known at projection time:

const data = {
  user_1: { name: 'Alice' },
  user_2: { name: 'Bob' },
};

// ❌ Can't project unknown keys
customPicker(data, ['user_*.name']);

Solution: Use Object.values() or restructure data.
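Both suggested workarounds can be sketched with the standard Object.values() and Object.entries(). The inline { name: u.name } stands in for a per-item projection (each value could equally be passed to customPicker(u, ['name'])); turning the dynamic key into an id field is one possible restructuring, chosen here for illustration.

```typescript
const data: Record<string, { name: string; email?: string }> = {
  user_1: { name: 'Alice', email: 'alice@example.com' },
  user_2: { name: 'Bob' },
};

// Option 1: drop the dynamic keys and project each value.
const names = Object.values(data).map((u) => ({ name: u.name }));

// Option 2: restructure so the dynamic key becomes an ordinary field.
const users = Object.entries(data).map(([id, u]) => ({ id, name: u.name }));

console.log(JSON.stringify(names)); // [{"name":"Alice"},{"name":"Bob"}]
console.log(JSON.stringify(users)); // [{"id":"user_1","name":"Alice"},{"id":"user_2","name":"Bob"}]
```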

4. Function/Class Properties

Functions are stripped by Zod:

const obj = {
  id: 1,
  getName() {
    return 'John';
  },
};

customPicker(obj, ['id', 'getName']);
// Returns: { id: 1 } ← 'getName' stripped

Solution: Use class builder for method preservation.

Next Steps

🎯 Quick Start - Get started quickly
📖 How-to Guides - Solve specific problems
🔍 API Reference - Complete function reference
