json-ext

A set of utilities that extend JSON’s capabilities, especially for efficiently handling large JSON datasets (over 100MB) and for streaming JSONL/NDJSON processing.

Key Features

Why json-ext?

Install

npm install @discoveryjs/json-ext

API

parseChunked()

Works like JSON.parse(), but iterates over chunks to reconstruct the result object and returns a Promise.

function parseChunked(input: Iterable<Chunk> | AsyncIterable<Chunk>, reviver?: Reviver): Promise<any>;
function parseChunked(input: Iterable<Chunk> | AsyncIterable<Chunk>, options?: ParseOptions): Promise<any>;
function parseChunked(input: () => (Iterable<Chunk> | AsyncIterable<Chunk>), reviver?: Reviver): Promise<any>;
function parseChunked(input: () => (Iterable<Chunk> | AsyncIterable<Chunk>), options?: ParseOptions): Promise<any>;

type Chunk = string | Buffer | Uint8Array;
type Reviver = (this: any, key: string, value: any) => any;
type ParseOptions = {
    reviver?: Reviver;
    mode?: 'json' | 'jsonl' | 'auto';
    onRootValue?: (value: any, state: ParseChunkedState) => void;
    onChunk?: (chunkParsed: number, chunk: string | null, pending: string | null, state: ParseChunkedState) => void;
};
type ParseChunkedState = {
    mode: 'json' | 'jsonl';
    returnValue: any;
    currentRootValue: any;
    rootValuesCount: number;
    consumed: number;
    parsed: number;
};

Usage:

import { parseChunked } from '@discoveryjs/json-ext';

const data = await parseChunked(chunkEmitter);

Parameter chunkEmitter can be an iterable or async iterable that iterates over chunks, or a function returning such a value. A chunk can be a string, Uint8Array, or Node.js Buffer.
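As an illustration of the kinds of input described above, here is a minimal sketch of two chunk sources. The helper names `chunksOf` and `chunksOfAsync` are hypothetical and not part of the library; they only slice a string into pieces, the way a file or network stream might deliver it.

```javascript
// Hypothetical helper: yields `text` in slices of at most `size` characters.
function* chunksOf(text, size) {
    for (let offset = 0; offset < text.length; offset += size) {
        yield text.slice(offset, offset + size);
    }
}

// Hypothetical async variant: the same slices, delivered asynchronously,
// the way a network or file stream would deliver them.
async function* chunksOfAsync(text, size) {
    for (const chunk of chunksOf(text, size)) {
        await Promise.resolve(); // stand-in for waiting on I/O
        yield chunk;
    }
}

const json = JSON.stringify({ items: [1, 2, 3], label: 'demo' });

// Chunk boundaries may fall in the middle of a token
const chunks = [...chunksOf(json, 8)];
```

Any of `chunksOf(json, 8)`, `chunksOfAsync(json, 8)`, or `() => chunksOf(json, 8)` should then be usable as the first argument of `parseChunked()`.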

You can pass reviver either as the second argument (parseChunked(input, reviver)) or inside options (parseChunked(input, { mode, reviver })). reviver works the same way as in JSON.parse().
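Since the reviver follows the JSON.parse() contract, a reviver can be written and checked against JSON.parse() first. The sketch below assumes a hypothetical `reviveDates` function and a deliberately narrow timestamp pattern; neither comes from the library.

```javascript
// Hypothetical reviver: turns UTC ISO 8601 timestamp strings into Date objects
// and leaves every other value untouched.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

function reviveDates(key, value) {
    return typeof value === 'string' && ISO_DATE.test(value)
        ? new Date(value)
        : value;
}

// Demonstrated with JSON.parse(), which uses the same reviver contract
const parsed = JSON.parse(
    '{"id":1,"createdAt":"2024-01-02T03:04:05Z"}',
    reviveDates
);
```

The same function could then be passed as `parseChunked(input, reviveDates)` or `parseChunked(input, { reviver: reviveDates })`.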

options.mode controls JSON Lines support. It accepts 'json', 'jsonl', or 'auto'.

options.onRootValue is called when a root value is parsed and finalized. When onRootValue is specified, parseChunked() resolves to the number of processed root values (instead of returning parsed value(s)), which allows processing huge or infinite streams without accumulating all values in memory.

options.onChunk is called after each input chunk is processed and once at the end with chunk = null. It provides parsing progress and parser state as chunks are processed.

The state object passed to onRootValue and onChunk callbacks has the following properties:

- consumed – number of characters consumed so far
- parsed – number of characters parsed so far (not necessarily the same when a chunk ends in the middle of a token)
- mode – current parsing mode (json or jsonl)
- rootValuesCount – number of root values parsed so far
- currentRootValue – current root value being parsed
- returnValue – current return value state, i.e. what parseChunked() will return when finished (either the parsed value or the number of root values, depending on whether onRootValue is specified)
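The state properties listed above can drive progress reporting. The following is a sketch only: `createProgressTracker` is a hypothetical helper, the total size is assumed to be known in advance (in characters), and the callback is exercised here with hand-written state objects rather than a real parser run.

```javascript
// Hypothetical factory: builds an onChunk callback that records progress
// against a known total input size (in characters).
function createProgressTracker(totalSize) {
    const history = [];

    function onChunk(chunkParsed, chunk, pending, state) {
        history.push({
            done: chunk === null, // the final call is made with chunk = null
            percent: Math.min(100, Math.round((state.consumed / totalSize) * 100)),
            rootValues: state.rootValuesCount
        });
    }

    return { onChunk, history };
}

// Simulated calls with hand-written arguments, for illustration only
const tracker = createProgressTracker(200);
tracker.onChunk(50, 'x'.repeat(50), null, { consumed: 50, parsed: 50, rootValuesCount: 0 });
tracker.onChunk(0, null, null, { consumed: 200, parsed: 200, rootValuesCount: 1 });
```

With the library, the callback would be passed as `parseChunked(input, { onChunk: tracker.onChunk })`.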

Examples:
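A sketch of an onRootValue handler that aggregates records without keeping them in memory. The `createSumCollector` helper and the `amount` field are hypothetical, and the handler is called by hand here to show its behavior in isolation.

```javascript
// Hypothetical aggregator: keeps a running sum of one numeric field
// instead of storing every root value.
function createSumCollector(field) {
    const totals = { sum: 0, count: 0 };

    function onRootValue(value, state) {
        // Missing or non-numeric fields contribute 0 to the sum
        totals.sum += Number(value[field]) || 0;
        totals.count += 1;
    }

    return { onRootValue, totals };
}

// Simulated calls, one per root value, for illustration only
const collector = createSumCollector('amount');
collector.onRootValue({ amount: 5 }, {});
collector.onRootValue({ amount: 7.5 }, {});
collector.onRootValue({ note: 'no amount' }, {});
```

With the library, this would be wired up as `await parseChunked(input, { mode: 'jsonl', onRootValue: collector.onRootValue })`, which, as described above, resolves to the number of processed root values rather than the values themselves.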

stringifyChunked()

Works like JSON.stringify(), but returns a generator yielding strings instead of a single string.

Note: Yields "null" when JSON.stringify() would return undefined (since a chunk cannot be undefined).

function stringifyChunked(value: any, replacer?: Replacer, space?: Space): Generator<string, void, unknown>;
function stringifyChunked(value: any, options: StringifyOptions): Generator<string, void, unknown>;

type Replacer =
    | ((this: any, key: string, value: any) => any)
    | (string | number)[]
    | null;
type Space = string | number | null;
type StringifyOptions = {
    replacer?: Replacer;
    space?: Space;
    mode?: 'json' | 'jsonl';
    highWaterMark?: number;
};

Usage:
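Because the result is a generator of strings, it can be consumed with a plain for...of loop. The sketch below uses a hypothetical `drainChunks` consumer and a stand-in generator in place of `stringifyChunked()`, so it shows only the consumption pattern, not the library's actual chunking.

```javascript
// Hypothetical consumer: passes every chunk to `write` and reports
// how many chunks and UTF-8 bytes went through.
function drainChunks(chunks, write) {
    const encoder = new TextEncoder();
    let count = 0;
    let bytes = 0;

    for (const chunk of chunks) {
        write(chunk);
        count += 1;
        bytes += encoder.encode(chunk).length;
    }

    return { count, bytes };
}

// Stand-in generator for illustration; stringifyChunked(value) would take its place
function* fakeStringifyChunked() {
    yield '{"test":';
    yield 'true}';
}

const parts = [];
const stats = drainChunks(fakeStringifyChunked(), (chunk) => parts.push(chunk));
```

With the library, the call might look like `drainChunks(stringifyChunked(data), (chunk) => process.stdout.write(chunk))`.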

stringifyInfo()

Works like JSON.stringify(), but returns an object with the expected overall size of the stringify operation and a list of circular references.

function stringifyInfo(value: any, replacer?: Replacer, space?: Space): StringifyInfoResult;
function stringifyInfo(value: any, options?: StringifyInfoOptions): StringifyInfoResult;

type StringifyInfoOptions = {
    replacer?: Replacer;
    space?: Space;
    mode?: 'json' | 'jsonl';
    continueOnCircular?: boolean;
};
type StringifyInfoResult = {
    bytes: number;      // size of JSON in bytes
    spaceBytes: number; // size of white spaces in bytes (when space option used)
    circular: object[]; // list of circular references
};

Example:

import { stringifyInfo } from '@discoveryjs/json-ext';

console.log(stringifyInfo({ test: true }, null, 4));
// {
//   bytes: 20,     // Buffer.byteLength('{\n    "test": true\n}')
//   spaceBytes: 7,
//   circular: []
// }

Options

continueOnCircular

Type: Boolean
Default: false

Determines whether to continue collecting info for a value when a circular reference is found. Setting this option to true allows finding all circular references.
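One possible use of the result is a pre-flight check before serializing. The sketch below assumes a hypothetical `describeStringifyInfo` helper and a caller-chosen size limit; it operates on any object with the StringifyInfoResult shape and is shown with a hand-written result.

```javascript
// Hypothetical guard over a StringifyInfoResult-shaped object:
// reports circular references and an exceeded size limit.
function describeStringifyInfo(info, maxBytes) {
    const problems = [];

    if (info.circular.length > 0) {
        problems.push(`${info.circular.length} circular reference(s)`);
    }
    if (info.bytes > maxBytes) {
        problems.push(`size ${info.bytes} exceeds limit ${maxBytes}`);
    }

    return {
        ok: problems.length === 0,
        // Size without the whitespace added by the `space` option
        compactBytes: info.bytes - info.spaceBytes,
        problems
    };
}

// Hand-written result matching the stringifyInfo({ test: true }, null, 4) example
const report = describeStringifyInfo({ bytes: 20, spaceBytes: 7, circular: [] }, 1024);
```

With continueOnCircular set to true, `info.circular` would list every circular reference found, so the first problem message would reflect the full count.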

parseFromWebStream()

A helper function to consume JSON from a Web Stream. You can use parseChunked(stream) instead, but @@asyncIterator on ReadableStream has limited support in browsers (see ReadableStream compatibility table).

import { parseFromWebStream } from '@discoveryjs/json-ext';

const data = await parseFromWebStream(readableStream);
// equivalent to (when ReadableStream[@@asyncIterator] is supported):
// await parseChunked(readableStream);
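For environments where a ReadableStream cannot be iterated with for await, the stream can be adapted by hand with a reader loop. This is a sketch of that general technique using a hypothetical `readStreamChunks` helper; it is not necessarily how parseFromWebStream() is implemented.

```javascript
// Hypothetical adapter: exposes a ReadableStream as an async iterable of chunks
// by reading from it with an explicit reader.
async function* readStreamChunks(stream) {
    const reader = stream.getReader();

    try {
        while (true) {
            const { done, value } = await reader.read();
            if (done) {
                return;
            }
            yield value;
        }
    } finally {
        // Release the lock even if the consumer stops early or an error occurs
        reader.releaseLock();
    }
}
```

Since parseChunked() accepts async iterables, the adapter's output could be passed as `parseChunked(readStreamChunks(readableStream))`.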

createStringifyWebStream()

A helper function to convert stringifyChunked() into a ReadableStream (Web Stream). You can use ReadableStream.from() instead, but this method has limited support in browsers (see ReadableStream.from() compatibility table).

import { createStringifyWebStream } from '@discoveryjs/json-ext';

createStringifyWebStream({ test: true });
// equivalent to (when ReadableStream.from() is supported):
// ReadableStream.from(stringifyChunked({ test: true }))
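Where ReadableStream.from() is unavailable, a synchronous generator can be wrapped in a ReadableStream by pulling one chunk per request. The `streamFromIterable` helper below is a hypothetical sketch of that technique and is not claimed to match the library's implementation.

```javascript
// Hypothetical wrapper: turns a synchronous iterable of chunks into a ReadableStream.
function streamFromIterable(iterable) {
    const iterator = iterable[Symbol.iterator]();

    return new ReadableStream({
        // Called whenever the stream wants more data: emit one chunk per call
        pull(controller) {
            const { done, value } = iterator.next();
            if (done) {
                controller.close();
            } else {
                controller.enqueue(value);
            }
        },
        // Let the generator run its cleanup if the consumer cancels the stream
        cancel() {
            if (typeof iterator.return === 'function') {
                iterator.return();
            }
        }
    });
}
```

With the library, `streamFromIterable(stringifyChunked({ test: true }))` would play a role similar to the ReadableStream.from() form shown above.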

License

MIT