Lazy Javascript Iterables via Generators

October 6, 2020

Editor's regrets

There is a more common method in JavaScript of using generators to make powerful custom iterators than I use here. Not sure why, but I was attracted to another one which I felt was more explicit somehow, although a bit more verbose and still somewhat esoteric. The usual way people do it is:

const myIterator = (function*(){ yield 5; yield 2; yield* [1,2,3]})();

You'll see I do it without calling the generator explicitly. I have since reverted my preference to the most common method to be less clever. Either way, they are interchangeable so consider this method when reading what I do below.


A while back, I was concerned about the wastefulness of JavaScript's Array.prototype.map and Array.prototype.filter functions, really all of JavaScript's canonical functional sequence programming builtins. If you're unaware, these methods always allocate an entire new array. While this is fine in some cases, it enables several dumb performance pitfalls, and doesn't allow you to use these functions in performance-sensitive hotpaths of your application. Not to mention that these methods only exist on JavaScript's Array type, so you need to convert all iterators (e.g. Map, Set) into an array with Array.from, wasting more allocations. Here we'll build an efficient, elegant alternative using lazy evaluated generators, and at the end I provide a TypeScript implementation with heaps of fancy functional list operations, all efficient as heck. Plus, I eventually got around to publishing it on npm as lazy-from so you can use it and its 0 dependencies in your project with just npm install lazy-from.

Returning to the horror of JavaScript's original design, take for instance the following example:

([1,2,3]
	.map(x => x*3)
	.filter(x => x%2 == 0)
	.map(x => `${x}`)
	.concat([10, 11])
)

Simple enough, right? The result is ['6', 10, 11]. But in fact this code will allocate 4 distinct intermediate arrays, and in the worst case all of them will be the same size as the source array (if say, the filter didn't filter anything). If we were performing this on an array of 10 million records, we'd suddenly have to allocate 40 million records, 30 million of which we discard immediately. Perhaps an advanced optimizing JavaScript runtime, like Chromium/V8's TurboFan, will JIT-out the problem, but the standard library design is just poor in my opinion. There are lots of cases where your code will not have been churned through some optimizer and this bloat isn't hard to design out of the equation in the first place. Instead, let's implement our own similar functional list processing API with better performance thanks to lazy evaluation, and we'll do so with modern JavaScript generators.

A generator is a functional coroutine, effectively a function that can yield its ownership over the program execution flow, and confusingly also yield individual elements as its iterated over as an iterable. To make one in JavaScript, you use a function statement/expression with the * marker.

function* myGenerator() {
    yield 5
    yield 2
}

When myGenerator is called, it returns an iterator which on its next method, runs the underlying coroutine until it yields, which would be at the yield keyword. This way, callers can run their own code in between yielded elements, and even stop asking for more elements. With the Array.prototype.map function design, you need to allocate the entire array before you can iterate through it. This ability to not iterate until we need to, and even stop iteration is the concept of lazy evaluation that will free us from unnecessary allocations.

Let's combine our fancy generator with JavaScript's iterator interface to get an iterable literal:

my_iter = {
    *[Symbol.iterator]() {
        yield 1
        yield* [2,3]
    }
}

So we're declaring an object with the hidden Symbol.iterator property to show how to get an iterator of the object, and using the object generator property shorthand syntax to make it a generator. For completeness I also added an example of the yield* syntax which lets our generator yield from other iteratables. Now we can do something interesting with our generators.

function* map(iterable, mapFunction) {
    for (const item of iterable) yield mapFunction(item)
}

And that's basically it to map. We'll make the API more elegant later. We can now do the following:

[...map(map([1,2,3], x => x*3), x => `${x}`)]

You may be missing the trailing function syntax which is the main advantage to having map be a method of Arrays, but as I said we'll be making it elegant later. If you run this code in your local JavaScript runtime, be it browser or local, (I actually originally was inspired to write this code while using quickjs which doesn't optimize this stuff afaik) you'll notice that the spread syntax forces the iteration of the lazy iterable into an array, for us to view. When this code is running the spread syntax calls next on the outer map iterable. To get the first element, it calls next on the inner map. The inner map runs the loop, sets item to 1, then runs x => x*3 over it, yielding 3. The outer map sets its item to 3 after receiving it as the first element from inner map it is wrapping, and runs x => `${x}` yielding '3'. This repeats for all elements until the Array has pushed all 3 elements and is now ['3', '6', '9']. The point is, the instructions for the map calls are glued together when implemented via coroutine, as if you wrote only one map call practically. Now let's make filter.

function* filter(iterable, predicate) {
    for (const item of iterable)
        if (predicate(item))
            yield item
}

Again, easy. Now let's try to make these two functions as elegant as [1,2,3].map(x => x*4).filter(x => x < 10). The best way to do this, is be as close to the original API as possible. Let's take advantage of Array.from, and make our own Lazy.from method for all our lazy-evaluation needs. This way, we can have Lazy.prototype.map and Lazy.prototype.filter, and most code will just be a Lazy.from(...) away from making no copies.

class Lazy {
    constructor(iterable) {
		this.iterable = iterable
	}
    static from(iterable) {
        return new Lazy(iterable)
    }
    [Symbol.iterator]() {
        return this.iterable[Symbol.iterator]()
    }
}

Alright, so we can wrap iterables, but how can we map over them? We'll bring back our anonymous iterable to do so:

class Lazy {
	//...
    map(mapFunc) {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                for (const t of _this)
                    yield mapFunc(t)
            }
        })
    }
}

I like to let the code speak for itself normally but this can be a bit to parse. We return a new Lazy object, wrapping a new anonymous iterable mapping over this iterable. Since our Lazy object is an iterable (it implements Symbol.iterator) we just need to iterate over it. Unfortunately, in the anonymous object's method this would refer to the new anonymous object, not the original Lazy instance, so we create an alias to that this reference, _this, and reference it from our closure. This pattern is incredibly powerful, and we'll do filter pretty much the exact same way.

class Lazy {
	//...
    filter(predicate) {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                for (const t of _this)
					if(predicate(t))
						yield t
            }
        })
    }
}

Now we can expand our horizons and implement other array methods like concat, and forEach.

const isIterable = arg => typeof arg === "object" && Symbol.iterator in arg

class Lazy {
    concat(...args) {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                yield* _this
                for (const arg of args)
                    if (isIterable(arg)) yield* arg
                    else yield arg
            }
        })
    }
    forEach(func) {
		for (const t of this) func(t)
    }
}

And to get crazy, we can do some recursion with this technique and implement Array.prototype.flat

flat(depth=1) {
	const _this = this
	if (depth <= 0) return this
	else return Lazy.from({
		*[Symbol.iterator]() {
			for (const item of _this) {
				if (isIterable(item))
					yield* Lazy.from(item).flat(depth - 1)
				else yield item
			}
		}
	})
}

The morale of the story is once you get oriented, generators can make efficient, readable and effortlessly composable code, effectively reducing the sins of JavaScript. I think the 'anonymous iterable' idiom is a real gem in TypeScript, with generators also shining. Hopefully you decide to use something like this over lowering yourself to mutable Array.prototype.push in your performance-sensitive hotspots. Although I'm yet to [micro]benchmark the two.

As promised, below is a decently extensive Lazy implementation in TypeScript. As stated before, I published it to npm, although it could use a great deal of work. You can npm install lazy-from it or copy and paste into your TypeScript project.

// Typescript@4.0 probably simplifies or allows better alternative typings for
// some of these

/** return whether arg is T or an iterable of T */
function isIterable<T>(arg: T | Iterable<T>): arg is Iterable<T> {
    return typeof arg === "object" && Symbol.iterator in arg
}

/** iterable wrapper for functional programming with lazy composition */
export default class Lazy<T> implements Iterable<T> {
    static from<T>(iterable: Iterable<T>) {
        return new Lazy<T>(iterable)
    }

    public constructor(protected iterable: Iterable<T>) {}

    [Symbol.iterator](): Iterator<T> {
        return this.iterable[Symbol.iterator]()
    }

    public filter(predicate: (t: T) => boolean) {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                for (const t of _this)
                    if (predicate(t))
                        yield t
            }
        })
    }

    public map<U>(transform: (t: T) => U) {
        const _this = this
        return Lazy.from<U>({
            *[Symbol.iterator]() {
                for (const t of _this)
                    yield transform(t)
            }
        })
    }

    public flat(depth=1) {
        const _this = this
        if (depth <= 0) return this
        else return Lazy.from({
            *[Symbol.iterator]() {
                for (const item of _this) {
                    if (isIterable(item))
                        yield* Lazy.from(item).flat(depth - 1)
                    else yield item
                }
            }
        })
    }

    public concat(...args: (Iterable<T> | T)[] ): Lazy<T> {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                yield* _this
                for (const arg of args)
                    if (isIterable(arg)) yield* arg
                    else yield arg
            }
        })
    }

    public forEach(doSomething: (t: T) => void) {
        for (const item of this)
            doSomething(item)
    }

    public take(n: number): Lazy<T> {
        const _this = this
        return Lazy.from({
            *[Symbol.iterator]() {
                let i = 0
                for (const item of _this) {
                    if (!(i < n)) break
                    yield item
                    i++
                }
            }
        })
    }

    public reduce<Result>(callback: (prev: Result, curr: T, index: number) => Result, initial?: Result): Result {
        let result = initial
        let i = 0
        for (const curr of this) {
            result = callback(result, curr, i)
            i++
        }
        return result
    }

    public toSet(): Set<T> {
        const result = new Set<T>()
        for (const item of this) 
            result.add(item)
        return result
    }

    public some(predicate: (t: T) => boolean): boolean {
        for (const item of this)
            if (predicate(item)) return true
        return false
    }

    public every(predicate: (t: T) => boolean): boolean {
        return !this.some(t => !predicate(t))
    }
    
    public empty(): boolean {
        const item = this[Symbol.iterator]().next()
        return item.done
    }

    public sort(...[sortFunc]: Parameters<Array<T>["sort"]>) {
        return Lazy.from([...this].sort(sortFunc))
    }

    public get length() {
        let i = 0
        for (const item of this) i++
        return i
    }

    public includes(t: T) {
        for (const item of this) if (item === t) return true
        return false
    }

    public find(predicate: (t: T) => boolean) {
        for (const item of this) if (predicate(item)) return item
        return false
    }
}