About this guide

This guide covers:

  • Using Map/Reduce with Monger
  • Storing and loading JavaScript functions from classpath

This work is licensed under a Creative Commons Attribution 3.0 Unported License (including images & stylesheets). The source is available on GitHub.

What version of Monger does this guide cover?

This guide covers Monger 3.1 (including preview releases).

What version of MongoDB does this guide cover?

This guide covers MongoDB 2.0. Some features may be specific to MongoDB 2.2, but this guide tries to avoid them. If you are looking for documentation on the most recent Map/Reduce features, please refer to the MongoDB guide on Map/Reduce.

Overview

Map/Reduce is a programming model for processing large data sets popularized by Google (see also Map/Reduce revisited).

Map/Reduce in MongoDB is useful for batch processing of data and for aggregation operations. It is similar in spirit to using something like Hadoop, with all input coming from a collection and all output going to a collection. In situations where you would have used GROUP BY in SQL, Map/Reduce is often the right tool in MongoDB.

In MongoDB, a Map/Reduce query consists of an input collection, a mapper function, a reducer function, the name of the output collection (where results will be inserted) and an output type that controls how the results of the Map/Reduce calculation are combined with the documents already in the output collection. The mapper and reducer functions are written in JavaScript and can be passed inline (as strings) or loaded from the classpath using a helper function.
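To make the mapper and reducer contract more concrete, here is a minimal sketch of what the two JavaScript functions might look like. The `type` field and the small `runMapReduce` harness are assumptions made for illustration only: the harness merely imitates, in memory, what MongoDB does on the server (binding `this` to each document and providing `emit`), so the functions can be tried without a database.

```javascript
// Mapper: MongoDB calls it once per input document, with `this` bound to the
// document. `emit(key, value)` is provided by the server at run time.
// The `type` field is a hypothetical field of the input documents.
var mapper = function () {
  emit(this.type, 1);
};

// Reducer: receives a key and an array of the values emitted for that key.
// It should return a value of the same shape as the emitted values, because
// MongoDB may call it more than once for the same key.
var reducer = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Hypothetical in-memory stand-in for the server, for experimentation only.
// Unlike MongoDB, it calls the reducer even for keys with a single value.
function runMapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  // Expose a global `emit`, as MongoDB's JavaScript environment does.
  globalThis.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  delete globalThis.emit;
  return Object.keys(groups).map(function (key) {
    return { _id: key, value: reduceFn(key, groups[key]) };
  });
}

var events = [{ type: "click" }, { type: "view" }, { type: "click" }];
var results = runMapReduce(events, mapper, reducer);
console.log(results);
```

Only the `mapper` and `reducer` function expressions would be handed to MongoDB; the harness and sample data stay on the development machine.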

Map/Reduce vs the Aggregation Framework

MongoDB 2.2 also supports the Aggregation Framework, a data processing feature that is more focused, less generic and easier to use. Compared with it, raw Map/Reduce is a relatively low-level facility.
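For comparison, a per-key count of the kind a simple mapper and reducer pair would compute can be written as a one-stage aggregation pipeline. This is only a sketch in MongoDB shell JavaScript; the `events` collection and its `type` field are assumptions for illustration.

```javascript
// A pipeline is an array of stage documents. This single $group stage groups
// input documents by their (hypothetical) `type` field and counts each group.
var pipeline = [
  { $group: { _id: "$type", count: { $sum: 1 } } }
];

// In the mongo shell it would be run against a collection, for example:
//   db.events.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```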

Performing MongoDB Map/Reduce queries with Clojure

monger.collection/map-reduce is the function used to run Map/Reduce queries with Monger. It takes a database, a collection name, two JavaScript functions as strings (typically loaded from the JVM classpath), the name of a destination collection, one of the output type values (a com.mongodb.MapReduceCommand$OutputType instance) and a query map (empty in the examples below):

(ns monger.docs.examples
  (:require [clojurewerkz.support.js :as js]
            [monger.core :as mg]
            [monger.collection :as mc]
            [monger.result :refer [acknowledged?]]
            [monger.conversion :refer [from-db-object]])
  (:import [com.mongodb DBObject MapReduceCommand$OutputType MapReduceOutput]))

;; performs a map/reduce query using functions stored in mapper.js and reducer.js
;; on the classpath. The results will be merged into the "map_reduce_results" collection.
(let [conn    (mg/connect)
      db      (mg/get-db conn "monger-test")
      output  (mc/map-reduce db "events" (js/load-resource "mr/mapper.js")
                                         (js/load-resource "mr/reducer.js")
                                         "map_reduce_results"
                                         MapReduceCommand$OutputType/MERGE {})
      result  (from-db-object ^DBObject (.results ^MapReduceOutput output) true)]
  (println (acknowledged? output))
  (println result))

It is also possible to return results to the client (as "inline output"):

(ns monger.docs.examples
  (:require [clojurewerkz.support.js :as js]
            [monger.core :as mg]
            [monger.collection :as mc]
            [monger.result :refer [acknowledged?]]
            [monger.conversion :refer [from-db-object]])
  (:import [com.mongodb DBObject MapReduceCommand$OutputType MapReduceOutput]))

;; performs a map/reduce query using functions stored in mapper.js and reducer.js
;; on the classpath. The result will be returned "inline" (as a collection of documents back to the client).
(let [conn    (mg/connect)
      db      (mg/get-db conn "monger-test")
      output  (mc/map-reduce db "events" (js/load-resource "mr/mapper.js")
                                         (js/load-resource "mr/reducer.js")
                                         nil
                                         MapReduceCommand$OutputType/INLINE {})
      result  (from-db-object ^DBObject (.results ^MapReduceOutput output) true)]
  (println (acknowledged? output))
  (println result))

Learn more about different MongoDB map/reduce output types.
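As a rough illustration of how the output types differ, the sketch below imitates three of them in plain JavaScript, using in-memory objects keyed by `_id`. The `applyOutput` helper and the sample data are hypothetical. The behaviour it models follows the MongoDB documentation: REPLACE discards the previous contents of the output collection, MERGE overwrites documents with matching keys, and REDUCE re-runs the reducer on the old and new values for matching keys. INLINE is left out because it does not write to a collection at all.

```javascript
// Hypothetical helper: `existing` and `fresh` map _id -> value.
function applyOutput(type, existing, fresh, reduceFn) {
  var out = {};
  var key;
  if (type !== "REPLACE") {
    // MERGE and REDUCE keep documents already in the output collection.
    for (key in existing) { out[key] = existing[key]; }
  }
  for (key in fresh) {
    if (type === "REDUCE" && key in existing) {
      // Combine the stored value with the new one using the reducer.
      out[key] = reduceFn(key, [existing[key], fresh[key]]);
    } else {
      // REPLACE and MERGE simply take the new value.
      out[key] = fresh[key];
    }
  }
  return out;
}

var sum = function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

var existing = { click: 10, scroll: 4 }; // contents before the query runs
var fresh = { click: 2, view: 1 };       // results of the new query

var replaced = applyOutput("REPLACE", existing, fresh, sum);
var merged = applyOutput("MERGE", existing, fresh, sum);
var reduced = applyOutput("REDUCE", existing, fresh, sum);
console.log(replaced, merged, reduced);
```

With this sample data, only REDUCE adds the new `click` count to the stored one; REPLACE drops `scroll` entirely, while MERGE keeps it and overwrites `click`.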

Tell Us What You Think!

Please take a moment to tell us what you think about this guide on Twitter or the Monger mailing list.

Let us know what was unclear or what has not been covered. Maybe you do not like the guide's style or grammar, or have discovered spelling mistakes. Reader feedback is key to making the documentation better.