Data Modeling with Kinvey

Kinvey provides a built-in data store, which is where your application can store all of its data. To make the most of the built-in data store, it helps to model your data well. In this guide, we'll discuss some of the core concepts and best practices for data modeling with the Kinvey data store.

Core Concepts

App-first Approach

When developing a new app, the tendency is to define the data model first, and then create the application objects and views to map to that data model. This approach tends to lead to data models that are overly complex and not optimized for the mobile use case. A better approach is to start by designing your views, to determine what users need to see, and then define your application data model objects. These objects should then be used to inform your data model.

Since an object can be thought of as a collection of properties (keys) that represent specific values, it makes sense to store this data on the backend as key-value pairs.

When storing key-value pairs the natural choice of database technology is NoSQL, as these databases are very efficient at storing key-value pairs. Kinvey uses a NoSQL data store for our backend, which gives us incredible performance, but it does come with some tradeoffs. The biggest tradeoff is that, since there are no JOINs in NoSQL, there are no inherent relationships between your Kinvey Collections.

You can still efficiently model relationships within your data, with just a shift to NoSQL patterns for data modeling, such as embedding and referring.

Relational vs. NoSQL Databases

Most people have heard about both Relational (or SQL) databases and NoSQL databases, but there’s still some confusion over the definitions of these databases.

  • A relational database is a database organized into tables of data, which are defined by a schema compatible with the Relational Model. Developers use SQL to interact with the database.
  • A NoSQL database is a database where data is organized using a model other than the relational model. This includes column-based databases, key-value stores, graph databases, and document stores. Kinvey’s data store is based on MongoDB, which is a NoSQL document store. Data on Kinvey is organized into collections, which contain documents (your objects).

Normalization vs. Denormalization

To effectively model your data in Kinvey it’s useful to understand the difference between normalized data and denormalized data.

Normalized Data

Normalized data has a single copy of each entity which has a unique ID. To use an entity in multiple places you reference the ID of the entity instead of making a copy. If your data is normalized then you need to use references to link two entities.

Benefits of normalized data:

  • Data has a single canonical representation in the system.
  • Updating a single copy updates all references to that copy.
  • Logically separate entities are kept separate, keeping the data model cleaner.

Denormalized Data

Denormalized data has multiple copies of the same entity or record. To use an entity in multiple places you make a copy of the entity (or the properties of the entity that are required). When you embed one entity within another you are denormalizing your data.

Benefits of denormalized data:

  • In a NoSQL system, denormalized data avoids a penalty for resolving a reference.
  • In a relational system, you can optimize different copies of the same entity for different purposes. For example, single columns from several tables can be placed in the same table to allow a fast index-based query that avoids JOINs.
  • In a NoSQL system, reads and writes are fast enough to avoid performance penalties.
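The difference between the two forms can be sketched with plain Python dictionaries. This is only an illustration, not Kinvey client code: the `reporters` collection, the `reporterId` property, and the second story are hypothetical names invented for the example.

```python
import copy

# Normalized: the reporter is stored once; stories refer to it by _id.
# ("reporters" and "reporterId" are hypothetical names for this sketch.)
reporters = {
    "henning": {"_id": "henning", "name": "Arthur Sears Henning"},
}
stories_normalized = [
    {"_id": "deweyDefeatsTruman", "name": "Dewey Defeats Truman", "reporterId": "henning"},
    {"_id": "followUp", "name": "A Follow-up Story", "reporterId": "henning"},
]


def denormalize(stories, reporters_by_id):
    """Return copies of the stories with the reporter embedded instead of referenced."""
    result = []
    for story in stories:
        embedded = {k: v for k, v in story.items() if k != "reporterId"}
        embedded["reporter"] = copy.deepcopy(reporters_by_id[story["reporterId"]])
        result.append(embedded)
    return result


stories_denormalized = denormalize(stories_normalized, reporters)

# Updating the normalized reporter touches a single record...
reporters["henning"]["name"] = "A. S. Henning"

# ...while the denormalized copies keep the old value until each one is rewritten.
stale_names = [s["reporter"]["name"] for s in stories_denormalized]
```

The trade-off is visible in the last two statements: the normalized form has one place to update, while the denormalized form can be read without a second lookup but must be rewritten copy by copy.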

What Does My App Do?

The first step to modeling your data is looking at what your application does. You should ask yourself questions like:

  • Does my app collect large amounts of data from various sensors?
  • Does my app have social features linking users together?
  • Does my app need to have a really complex series of drill-down views?
  • Does my app have different users trying to update groups of items at the same time?
  • Does my app read data more (fetch from the server) or write data more (save to the server)?

These questions should drive the models that you use for your data. Thinking about how your users interact with data makes it really easy to choose an optimal way to organize your data. As you work more on your application the answers to these questions may change. Since Kinvey’s data store is schemaless, your data model can change as the answers to these questions change.

Modeling Relationships

Embedding

The Kinvey data store allows you to store very complex data as values. Since our client libraries are built on our REST service, which stores JSON objects, we can represent one-to-many relationships by having an array of key-value objects as a value in our JSON data (in the client libraries this would be an NSArray of NSDictionary objects, an Array of HashMaps, or an array of JavaScript objects). For example:

{"_id": "theArtist",
 "title": "The Artist",
 "year": 2011,
 "director": "Michel Hazanavicius",
 "awards": [{"event": "Academy Awards",
             "awards": [{"name": "Best Picture", "notes": ""},
                        {"name": "Best Director", "notes": "Hazanavicius"},
                        {"name": "Best Actor", "notes": "Dujardin"},
                        {"name": "Best Costume Design", "notes": ""},
                        {"name": "Best Original Score", "notes": ""}]},
            {"event": "Golden Globes",
             "awards": [{"name": "Best Motion Picture - Musical or Comedy", "notes": ""},
                        {"name": "Best Actor - Motion Picture Musical or Comedy", "notes": "Dujardin"},
                        {"name": "Best Musical Score", "notes": "Bource"}]}]}

In the above example, a movie can have many awards (1:n relationship). This is modeled by an awards array, in which each element is an object that lists the event and the awards received at that event.
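Because the related data is embedded, a client can work with it after a single fetch. The following sketch walks a trimmed copy of the entity above using plain Python; the helper names `awards_by_event` and `total_awards` are invented for illustration and are not part of any Kinvey library.

```python
# Trimmed copy of the embedded entity shown above.
movie = {
    "_id": "theArtist",
    "title": "The Artist",
    "awards": [
        {"event": "Academy Awards",
         "awards": [{"name": "Best Picture", "notes": ""},
                    {"name": "Best Actor", "notes": "Dujardin"}]},
        {"event": "Golden Globes",
         "awards": [{"name": "Best Musical Score", "notes": "Bource"}]},
    ],
}


def awards_by_event(entity):
    """Map each event name to the list of award names received at that event."""
    return {e["event"]: [a["name"] for a in e["awards"]] for e in entity["awards"]}


def total_awards(entity):
    """Count the awards across all events without any further fetches."""
    return sum(len(e["awards"]) for e in entity["awards"])


by_event = awards_by_event(movie)
count = total_awards(movie)
```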

References

The other basic way to model relationships in Kinvey is to use references, which rely on the unique ID field required by all entities that are saved to the appdata service. This field is represented in the database by the property “_id”. If you save an entity to Kinvey and don’t define the value for “_id”, we generate it for you. Using the “_id” field and our queries, you can easily create relationships.

If you have entities, with one being the parent and one or more being the children, then you would fetch the parent entity from the server without resolving any child entities. When your app is ready to display or use the child entities, you would then perform a second query to fetch the children that you need. For example, if you’re building a directory app or org-chart app, you could do the following:

  • Fetch the root of the organization
  • Fetch the first level children of the root
  • When the user taps on a child, fetch all sub-children
  • Repeat

This technique really helps if there is a large hierarchy, as your app may run out of memory trying to resolve the entire structure in memory at a given time.
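The fetch-on-demand steps above can be sketched as follows. This is a minimal in-memory illustration: the `EMPLOYEES` list stands in for a collection on the backend, and the `parentId` property, `fetch_children`, `fetch_root`, and `OrgBrowser` are hypothetical names, not Kinvey APIs.

```python
# Hypothetical stand-in for a collection; each child refers to its parent by "parentId".
EMPLOYEES = [
    {"_id": "ceo", "name": "Root", "parentId": None},
    {"_id": "vp1", "name": "VP One", "parentId": "ceo"},
    {"_id": "vp2", "name": "VP Two", "parentId": "ceo"},
    {"_id": "eng1", "name": "Engineer One", "parentId": "vp1"},
]


def fetch_children(parent_id):
    """Stand-in for a backend query such as {"parentId": parent_id}."""
    return [e for e in EMPLOYEES if e["parentId"] == parent_id]


def fetch_root():
    """The root is the only entity without a parent."""
    return fetch_children(None)[0]


class OrgBrowser:
    """Holds only the levels the user has opened, never the whole hierarchy."""

    def __init__(self):
        self.loaded = {}  # parent _id -> list of child entities

    def expand(self, entity_id):
        # Fetch one level on first use, then serve it from memory.
        if entity_id not in self.loaded:
            self.loaded[entity_id] = fetch_children(entity_id)
        return self.loaded[entity_id]
```

A view would call `fetch_root()` once, `expand(root["_id"])` to show the first level, and `expand(child["_id"])` each time the user taps a child, so memory use grows with what was visited rather than with the size of the organization.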

If you’re using the native libraries and want the objects in your app to appear to have the references resolved, you can use non-persistent properties to hold a pointer/reference to arrays of fetched entities. This also allows your app to treat data fetched with multiple queries as a single object graph without having to strip the references at save time.
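The idea of a non-persistent property can be sketched in Python as a model class whose serialization simply leaves that property out. The `Story` class and its `to_entity` method are hypothetical and only illustrate the pattern; the native libraries have their own mechanisms for marking properties as non-persistent.

```python
class Story:
    """Client-side model; `comments` lives in memory only and is never saved."""

    def __init__(self, _id, name):
        self._id = _id
        self.name = name
        self.comments = []  # non-persistent: filled in by a second query

    def to_entity(self):
        # Only persistent properties are serialized, so there is
        # nothing to strip from the object graph at save time.
        return {"_id": self._id, "name": self.name}


story = Story("deweyDefeatsTruman", "Dewey Defeats Truman")
story.comments.append({"_id": "001", "storyId": story._id, "text": "This is one for the books"})
entity_to_save = story.to_entity()
```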

Modeling One-to-many Relationships

If you want to map a one-to-many relationship, let’s say between the “stories” collection and the “comments” collection (each story can have multiple comments), then just add a property called “storyId” to each comment entity, holding the “_id” of the story it belongs to. If we want to pull all comments for the story named “Dewey Defeats Truman”, we can first query for the story by name and then query for the comments that refer to its “_id”:

GET /appdata/[appKey]/stories/?query={"name":"Dewey Defeats Truman"}

[{"_id": "deweyDefeatsTruman",
  "name": "Dewey Defeats Truman",
  "reporter": "Arthur Sears Henning"}]

GET /appdata/[appKey]/comments/?query={"storyId":"deweyDefeatsTruman"}

[{"_id": "001",
 "storyId": "deweyDefeatsTruman",
 "username": "hstruman",
 "text": "This is one for the books"},
{"_id": "002",
 "storyId": "deweyDefeatsTruman",
 "username": "ChiTownTrib",
 "text": "Oops! We goofed!"},
{"_id": "003",
 "storyId": "deweyDefeatsTruman",
 "username": "NYDewey",
 "text": "Our mistake!"}]
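When issuing these requests from code, the JSON query has to be percent-encoded before it is placed in the URL. The sketch below builds the two request paths with the Python standard library only. The `query_path` helper and the app key `kid_example` are hypothetical, and the host name and authentication are deliberately left out.

```python
import json
from urllib.parse import quote


def query_path(app_key, collection, query):
    """Build the /appdata path with the JSON query percent-encoded."""
    encoded = quote(json.dumps(query, separators=(",", ":")))
    return f"/appdata/{app_key}/{collection}/?query={encoded}"


# Step 1: find the story by name.
story_path = query_path("kid_example", "stories", {"name": "Dewey Defeats Truman"})

# Step 2: use the "_id" from the story response to find its comments.
story_id = "deweyDefeatsTruman"  # taken from the first response
comments_path = query_path("kid_example", "comments", {"storyId": story_id})
```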

Modeling Many-to-many Relationships

To represent a many-to-many relationship in Kinvey we have several choices. For example, we could create a third collection to manage the relationship; however, this probably isn’t needed for most applications. Instead, it’s probably easier to have several one-to-many models in a single collection. In other words, if your application maintains a collection of patients, then to represent the many-to-many relationship of patients to doctors, have each patient store an array of “_id” values for the doctors they’ve seen. Then, to get all patients seen by a specific doctor, use:

// Find all doctors named Joe at Mercy Hospital
GET /appdata/[appKey]/doctors/?query={"firstName": "Joe", "hospital": "Mercy"}

[{"_id": "998798ad987fe987e987bc98766",
  "firstName": "Joe",
  "lastName": "Brown",
  "hospital": "Mercy",
  "speciality": "oncology"},

 {"_id": "ca6775feba86554",
  "firstName": "Joe",
  "lastName": "Baker",
  "hospital": "Mercy",
  "speciality": "internal medicine"}]

// Find all patients seen by Dr. Joe Brown
GET /appdata/[appKey]/patients/?query={"doctors":"998798ad987fe987e987bc98766"}

[{"_id": "001",
 "firstName": "Waldo",
 "lastName": "Bond",
 "doctors": ["998798ad987fe987e987bc98766", "98698ad989d987f9987e"],
 "history": "..."},
{"_id": "002",
 "firstName": "Keith",
 "lastName": "Emerson",
 "doctors": ["998798ad987fe987e987bc98766", "82888277266ffaacafe", "ca6775feba86554"],
 "history": "..."}]


// Find Waldo's other doctor
GET /appdata/[appKey]/doctors/98698ad989d987f9987e

{"_id": "98698ad989d987f9987e",
  "firstName": "Mary",
  "lastName": "Ng",
  "hospital": "Tufts Medical Center",
  "speciality": "internal medicine"}
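The patients query above works because a query for a single value against an array property matches any entity whose array contains that value. The following in-memory sketch mimics that matching rule in plain Python so the behavior can be seen without a backend; `matches` and `find` are hypothetical helpers, not Kinvey functions, and the patient records are trimmed copies of the ones above.

```python
PATIENTS = [
    {"_id": "001", "firstName": "Waldo",
     "doctors": ["998798ad987fe987e987bc98766", "98698ad989d987f9987e"]},
    {"_id": "002", "firstName": "Keith",
     "doctors": ["998798ad987fe987e987bc98766", "82888277266ffaacafe", "ca6775feba86554"]},
]


def matches(entity, query):
    """Equality match in which a scalar query value also matches array membership."""
    for key, wanted in query.items():
        actual = entity.get(key)
        if isinstance(actual, list):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


def find(collection, query):
    """Return every entity in the collection that satisfies the query."""
    return [entity for entity in collection if matches(entity, query)]


# Patients seen by Dr. Joe Brown, and patients seen by Dr. Joe Baker.
seen_by_brown = find(PATIENTS, {"doctors": "998798ad987fe987e987bc98766"})
seen_by_baker = find(PATIENTS, {"doctors": "ca6775feba86554"})
```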

To Embed or to Refer

In general, embedded documents offer greater read performance and query performance for related documents at the cost of some write performance.

A general design guideline is that, in a NoSQL system, items that are displayed together should be embedded into the same entity. Master-detail views and drill down displays with limited nesting are also good candidates for embedding.

If you have a very complex relationship and arbitrary navigation in your app, references will provide better performance.

If you’re building an app that requires atomic access or updates to related entities, then you should embed one of the entities in the other. There is no guarantee that references will be resolved atomically.
