September 15, 2013

ACID compliance with Live, Non-SQL, Eventually Consistent DB's

One of the main complains I hear all around about Non-SQL is the ACID compliance and developers making crazy "fixes" to handle eventual data consistency, as I mentioned on this previous post I feel this is mostly because developers are trying to emulate the SQL ways, also they are used to be told how to handle everything, but in the new Non-SQL world, it all boils down to the developers creativity and mostly, its ability to find a syncretism between different approaches and platforms.

Non-SQL and eventual consistency requires a lot of extra creativity, but also gives lots of extra flexibility and opens the path to new and smarter software.

SQL likes to put a lot of limitations and rules for the developer and, in most cases, to the user, specially when it comes to data formats and relations, in Non-SQL the JSON format it's usually the way to go, and JSON doesn't care if your data is a Boolean, string, number, URL... so long it's text and you pack it in a {'key':'value'} format, he's fine with it, also, it allows for different kinds of data to be treated as equal, this makes easy to create a very powerful "Universal Searchbox", programmatic backups and clean-ups are easy and transparent, and you can join all this backups and use them to make awesome reports, by using each backups data as a point in time, and drawing flow charts... Neat huh? but unlike SQL, you won't find much guidelines to tell you how to face this concepts, so you really need to know how to achieve all this in a very precise and creative way... or things can go pretty nasty.

Let's see how we can face ACID compliance first:

Atomicity: The "all or nothing" rule. This is important, but instead of thinking of the whole point-to-point process as a single atomic transaction, here you must "split" it in different transactions, the most critical part is usually the Live Data, I personally like Firebase for this, so the most critical atomicity is between the client's AngularJS (or AngularFire) and the Firebase servers and this is already solved in AngularJS, also there is a lot of people constantly checking and upgrading this. The servlets  are atomic too, and its easy to program them to push a new line with the details of the error in a special log, but even if there is a mayor problem in the servlet or servlets, the Firebase should stay up and running, so the critial parts of the system keeps working as normally as possible. So I recommend always aiming for "isolated atomicity".

Consistency: Only valid data must be written, but again, one must be more specific about what is "valid data", in SQL if you send a number instead of a string, hell unleashes... or even if your string is 1 character too large (or even too short) it just won't work, in the paper this sounds like a way to prevent problems (for the IT staff) but sending extra complexity to the user for the programmers convenience doesn't seem right anymore. By using JSON your consistency focus only that the data is represented correctly ( Eg. [{"key 1":"value 1"},{"key 2":"value 2"}] ). but it doesn't worry about the content's, this makes everything much more flexible.

Isolation: Ideally one will be using a modular approach when working with Non-SQL, usually involving different platforms, so you should not only isolate the transactions, but actually split everything in parts for the eventual data consistency. Usually this can be done by splitting your system in small Live DB's making geographical location the main focus, this usually prevents for 2 people working on the same thing at the same time.

Durability: One very important thing about DB's is backing up your data, ideally you wish to take automatic periodical snapshots of your DB's, (this is very easy with Firebase and Google Drive using a GAS servlet by the way) and once you have periodic snapshots, wouldn't it be nice to put the all togheter, and see how the data changes through time? This can be done with BigQuery, Datastore and a GAS servlet to import the data from Firebase, usually you will add the "snapshot" as a new bucket in Datastore at the same time you make a historical backup on Drive, but you can adjust this according to your needs and budget, to lower prices, you can backup once a week on Drive and once a month on Datastore (for BigQuery).

Now let's think of a crazy example to give you an idea of how you can put together an imaginary software to handle a few warehouses, since this is Non-SQL, this model can be scaled up or down as needed, and new "modules" can be added or the existing ones can be adjusted or replaced by a different approach without problem.

I really recommend an MVC data architecture, so you can separate the UI from the backend and the controllers, this will allow you to focus on one problem at a time, also will simplify future updates, fixes and additions.

In this imaginary example we are going to be using 3 different HTML5 web apps:
  • Warehouse Administration: This should allow each warehouse to handle all it's contents in real-time so all the employees are aware of were everything is and notice any change instantly, if something comes in, out or change it's place in the warehouse this changes are instantly reflected on the view of all the warehouse personal, this way, no matter how fast things change and how many tasks are happening in parallel, everyone's work will keep in sync.
  • Administrative Central Office: This is the software for the staff who take part of the business decisions, they do not need to know the whereabouts of each screw in the warehouse in real-time, but rather the statistical data that comes out of the warehouse operations, mostly product sales performance, personnel efficiency and all sorts of statistical data analysis, so this part of the web app should focus on transforming many (possibly large) historical files into easy to understand charts and reports.
  • IT support: This department should be able to access all the DB instances, the backups and Webapp code and also be able to make changes on any part of the system fast and as transparent to the user as possible, also they should be able to communicate with all the staff to provide assistance and the system should detect, file and report in real-time any problem in the system.
Now let's elaborate more on the characteristics of the system:
  • All the people on the same warehouse, should have instant, real-time access to that specific warehouse data to keep things agile, even when many transactions are happening at the same time.
  • The warehouse staff must be able to quickly find any specif object or group of objects stored in the warehouse, by providing any arbitrary property (name, bar-code, serial, size, position, etc) and the system must show the matching object(s) instantly and with live-data.
  • The people of one warehouse doesn't need to know all the time what's happening at the other warehouses.
  • Every movement inside the warehouse must be logged, things like what was the change, who did it, why and when must be always be properly logged.
There must be the following chats:
  • For each warehouse, the administrative staff and the IT support staff.
  • Shared across all the warehouses.
  • Between the administrative staff and each warehouse, separately.
  • Between the administrative staff and all the warehouse.
  • Between each warehouse.
  • Between anyone and the IT support staff.
There must be the following logs:
  • When something it's moved inside the warehouse.
  • When something enters the warehouse.
  • When something leaves the warehouse.
  • When there is a system error.
  • When someone requests assistance from the IT staff.
  • All the personnel's everyday check-in and check-out.
  • Every activity from each member of the warehouse staff.
Before going into the implementations details, I made this simplified data flow to give you a better picture, note that not all the links are displayed and the solutions proposed here are only an imaginary example, I strongly encourage you to experiment and adapt the general idea to best fit your needs, budget and style.

Let's browse this from left to right, okay? here we go:

  1. First we have the HTML5 app used by the warehouse staff, this is were the real action happens, here the employees from the warehouses receives they're orders, find things and make reports about every move in the merchandise.
  2. To handle this we are using 3 Firebase DBs, one for each warehouse, this isolates the data and allows for precise tracking on each warehouse activity (Firebase includes an Analytics service). To connect Firebase with HTML5 I suggest AngularJS (AngularFire). Note: A single Firebase DB could handle all you need, but we split it for isolation, to balance the Quota and keep the costs minimum. Also there is a 10mb limit for exporting/importing data over most platforms, so splitting and periodically backing up and cleaning the Firebase DB's (always in that order) it's important. If 10mb it's too small for you, you will need to "zonify" your DB's, split the Firebase DB and back them up separately, Eg. [{"Zone 1":{},"Zone 2":{}}]. Since this is done by GAS servlets and not by the HTML5 app or Firebase itself, you can make changes to this as much as needed, usually without "downtime" and we always have the option to turn the servlet back to a previous version.
  3. We have an additional Firebase DB to handle the chats, you can handle as many chats as you need using a single Firebase DB, it is not on the diagram, but you may wish to add an extra GAS servlet to log and erase (from Firebase) the chats, usually at the end of the day.
  4. Then we have 2 GAS JSON servlets that uses GAS URL fetch to get backups from Firebase and pass the data to Drive and Datastore using Firebase REST API, after backing up it's time to call another servlet to "clean" the DB from articles with "null" values, this will avoid wasting space on items that are no longer in the warehouse, but we keep them until the backup for historical reasons and it's also a best practice when dealing with eventual consistency (even if not needed in this part).
  5. After that we have a Drive account that recieves the backups from one of the previous servlets, I recommend making this programatically at least once a week, and creating a special folder for each year (or month if you have lots of backup files). This files are mostly as a redundant safety feature, and also makes very easy to find the most recent backup in case of some serious problem, simply upload the right backup file from Drive  to the desired DB using the Firebase Web administrator, once the backup finishes uploading, the data will be instantly updated in all the users connected to that DB (but the data made between the backup and the recovery would be lost). I recommend making a backup before a recovery, even when there is a problem, so the data can be analyzed later and something can be  recovered if needed.
  6. Then we have the Cloud Storage JSON API that receives the backups from the other servlet, you can find how to do this in this official How-To. We need the information on Cloud Storage so BigQuery can use it.
  7. Now we have 2 JSONP servlets, the difference between a JSON and a JSONP servlet, is that JSONP has a special padding (also called "Prefix") which is used so the client's code know's which function should handle the response and it's the one we usually need when using GAS to link Google Services and HTML5 web apps. The first servlet for communicating Google Drive and the HML5 App and the second one it's responsible of receiving querys from the client's web app, pass it to BigQuery, wait for the response, pre-process it to best fit your app's needs and deliver it back to the web app when it's done or report (the user and the central error Log) that something went wrong.
  8. Then you will see the IT support HTML5 web app, this one is mostly used for client support,but it should also have direct access to all the servlets, log files, analytics data about the web app's use and the analitycs for the main components (Firebase DBs, specific servlet number of calls, etc),  and automatically show alerts when some servlet reports a problem. The focufs of this App is to monitor the Apps Performance and integrity, facilitate upding the App and IT support chats. A tip, use a GAS ScriptDB central library to count the number of times the servlets are being called, you can also use it as a central cache and for transactions that are same-time overwriting susceptible.
  9. And finally the HTML5 web app for the Administrative staff, here almost everything it's based on historical data, usually parsed into charts for convenience, they care about warehouse, merchandise and personnel performance and it's what one must focus when building this web app. 

Hope you enjoyed reading this article as much as I did writing it and that it inspires you into creating the perfect data-flow for your next project, don't be shy and write your thoughts about this article. I love reading comments, ideas, corrections, suggestions and particularly appreciate critics.

Happy coding!

Here is some of the material I recommend reading for this kind of project: