August 31, 2013

Insights and tips for working with Non-SQL Databases

The problem with Non-SQL DBs is that most developers don't really know how to use them in business-grade solutions, and they tend to keep all the "bad habits" and approaches of SQL... A big problem they face is when they want to relate data sets and feel they need a "direct dependence" like in SQL. I see the relational approach as one of SQL's biggest flaws, not the other way around. Why?

Well, first, it makes development focus more on the data model (because of its complexity) than on the user experience and the UI's needs; this is why systems tend to be complicated to use and to offer very little flexibility to the user. Also, relational data doesn't always handle parallel data manipulation well (because of the relations between tables). Software architecture must follow hardware architecture for best performance; hardware is focusing on parallel processing, so modern software does too, and this requires a parallel data model.

Another problem with the relational approach is that, since so many things depend on others, it's very easy for something to go wrong, and this usually ends with an error message, or at least with an operation that can't be performed.

Developers really need to learn how to use Non-SQL systems. JSON is a great format for data administration; if you know how to use it properly, it can do things that developers and users alike think of as "utopias".

A simplified explanation of these "new" data-model concepts:

Unlike the horrible "normalized" data in SQL, here every object holds all the properties and sub-properties related to it (de-normalization / high data replication); the system only groups objects that share some specific property or properties when performing a query. This gives the system great flexibility and stability: even if there is data corruption, only a fraction of the data gets damaged, and it's usually easy to "clean" it and get back to normal, if that ever happens.
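A minimal sketch of the idea in plain JavaScript (the objects and property names are invented for illustration): every record is self-contained, and "grouping" is nothing more than a filter over a shared property at query time.

```javascript
// De-normalized records: each object is self-contained, nothing
// is split across "foreign" tables.
const warehouse = [
  { name: "screw M4", soldBy: "weight", kg: 12.5, supplier: { name: "Acme", city: "Dallas" } },
  { name: "fridge X200", soldBy: "piece", serial: "FR-0042", supplier: { name: "Acme", city: "Dallas" } },
  { name: "bolt M6", soldBy: "weight", kg: 3.2, supplier: { name: "BoltCo", city: "Austin" } }
];

// "Grouping" happens only at query time: collect the objects that
// share a given property value (dotted paths reach sub-properties).
function byProperty(objects, path, value) {
  return objects.filter(o => path.split(".").reduce((v, k) => v && v[k], o) === value);
}

const fromAcme = byProperty(warehouse, "supplier.name", "Acme");
console.log(fromAcme.length); // 2
```

Notice that nothing breaks if a record lacks the queried property; it simply isn't part of the group.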

When you talk about de-normalizing data, everyone assumes the table will become HUGE, expensive and slow, but JSON is actually very space-efficient: it's just text, commas, braces and brackets! Data gets formatted very efficiently compared to SQL; you will be amazed at how much data you can hold in 100 MB. Also, smart systems allow data to grow without impacting performance significantly. Another thing that keeps JSON small is that it doesn't waste space declaring dependencies and data types. It is also highly compressible (thanks to the high data replication) and therefore works great with data compression (Apps Script has special zip methods for this, and most servers support it or can be set to support it); use this when you need to favor bandwidth over processing time.

Objects don't need to declare the kind or size of the data they hold; JSON doesn't care about or need this information, and again, this provides outstanding flexibility.
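For example (a sketch with invented fields), two generations of the same kind of object can live side by side, with no schema migration, and readers just check for the property they need:

```javascript
// Old and new shapes of the "same" object: the new one gained
// "barcode" and dropped "legacyCode". JSON never asked for a schema.
const oldItem = JSON.parse('{"name":"screw M4","legacyCode":77}');
const newItem = JSON.parse('{"name":"screw M5","barcode":"8412345678905"}');

// Readers simply look for whichever property is present.
function codeOf(item) {
  return item.barcode !== undefined ? item.barcode : item.legacyCode;
}

console.log(codeOf(oldItem)); // 77
console.log(codeOf(newItem)); // 8412345678905
```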

Let's use a warehouse as an example of data stored in the DB. SQL couldn't be worse for this; still, she's the queen of these realms. Shame it's such an old queen, not willing to catch up with the times. Why? Well, this lady forces everyone to follow her rules and expectations, and refuses to accept anything she doesn't know and understand well in advance. Also, she expects uniformity and is a big fan of bureaucracy. Her reign has been long and by now everyone is used to her ways; a different reality seems too utopian to really work, and most of those who have tried to taste the Non-SQL kingdom, after a life of following the Queen's demands, start feeling lost and keep trying to emulate the old ways in the new realms. Of course things go bad.

The real-life warehouse holds objects of all kinds, very different from each other, and they change over time; new things come with new characteristics and maybe no longer use some of the old ones. Even though they can be very different, they are all together, and if peace is to come to the warehouse, everyone must be treated as an equal. So the little screw that is sold by weight (instead of by piece) and the refrigerator with a serial number can be stored next to each other, opening the path to a smart, universal search: you type any property or name, simply get the matching result or results, and then decide what to do with what you found; one also expects to be able to do anything (reasonable) with it. Note that this also makes the universal search the universal window (to some extent).
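A naive sketch of such a universal search (again with invented objects): walk every property of every object, whatever its shape, and match the values against the query text.

```javascript
// Heterogeneous objects: nothing forces them to share a shape.
const stock = [
  { name: "screw M4", soldBy: "weight", kg: 12.5 },
  { name: "fridge X200", serial: "FR-0042", voltage: 110 }
];

// Universal search: recursively test every value of every object
// against the query, regardless of which properties exist.
function matches(value, query) {
  if (value !== null && typeof value === "object") {
    return Object.values(value).some(v => matches(v, query));
  }
  return String(value).toLowerCase().includes(query.toLowerCase());
}

function search(objects, query) {
  return objects.filter(o => matches(o, query));
}

console.log(search(stock, "fr-0042").length); // 1
console.log(search(stock, "weight").length);  // 1
```

A production version would add indexing, but the point stands: the screw and the fridge answer the same query mechanism as equals.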

Every object knows all its details, you know, like in real life... so its information doesn't need to be spread across different tables. Inhabitants of the SQL kingdom are always trying to keep a different table for everything and to group everyone according to some arbitrary attribute or relationship, but in the Non-SQL world this is evil... possible, but evil. And when certain information changes and you need to change other objects' properties in response, fire warnings or simply keep a record, most developers have problems working over Non-SQL DBs: they want special tables and fixed relations to handle this. But here you don't bend reality to fit your model; it's the other way around. So, if the amount of screws changes or the fridge gets sold, there are separate "robots" (usually JSONP servlets) that are notified about it AFTER the information in the table changes; this way, the user doesn't have to wait extra time for all the bureaucracy to be done. These robots update separate object values (in a different table, the same table or even both), write a record on a list (a special Drive file, for example), notify an external server or perform any kind of action. Google Apps Script is great for this, and these GAS robots bridge your app, the data model, the client and the DBs when needed.
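The write-first, notify-after pattern can be sketched like this (plain JavaScript; the robot and data names are invented, and in a real setup each robot would be a separate GAS servlet called over HTTP):

```javascript
// Robots that react AFTER the data has changed; the user never waits for them.
const robots = [];
function registerRobot(fn) { robots.push(fn); }

const inventory = { "screw M4": { amount: 500 } };
const auditLog = [];

function updateAmount(name, amount) {
  inventory[name].amount = amount;   // 1. the real write happens first
  setTimeout(() => {                 // 2. robots are notified afterwards, asynchronously
    robots.forEach(robot => robot(name, amount));
  }, 0);
  return inventory[name];            // 3. the caller gets its answer immediately
}

// Example robot: keep a history record of every change.
registerRobot((name, amount) => auditLog.push({ name, amount, at: Date.now() }));

updateAmount("screw M4", 480);
```

Swapping a robot in or out is just registering or removing a function; the write path never changes.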

Also, these robots are specific and work in parallel; each one knows who is authorized to use it and who isn't, and can keep track of who called it, for what and when, without hurting its performance. If one fails, it's easy to spot it and return to a previous version while a fix is made; in the meantime, everything else will usually keep running and the system will just be missing that part of the processing. Smart programming can bring everything back up to date once the fix is done, all with no or only partial downtime. Once the fix is done, you simply update the public version of the GAS robot, and that's it. This kind of thing just doesn't happen so easily in the SQL kingdom.
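One way to give every robot that per-robot authorization and call tracking is a small wrapper (a sketch; the user list and robot body are invented):

```javascript
// Wrap any robot body with an authorization check and a call log.
function guardedRobot(name, allowedUsers, body) {
  const callLog = [];
  function robot(user, payload) {
    const allowed = allowedUsers.includes(user);
    callLog.push({ user, name, at: Date.now(), allowed }); // who, what, when
    if (!allowed) throw new Error("not authorized: " + user);
    return body(payload);
  }
  robot.log = callLog; // inspect usage without touching the robot itself
  return robot;
}

const restock = guardedRobot("restock", ["alice"], amount => amount + 100);
console.log(restock("alice", 400)); // 500
```

Because each robot carries its own log and its own allow-list, one can be rolled back or audited without touching the others.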

Also, this way it makes more sense to plan your eventual data consistency, and for this the key is to prioritize: to know where the data must be up-to-this-second and where it can be a little bit outdated. Don't get me wrong, all data can be live; it's possible to have 10,000 users working live over 10 TB of data (have you seen Firebase yet?), but it will be expensive as hell... In this realm it's not about whether something is possible, since there are many ways to do a lot of things; the tricky part is to balance everything, in this case live data vs. cost. My best tip: keep the reports and notifications in a non-live DB.

One way to keep your live JSON data clean is to make programmatic backups and, after the backup, clean the live-data table of all objects that represent empty things (tip: this is where you must erase something, when its amount reaches 0 or hits a null value, not before, for the convenience of historical reports and to avoid errors in eventual-consistency handling).

These backups are best done by having a robot (a GAS servlet) query all the JSON in the DB, stringify it and save it in a special Drive file, with the date as the name, in a specific Google Drive folder; another way (best for small/medium applications) is to dump the data into a Google Docs Spreadsheet (exclusively or additionally). The appearance of this new file must trigger another robot that searches for the objects that need to be cleaned and erases them from the live data. No downtime, and no risk of losing data that arrives during the backup/clean process.
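The backup-then-clean cycle can be sketched like this (plain JavaScript; in GAS the snapshot would actually be written to a Drive file named after the date, and the objects are invented):

```javascript
// Live data: "empty" objects (amount 0 or null) stay until the backup runs.
const liveData = [
  { name: "screw M4", amount: 480 },
  { name: "fridge X200", amount: 0 },
  { name: "bolt M6", amount: null }
];

function backupAndClean(data) {
  // 1. Snapshot EVERYTHING, empty objects included, for historical reports.
  const snapshot = {
    name: new Date().toISOString().slice(0, 10) + ".json", // e.g. "2013-08-31.json"
    body: JSON.stringify(data)
  };
  // 2. Only after the snapshot exists, drop objects whose amount is 0 or null.
  const cleaned = data.filter(o => o.amount !== 0 && o.amount !== null);
  return { snapshot, cleaned };
}

const { snapshot, cleaned } = backupAndClean(liveData);
console.log(cleaned.length); // 1
```

Ordering matters: because the filter only runs over the snapshotted data, nothing written during the cycle is at risk.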

Tips for professionals: if you require fast, direct communication between your robots, with the possibility to "hold" specific data writes until someone has finished, Apps Script ScriptDB is your solution. If you require an extra punch of GAS robot performance, sometimes the Apps Script Cache Service holds the key. To import data, prefer REST and use the Apps Script URL Fetch Service. To export data and communicate with traditional servers, use the Apps Script JDBC Service and/or the Apps Script XML Service. To export data to your HTML5 web app, use the Apps Script HTML Service or Content Service. If all you care about is price and scalability, go for the Google Cloud Datastore JSON API; for live (but much more expensive) data, go for Firebase or the Google Drive Realtime API (there is a great GitHub example of how to use the RT API in AngularJS, created by the awesome programmer +Steven Bazyl), and to handle the async data loading, AngularJS works great. Mix and balance things for best results.

But where this method really shines is when it's time to do historical analysis of the data, as long as it's done with the right tools, of course. The very best way is through BigQuery, which is capable of importing and processing terabytes of data in seconds; GAS can then pass the results to Google Charts to make amazing displays of the data, and you can easily compare how specific things behave over time, even across years of countless data. If you are using Spreadsheets to store your data, you can save a lot of trouble (and money) and use Fusion Tables instead to merge and analyze your data.

Well, first and foremost, very few people know how to properly use these technologies; it's not what you find in common books, learn at school, or can simply ask your old teachers about. It's new stuff! Also, its implementations, being based on open standards, work with so many things and in so many ways that it depends a lot on the development team's skill and imagination to balance cost, performance, compatibility and flexibility. Most developers are used to being told how to do things, to following standard procedures and guidelines; that all applies here too, but the material is yet to be written. This opens the path to those ready to push the limits and show what they can do; now more than ever, developers can show their skills and stand above even the rulers of the SQL kingdom.

The trend right now is to build convenient systems that are accessible from any platform, and users expect the system to adapt to their needs. Now everyone asks that the software adapt to them and help them do things; people hate having to learn and adapt to the system's needs, and being forced to help it do what it was made for...

For all this (and so much more), a flexible web-based model like HTML5, with a flexible UI like CSS3, cross-platform code like JavaScript, a CORS-friendly data model like JSON and a Non-SQL back-end makes sense for the future of applications. And remember that your workflow must be at least as flexible and responsive as your data and UI.

Thanks for reading and happy coding!