Dec 11, 2013

Enable HTML5 Offline functionality to your gDrive hosted web apps!

Application Cache it's a pretty neat feature of HTML5, for it allows us to enable Offline functionality, faster loading times and reduced server load by caching a local copy of specific files when the user first access them, it can also be used to route the user to a special Offline version of our HTML code on specific parts of our site/app.

To do this one must create a special Manifest file, it is a very simple text file were we specify the offline cache behavior for our app, the most important parts are where we specify which documents should be cached and the creation date, this last one even though it's not technically required for the manifest to work, it is required to tell your App to update the cached version automatically next time a user logs, without it the cache will never be updated for the users and the only way they could get updates it's by cleaning they're browsers cache. It is also important to notice that most browsers won't allow more than 5Mb of cache for a single site.

We also must add a reference in our HTML code to tell the browser to look for the App Cache manifest, HTML5 makes this very easy, it's a very short declaration in the html tag of our app, for example:

Here we are telling the browser to look for a file called example.appcache in the root folder and use it as an appcache file.

I recommend reading a more detailed description of how to make an App Cache file on this article.

It sounds so easy right! but the trick is that the name it's not really relevant, the .appcache extension it's just done for standardization, technically it could be .awesomeOffline and still work (but please don't ;) standards makes collaborative work easier) what actually matters it's the MIMEtype of that file from the server which must be " text/cache-manifest " and that must be configured from the server, and most HTML hosting services doesn't have this option, and that includes the awesome Google Drive html hosting service, but as most things in this digital world, you can hack a way around this limitation.

To add the manifest file, first we need the ID of the root folder we are using for the gDrive HTML hosting for our application (in this article you can read more about how to host your HTML/JS/CSS code on gDrive), remember that you can find the ID for the folder when you open it from the web version of gDrive it's at the end of the URL in your browser.

Now that we have the folder ID, let's create a new script proyect and let's hit the code!

Let's get a good look at the code, it's very simple!

The most important part it's the MIMEtype, once created you can edit the text file to update it, personally I prefer to delete it and run the script again to create a new one, that way I'm sure that it's got the right date and everything.

AppCache opens a new world of possibilities for HTML5 web apps and web sites! Just be very careful when choosing which parts to cache, usually you should do thisonly with parts that doesn't change very often, like the libraries (and most of the Javascript code) and the CSS. If used properly it will greatly enhance the speed and therefore the user experience, and that must always be our main priority, remember that we are now entering a coding age where UX design really makes the difference.

Feel free to post your request, questions, ideas or simply say hi! on g+ or at the end of this article, your feedback it's very important and greatly appreciated.

Happy coding!

Sep 15, 2013

ACID compliance with Live, Non-SQL, Eventually Consistent DB's

One of the main complains I hear all around about Non-SQL is the ACID compliance and developers making crazy "fixes" to handle eventual data consistency, as I mentioned on this previous post I feel this is mostly because developers are trying to emulate the SQL ways, also they are used to be told how to handle everything, but in the new Non-SQL world, it all boils down to the developers creativity and mostly, its ability to find a syncretism between different approaches and platforms.

Non-SQL and eventual consistency requires a lot of extra creativity, but also gives lots of extra flexibility and opens the path to new and smarter software.

SQL likes to put a lot of limitations and rules for the developer and, in most cases, to the user, specially when it comes to data formats and relations, in Non-SQL the JSON format it's usually the way to go, and JSON doesn't care if your data is a Boolean, string, number, URL... so long it's text and you pack it in a {'key':'value'} format, he's fine with it, also, it allows for different kinds of data to be treated as equal, this makes easy to create a very powerful "Universal Searchbox", programmatic backups and clean-ups are easy and transparent, and you can join all this backups and use them to make awesome reports, by using each backups data as a point in time, and drawing flow charts... Neat huh? but unlike SQL, you won't find much guidelines to tell you how to face this concepts, so you really need to know how to achieve all this in a very precise and creative way... or things can go pretty nasty.

Let's see how we can face ACID compliance first:

Atomicity: The "all or nothing" rule. This is important, but instead of thinking of the whole point-to-point process as a single atomic transaction, here you must "split" it in different transactions, the most critical part is usually the Live Data, I personally like Firebase for this, so the most critical atomicity is between the client's AngularJS (or AngularFire) and the Firebase servers and this is already solved in AngularJS, also there is a lot of people constantly checking and upgrading this. The servlets  are atomic too, and its easy to program them to push a new line with the details of the error in a special log, but even if there is a mayor problem in the servlet or servlets, the Firebase should stay up and running, so the critial parts of the system keeps working as normally as possible. So I recommend always aiming for "isolated atomicity".

Consistency: Only valid data must be written, but again, one must be more specific about what is "valid data", in SQL if you send a number instead of a string, hell unleashes... or even if your string is 1 character too large (or even too short) it just won't work, in the paper this sounds like a way to prevent problems (for the IT staff) but sending extra complexity to the user for the programmers convenience doesn't seem right anymore. By using JSON your consistency focus only that the data is represented correctly ( Eg. [{"key 1":"value 1"},{"key 2":"value 2"}] ). but it doesn't worry about the content's, this makes everything much more flexible.

Isolation: Ideally one will be using a modular approach when working with Non-SQL, usually involving different platforms, so you should not only isolate the transactions, but actually split everything in parts for the eventual data consistency. Usually this can be done by splitting your system in small Live DB's making geographical location the main focus, this usually prevents for 2 people working on the same thing at the same time.

Durability: One very important thing about DB's is backing up your data, ideally you wish to take automatic periodical snapshots of your DB's, (this is very easy with Firebase and Google Drive using a GAS servlet by the way) and once you have periodic snapshots, wouldn't it be nice to put the all togheter, and see how the data changes through time? This can be done with BigQuery, Datastore and a GAS servlet to import the data from Firebase, usually you will add the "snapshot" as a new bucket in Datastore at the same time you make a historical backup on Drive, but you can adjust this according to your needs and budget, to lower prices, you can backup once a week on Drive and once a month on Datastore (for BigQuery).

Now let's think of a crazy example to give you an idea of how you can put together an imaginary software to handle a few warehouses, since this is Non-SQL, this model can be scaled up or down as needed, and new "modules" can be added or the existing ones can be adjusted or replaced by a different approach without problem.

I really recommend an MVC data architecture, so you can separate the UI from the backend and the controllers, this will allow you to focus on one problem at a time, also will simplify future updates, fixes and additions.

In this imaginary example we are going to be using 3 different HTML5 web apps:
  • Warehouse Administration: This should allow each warehouse to handle all it's contents in real-time so all the employees are aware of were everything is and notice any change instantly, if something comes in, out or change it's place in the warehouse this changes are instantly reflected on the view of all the warehouse personal, this way, no matter how fast things change and how many tasks are happening in parallel, everyone's work will keep in sync.
  • Administrative Central Office: This is the software for the staff who take part of the business decisions, they do not need to know the whereabouts of each screw in the warehouse in real-time, but rather the statistical data that comes out of the warehouse operations, mostly product sales performance, personnel efficiency and all sorts of statistical data analysis, so this part of the web app should focus on transforming many (possibly large) historical files into easy to understand charts and reports.
  • IT support: This department should be able to access all the DB instances, the backups and Webapp code and also be able to make changes on any part of the system fast and as transparent to the user as possible, also they should be able to communicate with all the staff to provide assistance and the system should detect, file and report in real-time any problem in the system.
Now let's elaborate more on the characteristics of the system:
  • All the people on the same warehouse, should have instant, real-time access to that specific warehouse data to keep things agile, even when many transactions are happening at the same time.
  • The warehouse staff must be able to quickly find any specif object or group of objects stored in the warehouse, by providing any arbitrary property (name, bar-code, serial, size, position, etc) and the system must show the matching object(s) instantly and with live-data.
  • The people of one warehouse doesn't need to know all the time what's happening at the other warehouses.
  • Every movement inside the warehouse must be logged, things like what was the change, who did it, why and when must be always be properly logged.
There must be the following chats:
  • For each warehouse, the administrative staff and the IT support staff.
  • Shared across all the warehouses.
  • Between the administrative staff and each warehouse, separately.
  • Between the administrative staff and all the warehouse.
  • Between each warehouse.
  • Between anyone and the IT support staff.
There must be the following logs:
  • When something it's moved inside the warehouse.
  • When something enters the warehouse.
  • When something leaves the warehouse.
  • When there is a system error.
  • When someone requests assistance from the IT staff.
  • All the personnel's everyday check-in and check-out.
  • Every activity from each member of the warehouse staff.
Before going into the implementations details, I made this simplified data flow to give you a better picture, note that not all the links are displayed and the solutions proposed here are only an imaginary example, I strongly encourage you to experiment and adapt the general idea to best fit your needs, budget and style.

Let's browse this from left to right, okay? here we go:

  1. First we have the HTML5 app used by the warehouse staff, this is were the real action happens, here the employees from the warehouses receives they're orders, find things and make reports about every move in the merchandise.
  2. To handle this we are using 3 Firebase DBs, one for each warehouse, this isolates the data and allows for precise tracking on each warehouse activity (Firebase includes an Analytics service). To connect Firebase with HTML5 I suggest AngularJS (AngularFire). Note: A single Firebase DB could handle all you need, but we split it for isolation, to balance the Quota and keep the costs minimum. Also there is a 10mb limit for exporting/importing data over most platforms, so splitting and periodically backing up and cleaning the Firebase DB's (always in that order) it's important. If 10mb it's too small for you, you will need to "zonify" your DB's, split the Firebase DB and back them up separately, Eg. [{"Zone 1":{},"Zone 2":{}}]. Since this is done by GAS servlets and not by the HTML5 app or Firebase itself, you can make changes to this as much as needed, usually without "downtime" and we always have the option to turn the servlet back to a previous version.
  3. We have an additional Firebase DB to handle the chats, you can handle as many chats as you need using a single Firebase DB, it is not on the diagram, but you may wish to add an extra GAS servlet to log and erase (from Firebase) the chats, usually at the end of the day.
  4. Then we have 2 GAS JSON servlets that uses GAS URL fetch to get backups from Firebase and pass the data to Drive and Datastore using Firebase REST API, after backing up it's time to call another servlet to "clean" the DB from articles with "null" values, this will avoid wasting space on items that are no longer in the warehouse, but we keep them until the backup for historical reasons and it's also a best practice when dealing with eventual consistency (even if not needed in this part).
  5. After that we have a Drive account that recieves the backups from one of the previous servlets, I recommend making this programatically at least once a week, and creating a special folder for each year (or month if you have lots of backup files). This files are mostly as a redundant safety feature, and also makes very easy to find the most recent backup in case of some serious problem, simply upload the right backup file from Drive  to the desired DB using the Firebase Web administrator, once the backup finishes uploading, the data will be instantly updated in all the users connected to that DB (but the data made between the backup and the recovery would be lost). I recommend making a backup before a recovery, even when there is a problem, so the data can be analyzed later and something can be  recovered if needed.
  6. Then we have the Cloud Storage JSON API that receives the backups from the other servlet, you can find how to do this in this official How-To. We need the information on Cloud Storage so BigQuery can use it.
  7. Now we have 2 JSONP servlets, the difference between a JSON and a JSONP servlet, is that JSONP has a special padding (also called "Prefix") which is used so the client's code know's which function should handle the response and it's the one we usually need when using GAS to link Google Services and HTML5 web apps. The first servlet for communicating Google Drive and the HML5 App and the second one it's responsible of receiving querys from the client's web app, pass it to BigQuery, wait for the response, pre-process it to best fit your app's needs and deliver it back to the web app when it's done or report (the user and the central error Log) that something went wrong.
  8. Then you will see the IT support HTML5 web app, this one is mostly used for client support,but it should also have direct access to all the servlets, log files, analytics data about the web app's use and the analitycs for the main components (Firebase DBs, specific servlet number of calls, etc),  and automatically show alerts when some servlet reports a problem. The focufs of this App is to monitor the Apps Performance and integrity, facilitate upding the App and IT support chats. A tip, use a GAS ScriptDB central library to count the number of times the servlets are being called, you can also use it as a central cache and for transactions that are same-time overwriting susceptible.
  9. And finally the HTML5 web app for the Administrative staff, here almost everything it's based on historical data, usually parsed into charts for convenience, they care about warehouse, merchandise and personnel performance and it's what one must focus when building this web app. 

Hope you enjoyed reading this article as much as I did writing it and that it inspires you into creating the perfect data-flow for your next project, don't be shy and write your thoughts about this article. I love reading comments, ideas, corrections, suggestions and particularly appreciate critics.

Happy coding!

Here is some of the material I recommend reading for this kind of project:

Aug 31, 2013

Insights and tips for working with Non-SQL Databases

The problem with Non-SQL DBs is that most developers doesn't really know how to use them on business grade solutions and they tend to keep all the "bad habits" and approaches of SQL... A big problem they face is when they want to make relations between data sets and they need a "direct dependence" like in SQL. I see the relational approach as one of the biggest SQL flaws and not the other way around, why?

Well, fisrt it makes the development to focus more on the data model due to it's complexity than the user experience and the UI needs, this is why systems tend to be complicated to use and providing very little flexibility for the user. Also, relational data doesn't always handle well with parallel data manipulation (due to the relations between tables). Software architecture must follow hardware architecture for best performance, and hardware its focusing on parallel processing, so does modern software, and this requires a parallel data-model.

Another problem with the relational approach is that since lots of things depends on others , it's very easy that something goes wrong and this usually ends with an error message or at least not being able to perform an operation.

Developers really need to learn how to use Non-SQL systems, JSON it's a great system for data administration, if you know how use it properly it can do things that developers and users alike only think as "utopias".

A simplified explanation on this "new" concepts of data-model:

Opposite to the horrible "normalized" data in SQL, here every object holds all the properties and sub-properties related to it (de-normalization/high data replication), the system only groups objects who share some specific property or properties when performing a query. This gives the system great flexibility and stability, even if there is data corruption, only a fraction of the data gets damaged and it's usually easy to "clean" and get back to normal, if this ever happens.

When you talk about de-normalizing data, everyone thinks the table will become HUGE, expensive and slow, but JSON it's actually very space-efficient, it's just text, comas and parentheses! and data gets formatted in a very efficient way compared to SQL, you will be amazed about how much data you can hold over 100mb. Also, smart systems allow data to grow without impacting performance significantly. Another thing that impacts the size of JSON is that it does not waste space stating dependencies and kinds. Also it is highly compressible (due to the high data replication) and therefore works great with data compression (Apps Script has special zip methods for this and most servers support or can be set to support it), use this when you need to benefit bandwidth over processing time.

Objects do not need to declare the kind or size of the data it holds, JSON doesn't care or need this information, again, this provides outstanding flexibility.

Let's use a Warehouse as an example of data stored in the DB, SQL  couldn't be worse for this, still, she's the queen of this realms, shame it's such an old queen not willing to catch up with the times, why? well, this lady forces everyone to her rules and expectations, and refuses to accept things she doesn't know and understand well in advance, also, she expects uniformity and it's a big fan of bureaucracy, her kingdom has been long and now everyone it's used to its ways, a different reality seems too utopic to really work and most of those who has tried to taste the Non-SQL kingdom, after a life of following the Queen's demands, they start feeling lost and keeps trying to emulate the Old-ways in the new realms, of coarse things go bad.

The real-life Warehouse holds objects of all kinds, very different from each other and they change over time, also new things comes with new characteristics and maybe it no longer uses some of the old ones. Even though they can be very different, they are all together and if peace it's to come to the warehouse, everyone must be treated as equal, so the little screw that is sold by weight (instead of pieces) and the refrigerator with a serial number can be stored next to each other and open the path to smart and universal search, were you type any property or name and simply get the matching result or results, and then decide what to do with what you found, also one expects to be able to do anything (reasonable). Note this also makes the universal search, the universal windows (to some extent).

Every object knows all its details, you know, like in real life... so it doesn't need to have it's information spread across different tables, habitats of the SQL kingdom are always trying to keep a different table for everything, group everyone according to some arbitrary attribute or relationship, but in the Non-SQL world, this is evil...possible, but evil, and when certain information changes, and you need to change other object properties in response , fire warnings or simply keep a record, most developers has problems when working over Non-SQL DB's, they want especial tables and fixed relations to handle this, but here you don't bend reality to fit your model, but the other way around... So, if the amount of screws changes or if the fridge got sold, there are separate "robots" (usually JSONP servlets) that are notified about this, AFTER the information on the table changes, this way, the user doesn't need to wait extra time for all the bureaucracy to be done, this robots updates a separate object values (different table, same table or even both), writes a record on a list (a special drive file for example), notifies an external server or do any kind of action, Google Apps Script its great for this, also this GAS robots bridge your App, the data model, the client and the DBs when needed.

Also, this robots are specific and works in parallel, each one knows who is authorized to use it and who doesn't, can keep track of who, for what and when it was called, without affecting it's performance. If one fails, it's easy to spot and return to a previous version while a fix is done, in the mean time, everything will usually keep running and the system will just be missing that part of the processing. Smart programming can handle to update everything else back to normal once the fixing it's done, all this with no or just partial downtime. Once the fixing it's done, you simply update the public version of the GAS robot, and that's it. This kind of things just doesn't happend in the SQL's kingdom so easily.

Also, this way it makes more sense to plan your eventual data consistency, and for this the key its to prioritize, to know were the data must be up-to-this-second and were it can be a little bit outdated. Don't get me wrong, all data can be live, it's possible to have 10,000 users working live over 10 Tb of data (have you seen Firebase yet?), but it will be expensive as hell... in this realm it's not about if it's possible, there are many ways to do a lot of things, the tricky part is actually to balance everything, in this case, live-data VS cost. My best tip, keep the reports and notifications in a non-live DB.

One way to keep your live-JSON-data clean is to make programmatic backups, and after the back-up, clean the live-data table from all objects that represents empty things (tip: this is were you must erase something when its amount reaches 0, or hits a null value, not before, for historical report convinience and to avoid errors in eventual data consistency handling).

This backups are best done by having a robot (GAS servlet) query all the JSON in the DB, stringify it and save it in a special Drive file, with the date as the name, in a specific Google Drive folder; another way (best for small/medium applications) its to dump the data inside a Google Docs Spreadsheet (exclusively or additionally).The appearance of this new file must trigger another robot that will search for the objects that needs to be cleaned, and erase them from the Live-Data. No-downtime or risk of loosing data that happened during the backup/clean process.

Tips for professionals: If you require fast-direct communication between your robots, with the possibility to "hold" specific data writing until someone has fished. Apps Script ScriptDB it's your solution. If you require an extra punch of GAS robot performance, sometimes Apps Script Cache Services hold the key. To import data, prefer REST and use Apps Script URL Fetch Service. To export data and communicate with traditional servers, use Apps Script JDBC Service and/or Apps Script XML Service. To export data to your HTML5 web app use Apps Script HTML Services or Apps Script Content Services. If all you care is price and scalability, go for Google Cloud Datastore JSON API, for live (but much more expensive) data go for Firebase or Google Drive Realtime API (here you can find a great github example of how to use the RT API in AngularJS, created by the awesome programmer +Steven Bazyl , and to handle the async data loading, AngularJS works great. Mix and balance things for best results.

But when this method really shines is when it's time to make historical analysis of data, so long it's done using the right tools of coarse, the very best way to do it is through BigQuery, it's capable of importing and processing terabytes of data in seconds and GAS can then pass the results to Google Charts to make amazing displays of data, you can easily compare how specific things behave over time, and it can be years of countless data. If you are using Spreadsheets to store your data, you can save a lot of trouble (and money) and use Fusion Tables instead to merge and analyze your data.

Well, first and most, very few people knows how to properly use this technologies, it's not what you find in common books, learn at school, or simply be able to ask your old teachers about, it's new stuff! Also, it's implementations, being based on open standards, works with so many things and in so many ways, that it depends a lot on the developing team skills and imagination to be able to balance cost, performance, compatibility and flexibility, most developers are used to be told how to do things, to follow standard procedures and follow guide-lines, this all applies here, but material is yet to be written, but this opens the path to those ready to push the limits and show what they can do, now more than ever developers can show they're skills and stand above even the rulers in the SQL kingdom.

The trend right now is to build convenient systems, that are accessible from any platform and users expect that the system adapts to their needs, now everyone asks that the software adapts to them and  to help them do things, hates needing to learn and adapt to the systems need and being forced to help it do what it was made for...

For all this (and so much more), a flexible web-based model like HTML5, with a flexible UI like CSS3, a cross-plattform code like Javascript, a CORS data-model like JSON and a Non-SQL back-end makes sense for the future of applications, and remember that your work-flow must be at least as flexible and responsive as your data and UI.

Thanks for reading and happy coding!