Wednesday, October 28, 2020

CORS Theory

Wikipedia explains CORS as:

"The CORS standard describes new HTTP headers which provide browsers with a way to request remote URLs only when they have permission."

along with a visual explanation of the path a request takes:

[Figure: Path of an XMLHttpRequest (XHR) through CORS]

--------------------------------------------------------------------------------------------------------------------------------------------------

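Browsers use a preflight OPTIONS request for CORS verification. The browser itself issues a preflight before a request to a different origin when that request is not a "simple" one, for example when:

  1. It uses a method other than GET, HEAD or POST
  2. It sets headers outside the CORS-safelisted set (e.g. Authorization), or a Content-Type other than application/x-www-form-urlencoded, multipart/form-data or text/plain
  3. Its body is stream data (a ReadableStream)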

The OPTIONS request is the browser's way of telling the server what the subsequent request will look like (for example, that it carries file stream data or comes from a different origin) and asking whether the server supports that kind of request. If it does, the server responds with an HTTP 200 OK carrying the Access-Control-* headers, which indicate exactly WHAT the server allows.
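
The decision the browser makes before sending a preflight can be sketched roughly as below. This is a simplified illustration, not the full rule set from the Fetch specification; the function name `needsPreflight` and the constant names are made up for this example, and it assumes the request is already known to be cross-origin.

```javascript
// Simplified sketch: does a cross-origin request need a preflight?
// Covers only the common cases; the Fetch specification has more rules.
const SIMPLE_METHODS = new Set(['GET', 'HEAD', 'POST']);
const SIMPLE_HEADERS = new Set(['accept', 'accept-language', 'content-language', 'content-type']);
const SIMPLE_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
]);

// Hypothetical helper: `headers` is a plain object of request headers.
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SIMPLE_HEADERS.has(lower)) return true;
    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8" when comparing.
      const mediaType = String(value).split(';')[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mediaType)) return true;
    }
  }
  return false;
}
```

A JSON POST or any request with an Authorization header lands in the "needs preflight" branch, which is why most API calls from a single-page app trigger an OPTIONS request first.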

Courtesy (for above sequence diagram):  https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

Some of the CORS headers and their sample values are as follows:

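// Access-Control-Allow-Origin takes an origin only (scheme, host, port); a path is not valid here.
const corsOrigin = 'https://some.other.origin:4444';
res.header('Access-Control-Allow-Origin', corsOrigin);
// Methods and request headers the server accepts on cross-origin calls.
res.header('Access-Control-Allow-Methods', 'POST,OPTIONS');
res.header('Access-Control-Allow-Headers',
  'Authorization,Content-Type,Accept,Referer,Origin,X-Requested-With');
// Header values are strings; allowing credentials requires a specific origin above, not '*'.
res.header('Access-Control-Allow-Credentials', 'true');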
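
To show how such headers fit into answering a preflight, here is a minimal framework-free sketch. The `corsPolicy` object and the `buildPreflightResponse` function are hypothetical names for this example, loosely based on the sample values above; it assumes request header names arrive lower-cased (as Node's http module delivers them), and answering a disallowed origin with 403 is just one possible choice.

```javascript
// Hypothetical policy object, loosely based on the sample header values above.
// Note that an allowed origin is scheme + host + port, with no path.
const corsPolicy = {
  allowedOrigins: ['https://some.other.origin:4444'],
  allowedMethods: ['POST', 'OPTIONS'],
  allowedHeaders: ['Authorization', 'Content-Type', 'Accept', 'Referer', 'Origin', 'X-Requested-With'],
  allowCredentials: true,
};

// Returns the status and headers a server could send back for an OPTIONS preflight.
function buildPreflightResponse(requestHeaders, policy) {
  const origin = requestHeaders.origin;
  if (!origin || !policy.allowedOrigins.includes(origin)) {
    // No Access-Control-* headers: the browser will block the real request.
    return { status: 403, headers: {} };
  }
  const headers = {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': policy.allowedMethods.join(','),
    'Access-Control-Allow-Headers': policy.allowedHeaders.join(','),
    // The response varies by Origin, so caches must not mix responses up.
    Vary: 'Origin',
  };
  if (policy.allowCredentials) {
    headers['Access-Control-Allow-Credentials'] = 'true';
  }
  return { status: 200, headers };
}
```

Echoing back the matched origin (instead of '*') keeps the response compatible with credentialed requests.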

What makes a request cross-domain (cross-origin):

  1. Different protocol
  2. Different domain
  3. Different port
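
The three conditions above can be checked with the standard URL API, whose `origin` property combines protocol, domain and port. The helper name `isCrossOrigin` is made up for this sketch.

```javascript
// Two URLs are same-origin only if protocol, domain and port all match.
// URL.origin already normalizes default ports (e.g. :443 for https).
function isCrossOrigin(pageUrl, requestUrl) {
  return new URL(pageUrl).origin !== new URL(requestUrl).origin;
}

console.log(isCrossOrigin('https://example.com/a', 'https://example.com/b'));      // same origin
console.log(isCrossOrigin('http://example.com', 'https://example.com'));           // different protocol
console.log(isCrossOrigin('https://example.com', 'https://api.example.com'));      // different domain
console.log(isCrossOrigin('https://example.com', 'https://example.com:8443'));     // different port
```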


Friday, October 16, 2020

GraphQL Performance Tips

  • Express with ‘graphql-jit’ improves performance several-fold. This library compiles queries and keeps them ready; caching the compiled queries makes it even faster.
  • Tracing at the resolver level kills performance.
  • TypeGraphQL adds overhead.
  • Apollo Server adds overhead.
  • Koa is a faster option than Express because Express validates the schema on every single request, even when it has not changed.
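
The "cache the compiled queries" idea from the first tip can be sketched as a small memoizing wrapper. This is not graphql-jit's API: `createCompiledQueryCache`, `fakeCompile` and the `run` method are stand-ins invented for this example, and a real cache would need a size bound (for example an LRU) since clients can send arbitrary query strings.

```javascript
// Wrap any "compile" step so each distinct query text is compiled only once.
function createCompiledQueryCache(compile) {
  const cache = new Map(); // unbounded here; bound it in real use
  return function getCompiled(queryText) {
    let compiled = cache.get(queryText);
    if (!compiled) {
      compiled = compile(queryText);
      cache.set(queryText, compiled);
    }
    return compiled;
  };
}

// Stand-in for an expensive compile step (hypothetical, not a real library call).
let compileCount = 0;
function fakeCompile(queryText) {
  compileCount += 1;
  return { run: (variables) => ({ data: { echoed: queryText, variables } }) };
}

const getCompiled = createCompiledQueryCache(fakeCompile);
const first = getCompiled('{ hello }');
const second = getCompiled('{ hello }'); // served from the cache, no recompilation
```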

GraphQL Design Tips

  • Design for behaviors and use cases rather than for data: think domain over data
  • Be careful with Atomicity vs Granularity
  • Stay away from trying to build a “One size fits all” API
  • Sometimes we all make mistakes :) ... use the @deprecated directive judiciously, and provide a valid “reason” and an “alternative way” when doing so. First understand: who is using the field, and how much is it used?
  • Talk to (or become) domain experts
  • GraphQL Spec doesn’t have any opinion about Auth. That leaves us with 3 options:
    1. Wrap Resolvers
    2. Write Custom Directives or 
    3. Delegate to Abstraction Layers (AppSync / Apollo) => github.com/chenkie/graphql-auth
  • Fetch data optimally. Fetching everything at once is not good, and it is just as easy to get burned by over-fetching with "top-heavy" parent-to-child resolvers. Fetching data in a parent and passing it down to children should be used sparingly.
  • Use libraries like "https://github.com/facebook/dataloader" to de-dupe downstream requests.
  • Write resolvers that are readable, maintainable, testable. Not too clever.
  • Make your resolvers as thin as possible. Extract out data fetching logic to re-usable async functions.
  • Avoid GraphQL for pure file-upload use cases; file upload is hard for GraphQL in general.
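
Several of the tips above (wrapping resolvers for auth, de-duping downstream requests, keeping resolvers thin) can be sketched together. Everything here is illustrative: `withAuth`, `createDedupedFetcher`, `fetchAuthorById` and the `context.user` / `context.loadAuthor` conventions are assumptions for this example, and unlike the dataloader library this sketch only de-dupes, it does not batch.

```javascript
// Option 1 from the auth tip: wrap a resolver so it rejects unauthenticated callers.
// `context.user` is an assumed convention, not something the GraphQL spec defines.
function withAuth(resolver) {
  return async (parent, args, context, info) => {
    if (!context || !context.user) {
      throw new Error('Not authenticated');
    }
    return resolver(parent, args, context, info);
  };
}

// Per-request de-duplication: identical keys share one promise.
function createDedupedFetcher(fetchByKey) {
  const inFlight = new Map();
  return (key) => {
    if (!inFlight.has(key)) {
      inFlight.set(key, Promise.resolve(fetchByKey(key)));
    }
    return inFlight.get(key);
  };
}

// Data fetching lives in a reusable async function, not in the resolver.
let downstreamCalls = 0;
async function fetchAuthorById(id) {
  downstreamCalls += 1; // stands in for a database or HTTP call
  return { id, name: `Author ${id}` };
}

// The resolver itself stays thin: check auth, delegate, return.
const resolvers = {
  Book: {
    author: withAuth((book, _args, context) => context.loadAuthor(book.authorId)),
  },
};
```

Creating the de-duped fetcher once per request (for example when building the context) keeps cached results from leaking between users.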

                  GraphQL Upload: Pitfalls & Tricks

                  1. File upload is hard for GraphQL in general. As of today, `apollo-server-lambda` does not support file upload. The workaround is to use a separate server or middleware process to handle it.

                  2. GraphQL documentation for Upload is limited

                  3. Express/Apollo or some other server is needed alongside AWS Lambda to handle GraphQL file streams

                  4. AWS API Gateway needs to be configured with Binary Media Types. References:

                  • https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html
                  • https://stackoverflow.com/questions/41756190/api-gateway-post-multipart-form-data

                  5. Once binary media types are turned on, all GraphQL mutations are expected to receive binary input

                  6. CORS can be more challenging to handle at the API Gateway level due to the single GraphQL end point

                  7. There is limited information/support on the streaming libraries, viz. Busboy, Form-Data, Formidable, Multer, Multiparty. We finally decided to use FormData, properly passing mimetype, filename and encoding. Reference:

                  • https://npmcompare.com/compare/busboy,form-data,formidable,multer,multiparty

                  8. Stream hand-over had challenges. There are libraries (mentioned in above point) that help with parsing the file and text from the incoming request and handing it over to another API. References:

                  • https://stackoverflow.com/questions/52963648/how-to-pass-multipart-request-from-one-server-to-another-in-nodejs
                  • https://github.com/apollographql/apollo-server/issues/1854

                  9. Async/await with file streams seems to misbehave in the AWS Lambda environment: the file size becomes 0 while the API still responds with 200. Reference:

                  • https://levelup.gitconnected.com/avoiding-the-pitfalls-of-async-node-js-functions-in-aws-lambda-941220582e7a
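
One likely cause of the zero-byte symptom in point 9 is an async handler returning before the stream has finished writing. A minimal sketch of awaiting stream completion with Node's built-in `stream/promises` follows; the `saveUpload` and `handler` names, the event shape and the temp-file path are assumptions for illustration, not the setup from the original project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Write an incoming stream to disk and resolve only after the write has finished.
async function saveUpload(readStream, targetPath) {
  // pipeline() resolves once the destination is done, and rejects on any stream error.
  await pipeline(readStream, fs.createWriteStream(targetPath));
  const { size } = await fs.promises.stat(targetPath);
  return size;
}

// Lambda-style async handler (hypothetical event shape). Awaiting saveUpload keeps the
// function from responding with 200 before the bytes are actually written.
async function handler(event) {
  const targetPath = path.join(os.tmpdir(), `upload-${process.pid}-${Date.now()}.bin`);
  const body = Readable.from([Buffer.from(event.body)]);
  const size = await saveUpload(body, targetPath);
  return { statusCode: 200, body: JSON.stringify({ size }) };
}
```

Forgetting the `await` (or mixing a callback with an async handler) lets the function return while the write is still pending, which matches the "200 but empty file" behavior.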

                  GraphQL vs REST?

                  Aspect | GraphQL | REST
                  Architecture | Client-driven | Server-driven
                  Organized in terms of | Schema & type system | Endpoints
                  Operations | Query, Mutation, Subscription | Create, Read, Update, Delete
                  Data fetching | Specific data with single API call | Fixed data with multiple API calls
                  Community | Growing | Large
                  Performance | Fast | Multiple network calls take up more time
                  Development Speed | Rapid | Slower
                  Learning Curve | Difficult | Moderate
                  Self-documenting | Using introspection | -
                  File uploading | Very challenging | -
                  Web caching | Via libraries built on top | -
                  Stability | Less error-prone: automatic validation and type checking | Better choice for complex queries
                  Use cases | Multiple micro-services, Aggregators, Mobile Apps | Simple apps, Resource-driven apps

                  GraphQL Introduction

                  Since Facebook released it publicly in 2015, the GraphQL specification has seen staggering adoption in the software industry. Below are some facts about it:

                  • GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL was introduced internally at Facebook in 2012 to address the issue with RESTful APIs around over-fetching/under-fetching of data for their mobile application.
                  • The GraphQL spec has many implementations created by the open-source community, following the first JavaScript reference implementation created by Facebook. Implementations exist for languages such as JavaScript/TypeScript, Go, Ruby, Scala, Elixir, Python and Java. A few popular implementations and services are Apollo, AWS AppSync, Prisma, Hasura, etc. With this increasing adoption, GraphQL tooling and the ecosystem have also seen a huge surge.
                  • GraphQL has 3 request types: queries (for getting data from the API), mutations (for changing data via the API), and subscriptions (long-lived connections for streaming data from the API).
                  • GraphQL is typed. Unlike REST, GraphQL checks data types at runtime and uses sub-selection to limit the fields returned.
                  • GraphQL schemas are backed by resolvers. That means data can be pulled from anywhere, from any source system, independently. A resolver is a function that resolves a value for a type or field in a schema. Resolvers can be asynchronous too! A resolver function takes four arguments (in this order):
                    1. parent: The result of the previous resolver call.
                    2. args: The arguments of the resolver’s field.
                    3. context: A custom object each resolver can read from/write to.
                    4. info: Contains the query AST and more execution information. In other words, it contains metadata about the request.
                  • GraphQL can return multiple resources in one round trip to server.
                  • GraphQL ensures no over-fetching as well as no under-fetching. API Clients are in full control about what they want to query. If some client wants a single field to query or make an update then it can do so without need of additional API Endpoints or a new version of the same API.
                  • GraphQL is Introspectable. GraphiQL Tool exploits exactly that to provide client generation, query generation, suggestions & documentation. Think of it as Java Reflection.
                  • One of the reasons customers choose GraphQL is the power of subscriptions. These are notifications that are sent immediately to clients when data has been changed by a mutation. In AWS AppSync, subscriptions are implemented using WebSockets and are directly tied to a mutation in the schema. The AppSync SDKs and the AWS Amplify library allow clients to subscribe to these real-time notifications. AppSync subscriptions have many uses outside standard API CRUD operations. They can be used for inter-client communication, such as a mobile or web chat application. Subscription notifications can also be used to provide asynchronous responses to long-running requests: the initial request returns quickly, while the full result can be sent via subscription when it’s complete (local resolvers are useful for this pattern).
                  • While there's nothing that prevents a GraphQL service from being versioned just like any other REST API, GraphQL takes a strong opinion on avoiding versioning by providing the tools for the continuous evolution of a GraphQL schema.
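
To make the four resolver arguments described above concrete, here is a small sketch using the common resolver-map convention (type name, then field name). No GraphQL server is involved; the `bookResolvers` map and the `context.db.findBook` helper are hypothetical and only show how `parent`, `args` and `context` flow between resolvers.

```javascript
const bookResolvers = {
  Query: {
    // Top-level field: `parent` is the root value, `args` holds the field arguments,
    // and `context` carries shared per-request objects (here an assumed `db` helper).
    book: async (parent, args, context, info) => context.db.findBook(args.id),
  },
  Book: {
    // Child field: `parent` is whatever Query.book resolved to.
    title: (parent) => parent.title.toUpperCase(),
  },
};

// Manual walk-through of what an execution engine would do for: { book(id: 1) { title } }
async function resolveBookTitle(id, context) {
  const book = await bookResolvers.Query.book(null, { id }, context, {});
  return bookResolvers.Book.title(book, {}, context, {});
}
```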

                  Thursday, August 15, 2019

                  Multi-tenancy Summarized

                  Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers (tenants).

                  Consideration Points:

                  1. Quantity of tenants
                  2. Performance
                  3. Time of development
                  4. Reliability
                  5. Disaster recovery
                  6. Execution Environment Isolation: Adding, modifying or removing features for one customer should not impact other customers
                  7. Features added for one customer may be applicable for other customers. Hence customization cannot be tied to a single customer. The architecture should allow same features to be enabled and disabled easily for other customers.
                  8. Licenses can be upgraded or downgraded. Features should be grouped so that they can be easily added or removed from a customer execution environment.
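
Points 7 and 8 (features not tied to one customer, grouped so licenses can be upgraded or downgraded) can be sketched as plain data plus a lookup. The group names, license tiers and the `featuresForTenant` function are invented for this illustration.

```javascript
// Features are grouped; licenses map to groups rather than to individual customers.
const FEATURE_GROUPS = {
  core: ['login', 'reports'],
  analytics: ['dashboards', 'exports'],
};
const LICENSE_TIERS = {
  basic: ['core'],
  premium: ['core', 'analytics'],
};

// Resolve the effective feature list for one tenant.
// `extraGroups` / `disabledGroups` are optional per-tenant overrides (assumed fields).
function featuresForTenant(tenant) {
  const groups = new Set(LICENSE_TIERS[tenant.license] || []);
  for (const group of tenant.extraGroups || []) groups.add(group);
  for (const group of tenant.disabledGroups || []) groups.delete(group);
  const features = new Set();
  for (const group of groups) {
    for (const feature of FEATURE_GROUPS[group] || []) features.add(feature);
  }
  return [...features].sort();
}
```

Upgrading or downgrading a license then only changes `tenant.license`, and a feature built for one customer becomes available to others by adding its group to their tier or overrides.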

                  Advantages:

                  1. Deploy code without affecting other tenants in the environment
                  2. Feature Customization: Enable features for a single tenant without disturbing other tenants
                  3. Cost-efficient and Financial Benefits
                  4. Efficient Resource Usage
                  5. Easier Set Up and On-boarding
                  6. Easier Upgrades and Maintenance

                  Disadvantages:

                  1. Risk of data leakage between tenants, and reliability concerns
                  2. Less control over backups and recovery
                  3. Harder migration
                  4. Less control over the environment
                  5. More vulnerable from a security standpoint
                  6. Single point of failure for all tenants

                  Approaches:
                  1. Separate databases: Each tenant has its own database. E.g. maintain different databases for different tenants and connect to the correct database at runtime.
                  2. Separate schemas: Tenants share a common database, but each tenant has its own set of tables (schema). E.g. maintain a single database, but different schemas for different tenants, qualified by the Company Id.
                  3. Shared schema, aka partitioned (discriminator) data: Tenants share a common schema and are distinguished by a tenant discriminator column. E.g. add a Company Id to all the tables and qualify all queries with a company id.
                  4. Data Sharding
                  5. Hypervisor-level Isolation

                  Given the advantages and disadvantages of each approach, I'd recommend using separate schemas, as they provide the best balance between difficulty of development and cost of running the database layer.
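
As a rough illustration of approaches 2 and 3, the sketch below builds a schema-qualified table name per tenant and, for comparison, a query scoped by a discriminator column. The `company_<id>` schema naming, the `company_id` column, the function names and the `$1` placeholder style (as used by PostgreSQL drivers) are all assumptions for this example.

```javascript
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Separate schemas: derive the schema from the tenant's Company Id.
// Identifiers cannot be bound as query parameters, so they are validated strictly.
function tenantTable(companyId, table) {
  if (!Number.isInteger(companyId) || companyId <= 0) {
    throw new Error('Invalid company id');
  }
  if (!IDENTIFIER.test(table)) {
    throw new Error('Invalid table name');
  }
  return `"company_${companyId}"."${table}"`;
}

// Shared schema: every query carries the discriminator as a bound parameter.
function scopedSelect(table, companyId) {
  if (!IDENTIFIER.test(table)) {
    throw new Error('Invalid table name');
  }
  return {
    text: `SELECT * FROM "${table}" WHERE company_id = $1`,
    values: [companyId],
  };
}
```

With separate schemas a forgotten filter cannot leak rows across tenants, which is part of the isolation benefit; with a shared schema every query has to go through something like `scopedSelect`.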