Wednesday, July 22, 2015

Let's understand Apache Thrift

Apache Thrift is a framework for implementing RPC in services, with cross-language support.
An RPC (Remote Procedure Call) is very similar to a normal function call, except that the function lives remotely, on a different server, as part of a service. A service exposes many such functions/procedures to its clients, and a client needs some way to know which functions/procedures the service exposes and what their parameters are.

This is where Apache Thrift comes in. It has its own "Interface Definition Language" (IDL). In this language you define the functions and their parameters, and then use the Thrift compiler to generate the corresponding code for any supported language of your choice. This means that you can implement a function in Java, host it on a server, and then call it remotely from Python or any other supported language.
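To make the idea of a shared contract concrete, here is a minimal sketch. The Calculator service and its add function are hypothetical names chosen for illustration, and the Java interface below is a hand-written stand-in for what the Thrift compiler would generate from the IDL shown in the comment; the real generated code has more to it.

```java
// Thrift IDL that could describe the service (shown here as a comment;
// "Calculator" and "add" are hypothetical names):
//
//   service Calculator {
//     i32 add(1: i32 a, 2: i32 b)
//   }

// Hand-written stand-in for the interface generated from the IDL.
// Client and server both code against this contract.
interface CalculatorIface {
    int add(int a, int b);
}

// Server-side implementation of the contract (the "handler" in Thrift terms).
class CalculatorHandler implements CalculatorIface {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

class IdlContractSketch {
    public static void main(String[] args) {
        // Locally this is a plain method call; with Thrift the same call
        // would travel over the network to wherever the handler is hosted.
        CalculatorIface calculator = new CalculatorHandler();
        System.out.println(calculator.add(2, 3));
    }
}
```

The point of the sketch is only that the function names and parameter types are fixed once, in the IDL, and every language binding is derived from that single definition.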


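In this post we will try to understand what Thrift does, its architecture and its components.

The Thrift networking stack can be represented as the following layers, from top to bottom:

  • Server
  • Processor
  • Protocol
  • Transport

Let's understand each component, starting from the bottom.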

Transport

The Transport layer provides a simple abstraction for reading/writing from/to the network. This enables Thrift to decouple the underlying transport from the rest of the system (serialization/deserialization, for instance).

Here are some of the methods exposed by the Transport interface:
  • open
  • close
  • read
  • write
  • flush
Here are some of the transports available for the majority of the Thrift-supported languages:
  • file: read/write to/from a file on disk
  • http: as the name suggests
E.g., we create a Thrift HTTP transport in Java as follows (serviceUrl is the path of the service on the server):
org.apache.thrift.transport.THttpClient transport = new THttpClient("http://" + serverIP + ":" + serverPort + serviceUrl);
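To illustrate what the five Transport methods listed above do together, here is a minimal sketch using only the Java standard library. SketchTransport and LoopbackTransport are hypothetical names invented for this example; they are not Thrift's own classes. The sketch assumes a buffered transport in which written bytes only become readable after flush.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical interface mirroring the five methods listed above.
interface SketchTransport {
    void open();
    void close();
    int read(byte[] buf, int off, int len);
    void write(byte[] buf, int off, int len);
    void flush();
}

// In-memory loopback: bytes that are written and flushed can be read back
// from the same object, standing in for a network connection.
class LoopbackTransport implements SketchTransport {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private byte[] readable = new byte[0];
    private int pos = 0;
    private boolean open = false;

    public void open() { open = true; }

    public void close() { open = false; }

    public void write(byte[] buf, int off, int len) {
        requireOpen();
        pending.write(buf, off, len); // buffered until flush()
    }

    public void flush() {
        requireOpen();
        byte[] flushed = pending.toByteArray();
        pending.reset();
        // Append the flushed bytes to whatever has not been read yet.
        int remaining = readable.length - pos;
        byte[] merged = new byte[remaining + flushed.length];
        System.arraycopy(readable, pos, merged, 0, remaining);
        System.arraycopy(flushed, 0, merged, remaining, flushed.length);
        readable = merged;
        pos = 0;
    }

    public int read(byte[] buf, int off, int len) {
        requireOpen();
        int n = Math.min(len, readable.length - pos);
        System.arraycopy(readable, pos, buf, off, n);
        pos += n;
        return n; // number of bytes actually read, possibly 0
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("transport is not open");
    }
}

class TransportSketch {
    public static void main(String[] args) {
        LoopbackTransport transport = new LoopbackTransport();
        transport.open();
        byte[] message = "ping".getBytes(StandardCharsets.UTF_8);
        transport.write(message, 0, message.length);
        transport.flush();
        byte[] in = new byte[16];
        int n = transport.read(in, 0, in.length);
        System.out.println(new String(in, 0, n, StandardCharsets.UTF_8));
        transport.close();
    }
}
```

Because the rest of the code only sees the interface, the loopback could be swapped for a socket-backed or file-backed implementation without touching the callers, which is the decoupling the Transport layer is meant to provide.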

Protocol

The Protocol abstraction defines a mechanism to map in-memory data structures to a wire-format. In other words, a protocol specifies how datatypes use the underlying Transport to encode/decode themselves. Thus the protocol implementation governs the encoding scheme and is responsible for (de)serialization. Some examples of protocols in this sense include JSON, XML, plain text, compact binary etc.

E.g., we create a Thrift binary protocol in Java as follows:
org.apache.thrift.protocol.TProtocol protocol = new TBinaryProtocol(transport);
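As a rough illustration of what "mapping in-memory data to a wire format" means, the sketch below hand-encodes an integer and a string into bytes and decodes them again. It is written in the spirit of a simple binary encoding (big-endian 32-bit integers, length-prefixed UTF-8 strings) and uses only the Java standard library; BinaryEncodingSketch and its methods are hypothetical names, and this is not Thrift's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BinaryEncodingSketch {
    // Encodes (id, name) as: 4-byte big-endian id, 4-byte length, UTF-8 bytes.
    static byte[] encode(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + utf8.length); // big-endian by default
        buf.putInt(id);
        buf.putInt(utf8.length);
        buf.put(utf8);
        return buf.array();
    }

    // Decodes the bytes produced by encode() and returns "id:name".
    static String decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int id = buf.getInt();
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        return id + ":" + new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = encode(7, "hi");
        System.out.println(wire.length + " bytes on the wire");
        System.out.println(decode(wire));
    }
}
```

A JSON-style protocol would carry the same two values as text instead; the calling code would stay the same and only the encoder and decoder would change.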

Processor

A Processor encapsulates the ability to read data from input streams and write to output streams. The input and output streams are represented by Protocol objects. Service-specific processor implementations are generated by the compiler. The Processor essentially reads data from the wire (using the input protocol), delegates processing to the handler (implemented by the user) and writes the response over the wire (using the output protocol).

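In case you have to create a Thrift processor yourself (e.g. when using Thrift with a servlet), you can create one in Java as follows, where Processor and Iface come from the service class generated by the compiler and handler is your implementation of Iface:
Processor<Iface> processor = new Processor<Iface>(handler);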
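The read, delegate, write cycle described above can be sketched with plain Java. Everything here is hypothetical and hand-written for illustration (ArithmeticIface, ArithmeticHandler, ArithmeticProcessor and the tiny request format); a real Thrift processor is generated by the compiler and works through Protocol objects rather than raw buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical service contract.
interface ArithmeticIface {
    int add(int a, int b);
    int subtract(int a, int b);
}

// User-written handler holding the actual business logic.
class ArithmeticHandler implements ArithmeticIface {
    public int add(int a, int b) { return a + b; }
    public int subtract(int a, int b) { return a - b; }
}

// Hand-written stand-in for a generated processor.
class ArithmeticProcessor {
    private final ArithmeticIface handler;

    ArithmeticProcessor(ArithmeticIface handler) {
        this.handler = handler;
    }

    // Reads one call from `in`, delegates to the handler, writes the result to `out`.
    void process(ByteBuffer in, ByteBuffer out) {
        byte[] name = new byte[in.getInt()];
        in.get(name);
        String method = new String(name, StandardCharsets.UTF_8);
        int a = in.getInt();
        int b = in.getInt();
        int result;
        switch (method) {
            case "add":
                result = handler.add(a, b);
                break;
            case "subtract":
                result = handler.subtract(a, b);
                break;
            default:
                throw new IllegalArgumentException("unknown method: " + method);
        }
        out.putInt(result);
    }

    // Builds a request in the toy format: name length, name bytes, two int arguments.
    static ByteBuffer request(String method, int a, int b) {
        byte[] name = method.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length + 8);
        buf.putInt(name.length).put(name).putInt(a).putInt(b);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }
}

class ProcessorSketch {
    public static void main(String[] args) {
        ArithmeticProcessor processor = new ArithmeticProcessor(new ArithmeticHandler());
        ByteBuffer out = ByteBuffer.allocate(4);
        processor.process(ArithmeticProcessor.request("subtract", 9, 4), out);
        out.flip();
        System.out.println(out.getInt());
    }
}
```

Note that the handler never sees bytes and the processor never contains business logic; that split is what lets the processor be generated while the handler is written by hand.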

Server

A Server pulls together all of the various features described above. It typically does the following:

  • Create a transport
  • Create input/output protocols for the transport
  • Create a processor based on the input/output protocols
  • Wait for incoming connections and hand them off to the processor
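The four steps above can be sketched with the Java standard library alone. This is a toy, single-connection server under stated assumptions: a ServerSocket stands in for the transport, DataInputStream/DataOutputStream stand in for the input/output protocols, and the "processor" simply replies with the length of the string it receives. SingleCallServerSketch and its methods are hypothetical names, not Thrift classes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SingleCallServerSketch {
    // Step 3: stand-in processor. Reads one string, writes back its length.
    static void process(DataInputStream in, DataOutputStream out) throws IOException {
        String text = in.readUTF();
        out.writeInt(text.length());
        out.flush();
    }

    // Step 4: wait for one incoming connection and hand it to the processor.
    static void serveOnce(ServerSocket listener) {
        try (Socket connection = listener.accept();
             // Step 2: input/output "protocols" on top of the connection.
             DataInputStream in = new DataInputStream(connection.getInputStream());
             DataOutputStream out = new DataOutputStream(connection.getOutputStream())) {
            process(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal client: sends one string and returns the server's reply.
    static int call(int port, String text) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeUTF(text);
            out.flush();
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts the server on a free loopback port, performs one call, and shuts down.
    static int roundTrip(String text) {
        // Step 1: create the transport (port 0 lets the OS pick a free port).
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            int reply = call(listener.getLocalPort(), text);
            server.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("thrift"));
    }
}
```

A production server would loop on accept and usually hand each connection to a thread pool or an event loop; the sketch handles a single connection to keep the four steps visible.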
That concludes this session. Do let me know if you have any comments. Happy coding :)

2 comments:

  1. Hi
    How does the server notify the client of a change in Thrift?

    1. I have the same doubt. This was asked in one of the comments on the page from which I arrived here.

      "There are four cases in which version mismatches may occur.
      1. Added field, old client, new server. In this case, the old client does not send the new field. The new server recognizes that the field is not set, and implements default behavior for out-of-date requests.
      2. Removed field, old client, new server. In this case, the old client sends the removed field. The new server simply ignores it.
      3. Added field, new client, old server. The new client sends a field that the old server does not recognize. The old server simply ignores it and processes as normal.
      4. Removed field, new client, old server. This is the most dangerous case, as the old server is unlikely to have suitable default behavior implemented for the missing field. It is recommended that in this situation the new server be rolled out prior to the new clients.

      I do not see something new here? REST has the option to make a field as optional and boils down to the same solutions and final concern for case 4 apply as well...

      Also I cannot quite follow your reasoning on the remark that thrift is supported by many languages. REST, being an HTTP based protocol is supported by any language that has HTTP Client/Server libraries and some JSON processing."

      I'm still thinking over this statement, and as you seem to have deeper insight into Thrift, could you help me with it?
