Research
The Ohua system is a research vehicle: although the core concepts have been defined, there is ample room left for further innovation.
- Automatic graph rewrites for MPP scenarios. Parallel processing is an important scenario for this type of system: it increases processing speed and helps the system scale even to very large volumes of data. The Ohua system is primarily designed to support MPP (Massively Parallel Processing). The classic way to turn an existing flow graph into an MPP flow relies on data parallelism: certain parts of the graph are executed in parallel on different parts of the incoming data. As of now, Ohua does not support automatic graph rewrites that analyze the structure of a flow and its participating operators in order to derive an optimal MPP flow.
- Thread granularity. Dynamically deriving an MPP flow from the flow graph constructed by the user involves a trade-off. Too many threads increase operating system scheduling overhead and hurt performance. Too few threads may not keep all CPUs fully utilized all the time, which is what high performance requires. The goal is to find the minimal number of threads that keeps a given number of CPUs busy all the time. The Ohua engine already provides an extension point for this, along with the capability to run multiple consecutive operators in one thread.
- Automatic failure detection/high availability. (Near) real-time online applications, such as online trading, place new requirements on system availability. A downtime of more than half a minute can translate into a considerable financial loss. Hence automatic failure detection and, in particular, high-availability mechanisms will have to be developed for Ohua to become a player in this field.
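
The data-parallel rewrite described in the first item above can be illustrated with a small sketch. It is not Ohua code: the class `DataParallelSketch`, the methods `partition` and `runParallel`, and the use of a plain `Function` as a stand-in for an operator are all assumptions made for illustration. The sketch splits the incoming packets round-robin into partitions, runs one replica of the operator per partition in its own thread, and merges the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration; none of these names belong to the Ohua API.
class DataParallelSketch {

    // Splits the input round-robin into `replicas` partitions.
    public static <T> List<List<T>> partition(List<T> input, int replicas) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < input.size(); i++) {
            parts.get(i % replicas).add(input.get(i));
        }
        return parts;
    }

    // Runs one replica of the operator per partition, each in its own thread,
    // and concatenates the per-partition results in partition order.
    public static <T, R> List<R> runParallel(List<T> input, Function<T, R> operator, int replicas) {
        ExecutorService pool = Executors.newFixedThreadPool(replicas);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> part : partition(input, replicas)) {
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>();
                    for (T packet : part) {
                        out.add(operator.apply(packet));
                    }
                    return out;
                }));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two replicas of a stateless operator that multiplies each packet by 10.
        System.out.println(runParallel(List.of(1, 2, 3, 4, 5, 6), x -> x * 10, 2));
    }
}
```

Note that the merged output follows partition order rather than input order. Whether an operator tolerates this reordering is exactly the kind of property an automatic rewrite would have to analyze. The `replicas` parameter is also where the thread-granularity trade-off from the second item shows up.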
Development
The Ohua system has been the subject of research for the last two years. It is now time to prove our concepts in real-life scenarios. To make the system usable for a wider community, the following tasks will make it easier to create Ohua flows and even enable non-Java programmers to use the system.
- XML based flow descriptions.
Flow graphs are currently constructed programmatically, which requires the flow author not only to be familiar with Java but also to have a development environment such as Eclipse set up. To improve the usability of Ohua and allow even non-programmers to specify flow graphs, current work focuses on building an XML parser that reads processes specified in the more intuitive XML format.
- Generic relational type system.
The current implementation works only with a static set of packets, which greatly limits the application context of Ohua. One of the next major tasks will therefore be to enhance the engine with a generic type system that allows the format of the data transferred along the arcs to be specified at design time of the flow graph.
- Visual support for flow construction.
DOT/DOTTY is a graph visualization toolkit from AT&T Bell Labs (http://hoagland.org/Dot.html). In its current version, Ohua has a very simple converter from an Ohua process to the DOT language in order to visualize flow graphs. With a little more work, this can be used to visualize a flow under construction, as flows can become very complex and hard to read even in an XML format. Periodically invoking the DOT converter will allow the process author to visualize the current state of the graph. This will become even more important once the generic data model is in place and adaptations need to be specified for the input schema of each operator.
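
To give an idea of what such a conversion involves, here is a minimal sketch of a flow-to-DOT converter. It is not the converter that ships with Ohua: the class `DotSketch` and its methods `addOperator`, `addArc` and `toDot` are hypothetical, and operators are reduced to plain names. Only the emitted `digraph` syntax is standard DOT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not the converter contained in Ohua.
class DotSketch {

    // Operator name -> names of its downstream operators, in insertion order.
    private final Map<String, List<String>> arcs = new LinkedHashMap<>();

    public DotSketch addOperator(String name) {
        arcs.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    public DotSketch addArc(String from, String to) {
        addOperator(from);
        addOperator(to);
        arcs.get(from).add(to);
        return this;
    }

    // Emits one node statement per operator and one edge statement per arc.
    // Names are quoted but not escaped, so they must not contain '"'.
    public String toDot(String flowName) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(flowName).append("\" {\n");
        for (String operator : arcs.keySet()) {
            sb.append("  \"").append(operator).append("\";\n");
        }
        for (Map.Entry<String, List<String>> entry : arcs.entrySet()) {
            for (String target : entry.getValue()) {
                sb.append("  \"").append(entry.getKey())
                  .append("\" -> \"").append(target).append("\";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        String dot = new DotSketch()
                .addArc("Reader", "Filter")
                .addArc("Filter", "Writer")
                .toDot("demo");
        System.out.print(dot);
    }
}
```

Saving the output to a file such as `flow.dot` and running `dot -Tpng flow.dot -o flow.png` renders the graph. Re-running the converter while the flow is being edited gives the periodic snapshot described above.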
- Ohua engine extension points.
The core processing engine of Ohua has been designed around the notion of an abstract packet as the unit of data transferred along the arcs of the flow graph. During data processing, the engine raises various kinds of events, such as packet arrivals at each operator. The result is an important extension point of the Ohua system: new packets and handlers for these packets can be introduced, building a new layer that enhances the system with a new algorithm. As I chose Java as the implementation language for Ohua and Java has no real mechanism for multiple inheritance, I was facing the well-known extension problem from compiler design that arises with the visitor pattern. Extending the visitor pattern forces the developer to also implement the visit functions for the packets that already exist in the system. These packets are normally not accessible, as the Ohua system might have been delivered as a jar file without source code. Yet the system should still allow for that type of extension. Recently Mike made me aware of the Scala language (http://www.scala-lang.org/node/25), which integrates seamlessly with Java and provides mixin-based composition.
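
The extension problem can be made concrete with a small Java sketch. All names here (`Packet`, `PacketVisitor`, `DataPacket`, `CheckpointPacket`, `CheckpointVisitor` and the two handlers) are hypothetical and do not reflect the actual Ohua classes. The first part stands for the core that is assumed to be shipped as a closed jar. The second part shows one possible plain-Java workaround, an extended visitor interface combined with a type check in the new packet. It is not necessarily the approach Ohua takes, and it is not the mixin-based alternative that Scala offers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the names do not reflect the actual Ohua classes.
class VisitorExtensionSketch {

    // --- Core, assumed to be delivered as a jar without source code ---

    interface Packet {
        void accept(PacketVisitor visitor);
    }

    // Adding a visit method for a new packet type here would require
    // changing the closed core: this is the extension problem.
    interface PacketVisitor {
        void visit(DataPacket packet);
    }

    static final class DataPacket implements Packet {
        final String payload;

        DataPacket(String payload) {
            this.payload = payload;
        }

        public void accept(PacketVisitor visitor) {
            visitor.visit(this);
        }
    }

    static final class CoreHandler implements PacketVisitor {
        final List<String> log = new ArrayList<>();

        public void visit(DataPacket packet) {
            log.add("data:" + packet.payload);
        }
    }

    // --- Extension layer, written without touching the core ---

    interface CheckpointVisitor extends PacketVisitor {
        void visit(CheckpointPacket packet);
    }

    static final class CheckpointPacket implements Packet {
        final int id;

        CheckpointPacket(int id) {
            this.id = id;
        }

        public void accept(PacketVisitor visitor) {
            // Only visitors from the extension layer understand this packet;
            // core visitors silently skip it.
            if (visitor instanceof CheckpointVisitor) {
                ((CheckpointVisitor) visitor).visit(this);
            }
        }
    }

    static final class LoggingHandler implements CheckpointVisitor {
        final List<String> log = new ArrayList<>();

        public void visit(DataPacket packet) {
            log.add("data:" + packet.payload);
        }

        public void visit(CheckpointPacket packet) {
            log.add("checkpoint:" + packet.id);
        }
    }

    public static void main(String[] args) {
        LoggingHandler handler = new LoggingHandler();
        new DataPacket("a").accept(handler);
        new CheckpointPacket(7).accept(handler);
        System.out.println(handler.log);
    }
}
```

The price of this workaround is the runtime type check and cast in `CheckpointPacket.accept`, which gives up part of the static safety the visitor pattern normally provides. That loss is one reason a language with mixin-based composition looks attractive here.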