Sunday, August 18, 2013

Applications as canvas, rethinking how we design applications on Linux

In this article, I try to imagine a new way to build applications on Linux, starting from questions like: How can applications be made easier to tune, to refactor, and to adapt to new platforms or input devices? Do Linux applications have problems that should be spotted? Would a software bus improve, or radically change, the way we currently develop?

The philosophy of the UNIX system

Ken Thompson made many decisions while designing UNIX. He wanted a powerful system composed of small pieces of software that "do one thing and do it well" and that can be connected together with pipes to accomplish more complex tasks.
Building short, simple, clear, modular, and extensible code makes the full system easy to maintain and to understand.
Linux inherited this powerful philosophy, and this is a large part of why it is used so widely on servers. When something bad happens, you can find the cause and fix the broken component. You can also write scripts that simplify things.
However, GUI applications on Linux don't apply the UNIX philosophy: each application is an island of its own.

Two Unix Commands, connected using a pipe
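
As a minimal sketch of what such a pipe looks like from a program's point of view, the snippet below connects `sort` and `uniq` the way a shell does for `sort | uniq`. The choice of these two commands and the helper name `sort_unique` are illustrative assumptions, not something taken from the figure.

```python
import os
import subprocess


def sort_unique(lines):
    """Feed lines through the equivalent of the shell pipeline `sort | uniq`."""
    # Use the C locale so the sort order does not depend on the user's settings.
    env = dict(os.environ, LC_ALL="C")
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, env=env)
    # The pipe: uniq reads directly from sort's standard output.
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout,
                            stdout=subprocess.PIPE, env=env)
    sort.stdout.close()  # only uniq should hold the read end now
    sort.stdin.write("".join(line + "\n" for line in lines).encode())
    sort.stdin.close()   # end of input: sort can now emit its result
    output, _ = uniq.communicate()
    sort.wait()
    return output.decode().splitlines()


if __name__ == "__main__":
    print(sort_unique(["pear", "apple", "pear", "fig"]))
```

Neither command knows about the other; the only contract between them is lines of text on a pipe. That narrow contract is what GUI applications are missing.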

What if we could extend this concept to existing GUI applications? I will present some of the scenarios we came up with, and limit this preliminary study to creative tools for designers.

Scenario 1: Scripting GUI Applications

Graphical applications can only be accessed through the interface proposed by their original developers. The devices used are mostly the keyboard and the mouse, and an advanced user who wants to perform a repetitive or very specific task needs to use macro recording tools.
Macro recording tools are not available inside every application. Some external applications, like "snippets", can fill the gap, but they still need the GUI to be available.

Why do we still need external tools that depend on the GUI to automate things? Wouldn't it be better to be able to automate applications without relying on the proposed interface?

Many advanced tools on Linux, like Inkscape and GIMP, tried to provide this: a non-GUI interface that can be accessed from scripts. But this interface sometimes lacks many functions, and we can't see what we are doing live; we only see the result once we open the output files.

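import dbus

bus = dbus.SessionBus()
# The second argument (the object path) is missing from this listing; "..." marks the gap.
ro = bus.get_object("org.inkscape", ...)
# Each call acts on the live canvas; print shows the value returned by the call.
print(ro.ellipse(0, 0, 100, 100))
print(ro.ellipse(100, 100, 200, 200))
print(ro.ellipse(50, 50, 150, 150))
print(ro.select_all())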

Using a simple DBus API like the one above, we get all the benefits at once: we can access the application from outside, script it, and see the result live on the application canvas. We can modify script options, or manipulate objects directly when an operation is easier to perform without a script. DBus has many language bindings, so scripts can be written in whichever language we prefer.
Going in the direction of a DBus solution is like opening Pandora's box.
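
To make the idea concrete without depending on a running Inkscape, here is a toy, in-process model of an application exporting its functions by name. It is not DBus: `Canvas`, `call`, and `EXPORTED` are assumptions made for illustration, and the method names simply mirror the calls in the script above. What it shows is that the script and the application's canvas share one state, so each call is visible right away.

```python
class Canvas:
    """Stand-in for an application canvas; holds the drawn objects."""

    def __init__(self):
        self.objects = []
        self.selection = []

    def ellipse(self, x1, y1, x2, y2):
        """Add an ellipse bounded by the given box and return its id."""
        object_id = "ellipse%d" % len(self.objects)
        self.objects.append((object_id, (x1, y1, x2, y2)))
        return object_id

    def select_all(self):
        """Select every object and return how many were selected."""
        self.selection = [object_id for object_id, _ in self.objects]
        return len(self.selection)


# Names a remote caller is allowed to invoke (hypothetical export list).
EXPORTED = {"ellipse", "select_all"}


def call(canvas, method, *args):
    """Dispatch a call by name, the way a bus routes a method call."""
    if method not in EXPORTED:
        raise ValueError("method not exported: %s" % method)
    return getattr(canvas, method)(*args)


if __name__ == "__main__":
    canvas = Canvas()
    print(call(canvas, "ellipse", 0, 0, 100, 100))
    print(call(canvas, "ellipse", 100, 100, 200, 200))
    print(call(canvas, "select_all"))
    print(canvas.selection)
```

Because callers only know method names and arguments, the same calls could come from a script, another process, or a replacement GUI.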

Scenario 2: Interactive programming

In the previous scenario, we talked about scripting applications. But what if we take some advanced scripts, add a GUI, and make them configurable?
We can then create "plugins" that live in another process, are perhaps written in another language, and can do a lot more.
We can write a small script, see the result on the canvas, and adjust parameter values, which spares developers from coding blind.

Interactive coding applied to Inkscape objects. We can see the result directly when modifying object properties.

More advanced tools can emerge if we add knowledge of the operating system architecture, like the "magic lenses" concept. In this example we know many things about the application, including its window, its objects, and its functions, and we provide a semi-transparent window that can modify Inkscape's internal objects.

If we push this concept further, we can force a simple drawing application to act like a graph plotting application. We can even use it as an animation tool by modifying object properties such as position, color, and size.

We can see the animation directly, and we can export a new image at each step. The set of images can then be combined to create a video or any other animation.

This animation was created with Inkscape by modifying the properties of stars, exporting each frame, and finally building the full animation from the set of images.
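
The frame-by-frame export can be sketched in plain Python. The snippet below does not talk to Inkscape; it writes tiny SVG documents by hand, moving one circle from left to right, to show how changing a single property per step yields a set of images. The function name `render_frames`, the sizes, and the use of a circle instead of a star are all assumptions.

```python
SVG_TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="{h}">'
    '<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r}" fill="gold"/></svg>'
)


def render_frames(frame_count, width=200, height=100, radius=10):
    """Return one SVG string per frame, moving a circle left to right."""
    if frame_count < 2:
        raise ValueError("an animation needs at least two frames")
    frames = []
    for index in range(frame_count):
        # Linear interpolation of the x position between the two margins.
        t = index / (frame_count - 1)
        cx = radius + t * (width - 2 * radius)
        frames.append(SVG_TEMPLATE.format(
            w=width, h=height, cx=cx, cy=height / 2, r=radius))
    return frames


if __name__ == "__main__":
    for number, svg in enumerate(render_frames(5)):
        # A real run would write each frame to its own file instead.
        print("frame_%03d" % number, svg)
```

With the application on a bus, the loop body would become one property change followed by one export call, and the canvas would show every intermediate frame.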

Scenario 3: Factorisation of application programming

Beyond creating these advanced plugins for an application, we can push the concept to the extreme. What if we remove the application's default GUI and show another interface of our choice, connected to the core application only through DBus calls?
This may make some of you think of the Ubuntu HUD.
We can drive an application through another interface that provides more functionality. In the case of the Ubuntu HUD, the menu is taken out of the application and communicates with it through DBus.

Solving tools inconsistency

Why on earth would we need such a thing? A first answer comes from asking the closest designer you know about the tools he uses. Most of the time he will cite a lot of tools. Most of these tools share a common base of functionality (drawing), but each one adds other functions, like sketching templates, exporting a mockup scenario, vector drawing, animation making, ...

CAD tools, mockup tools, and animation tools all offer basic drawing functionality, plus more advanced functions that depend on the final use.

What if we could share functionality in a core application, and make the interface pluggable depending on the use? If we want to draw, we show a simple drawing GUI on top of the canvas. If we want to sketch, we show another one with sketching templates. If we want to animate, we show an animation timeline.
The benefit here is that a large amount of code is shared, so developers need to write less code.
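
One way to picture the pluggable interface: the core keeps a single table of functions, and each interface is only a list of which functions to surface. The sketch below is a hypothetical layout; the function names and the three profiles `draw`, `sketch`, and `animate` are invented for the example.

```python
# Shared core: every function is implemented once (bodies are placeholders).
CORE_FUNCTIONS = {
    "draw_shape": lambda: "shape drawn",
    "apply_template": lambda: "template applied",
    "add_keyframe": lambda: "keyframe added",
    "export_image": lambda: "image exported",
}

# Each interface only declares which core functions it puts on screen.
INTERFACES = {
    "draw": ["draw_shape", "export_image"],
    "sketch": ["draw_shape", "apply_template", "export_image"],
    "animate": ["draw_shape", "add_keyframe", "export_image"],
}


def build_interface(use):
    """Return the {label: function} mapping shown for a given use."""
    try:
        names = INTERFACES[use]
    except KeyError:
        raise ValueError("unknown interface: %s" % use) from None
    return {name: CORE_FUNCTIONS[name] for name in names}


def shared_functions():
    """Names of functions reused by more than one interface."""
    counts = {}
    for names in INTERFACES.values():
        for name in names:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    toolbar = build_interface("animate")
    print(sorted(toolbar))
    print(toolbar["add_keyframe"]())
    print(shared_functions())
```

Adding a fourth tool then means adding one entry to the interface table and, at most, the few core functions it alone needs.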

Solving one device inconsistency

Designers use a lot of tools to do their job, but most of them use a creative suite built by one company. This means that the interaction logic behind the tools is almost the same. The icons are also the same. Someone who uses one tool won't be lost using another.
In the free/open source world, tools are created by different communities, and so are the icons, the interaction design, the logic, etc. Once you have learned one tool, you still need to invest a lot of time learning another. So the problem here lies in the interface and the interaction.
What can we do to solve this problem? A company that builds the platform and wants to invest in this can hire a designer to create GUIs for a set of applications, so that they share the same logic. Linking the interfaces to the applications should be very easy if the applications export all their methods on DBus. It becomes a functionality-matching job.

Here I want to cite libdbusmenu by Canonical. They managed to grab any application's menu and show it wherever they want, and they use other rules for matching, like searching by menu name. In recent releases they added "a scope" to provide fuzzy matching. What they have done for menus could also be done for more internal functionality.
Getting rid of the main GUI and having the company provide new ones is the extreme case of this scenario.
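
The functionality-matching job can be sketched with Python's standard `difflib` module. This is not how libdbusmenu or the HUD match entries; it is a stand-in technique, and the labels and exported method names below are invented for the example.

```python
import difflib


def match_actions(labels, exported_methods, cutoff=0.6):
    """Map each GUI label to the closest exported method name, or None."""
    matches = {}
    for label in labels:
        # Normalise "Select All" to "select_all" before comparing.
        key = label.strip().lower().replace(" ", "_")
        close = difflib.get_close_matches(key, exported_methods,
                                          n=1, cutoff=cutoff)
        matches[label] = close[0] if close else None
    return matches


if __name__ == "__main__":
    exported = ["ellipse", "select_all", "export_png"]
    result = match_actions(["Select All", "Elipse", "Blur"], exported)
    for label, method in result.items():
        print(label, "->", method)
```

Labels that find no counterpart come back as `None`, which gives the interface designer a list of functions the application still has to export.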

Adding more input options

Some time ago, we wanted to add multitouch functionality to Ubuntu Maverick. At the time, a lot of things still needed to be developed: some devices were supported and emitted events from the kernel, but applications simply ignored all of them.
We had a lot of discussions about the best way to route events through the layers of X, then through the libraries, until they reach applications. We had to think about raw multitouch events as well as gesture events. At the time, a quick way to show the world what we were doing was to develop Ginn, a gesture injection tool that works without support from libraries and without modifying the target application itself.

The solution was quite simple:
Get the gesture event, get the active application, read the wishes of the user (a configuration file), and convert these advanced events into something the application already understands: keyboard taps and mouse clicks.
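
The four steps above can be sketched as a lookup. This is not Ginn's code or its configuration format; the gesture names, application names, and key combinations below are hypothetical, and the "injection" is reduced to returning the keys that would be sent.

```python
# Hypothetical user wishes: application -> gesture -> key combination.
CONFIG = {
    "inkscape": {
        "pinch_in": ["minus"],
        "pinch_out": ["plus"],
        "three_finger_swipe_left": ["ctrl", "z"],
    },
    # Fallback used when the active application has no entry of its own.
    "*": {
        "three_finger_swipe_left": ["alt", "Left"],
    },
}


def translate(gesture, active_app, config=CONFIG):
    """Return the key combination for a gesture, or None to ignore it."""
    for app in (active_app, "*"):
        keys = config.get(app, {}).get(gesture)
        if keys is not None:
            return keys
    return None


if __name__ == "__main__":
    print(translate("pinch_out", "inkscape"))
    print(translate("three_finger_swipe_left", "gimp"))
    print(translate("rotate", "gimp"))
```

Swapping the key lists for names of exported functions would turn the same lookup into a direct mapping from gestures to application functions.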

That simple solution allowed us to show the world the beauty of what we were doing. But we were limited to mouse and keyboard shortcuts. Performance issues aside, just imagine the power of mapping such events directly to an application's internal functions through DBus.

Using hand angles detected by the Leap Motion device to move objects inside Inkscape.

Solving platform inconsistency

Many users have more than one computing device: a computer, a tablet, a phone, a TV. The software on each platform is different, but the tasks we do on each of them share a common basis.
Let's get back to the example of creative and drawing applications. We can use one drawing application on a desktop computer and another one on a tablet. The usual desktop interface would be unusable on the tablet, as the input modalities and goals are different. What stays the same is the application logic: drawing, image filters, image operations (crop, resize, ...).

Do we need to create a new application for each platform? Or just one application, with a new GUI and interaction model for each platform? What if we want to start something on one device and complete or view it on another?

What will remain is just the canvas, which serves as feedback for the operations, along with the included algorithms. All of the functionality will be exposed through the DBus software bus.
The GUI will be an additional layer displayed on top of the application, and it changes depending on the platform used.
Application developers will not be asked to create platform-specific interfaces. Instead, the platform development team, especially if it targets many platforms, should think about this factorization of the development. No new application should be coded from scratch; only interfaces, matched to the application's internal functions published on the bus.

Scenario 4: Application composer

What if we have a platform with a lot of applications exporting their functions on the bus? In some cases, a user wants to accomplish a task that uses functions scattered across many applications. An application composer is a meta-application that can accomplish such complex tasks. It connects the output of one application to the input of another, in a way similar to CLI scripts. But here the "running mode" can be visible.

Concept image showing an application composer using a set of GUI applications in order to accomplish a bigger task.

Say I want to get data from a Calligra Sheets table, draw the data in an application, export to an image, draw a new set of data, export to another image, ... and combine the images into an animation. This pipeline can be abstracted and launched by an application composer, which connects the applications themselves to accomplish the bigger task.
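
A composer of this kind can be modeled as a list of stages, each standing for a function that some application would export. In the sketch below the stages are plain Python functions with invented names (`read_rows`, `draw_chart`, `export_image`), and the `on_step` callback is an assumed hook that makes the running pipeline visible.

```python
def compose(stages, on_step=None):
    """Chain stages so the output of one becomes the input of the next."""
    def pipeline(value):
        for stage in stages:
            value = stage(value)
            if on_step is not None:
                on_step(stage.__name__, value)  # make progress visible
        return value
    return pipeline


# Placeholder stages; real ones would call functions exported by apps.
def read_rows(table):
    return [row for row in table if row]  # drop empty rows


def draw_chart(rows):
    return {"bars": [sum(row) for row in rows]}


def export_image(chart):
    return "image with %d bars" % len(chart["bars"])


if __name__ == "__main__":
    log = []
    run = compose([read_rows, draw_chart, export_image],
                  on_step=lambda name, value: log.append(name))
    print(run([[1, 2], [], [3, 4]]))
    print(log)
```

The structure is the same as a shell pipeline; the difference is that each stage could be a live application whose canvas shows the intermediate result.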

Scenario 5: Interactive Documentation

A new way of building applications needs a new way of building documentation. Today there are two common ways to create tutorials:
1. Record videos of a user using the software,
2. Take screenshots of the steps and write an accompanying article.

In both cases the user needs to switch back and forth between the software and the tutorial. This can be a problem for novice users, as they can lose track of the step they are on, or get fed up with pausing and playing the video each time.

What if we could create a tutorial that is aware of the user's current step?

Application aware Documentation

In DBus, we don't only have exported methods; we can also connect to signals and get more information from the application. We can create a tutorial that shows a small action for the user to perform, waits for the signal that it has been done, and then moves on to the next action.
This frees the user from switching between the application and the tutorial, and avoids getting lost in a lot of information.
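
The signal-driven tutorial can be sketched as a small state machine. The class below is an illustration only: the signal names are invented, and `on_signal` stands for whatever callback a real DBus binding would invoke when the application emits a signal.

```python
class Tutorial:
    """Walks through steps, advancing only when the expected signal arrives."""

    def __init__(self, steps):
        # Each step is (instruction shown to the user, signal that completes it).
        self.steps = list(steps)
        self.position = 0

    @property
    def finished(self):
        return self.position >= len(self.steps)

    def current_instruction(self):
        """Text to display now, or None once the tutorial is over."""
        return None if self.finished else self.steps[self.position][0]

    def on_signal(self, signal_name):
        """Handle an application signal; return True if it advanced a step."""
        if self.finished or signal_name != self.steps[self.position][1]:
            return False  # unrelated action: stay on the same step
        self.position += 1
        return True


if __name__ == "__main__":
    tutorial = Tutorial([
        ("Draw an ellipse", "object_created"),
        ("Select everything", "selection_changed"),
    ])
    print(tutorial.current_instruction())
    tutorial.on_signal("selection_changed")  # too early, ignored
    print(tutorial.current_instruction())
    tutorial.on_signal("object_created")
    print(tutorial.current_instruction())
```

Signals that do not match the current step are ignored, so the user can explore the application without the tutorial losing its place.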

Generating usual documentation

We can still have the old style of documentation, but it can be generated by automatically taking screenshots of the steps or by video recording. If the GUI or the icons change, we regenerate the tutorial.

Concept of recording the steps of a tutorial, using a specific GUI, and generating a video or textual information.

(This article is currently a draft that has stayed in this state for a long time.
Please feel free to help improve the concept and idea with some brainstorming and/or criticism.
I may add more information that I have in an independent paper, with some testing code, in the coming days.)

Things that still need to be discussed:
Standardization, Drawbacks, "microcloud" (per House), ...

1 comment:

zyga said...

This sounds like COM windows did years ago and it was utterly horrible. At the same time you can look at Android intents and how it is pretty much transparent across the system (and works well)