Sunday, May 24, 2009

Python runtimes for OpenERP, Jython, Unladen Swallow, where do we stand?

Hi,

A little while ago, I decided to show the world a proof of concept of what I consider the most mature/promising OSS ERP so far - OpenERP - running on Jython, the Java based Python interpreter. I made a branch, adapted the OpenERP database connection to use zxJDBC with the Java JDBC Postgres driver, and bypassed the code that was not running on Jython yet (C library wrappers). That attempt was related in my earlier post, "OpenERP running on Java (Jython) - ROUND 1".

But first of all, just to make it clear: I'm posting here an update of the Python runtime situation for OpenERP. That doesn't mean I consider this a very important issue for OpenERP; it's just that there are reasons to hope for a convergence toward the Java platform over the year, nothing more. On the contrary, my only very hot topic for OpenERP now is code quality, testing and business code refactoring/cleaning. Still, that might be for another post; for now I'll only post an update about the Java convergence and the Python runtime perf status.
Now, 3 months after my first post, where do we stand? There is mostly plenty of good news, as I'll explain.

But again, why Jython?
1) Speed in the long run
I prefer to make it clear: for one, OpenERP v5 is now fast enough for a lot of situations. I can't see any performance issue in deploying it in large companies, and it's actually being done in several already (though it might not always compete yet in terms of native features or quality in large companies). And second, yes, I know that in most enterprise systems the bottleneck rather lies in the database load and in the client/server bandwidth usage/latency. Now, this is especially true when the language layer is fast, for instance with Java. In Python, and OpenERP especially, even if the language layer is not the bottleneck, it's still common to see 30% of CPU usage eaten by Python. So any significant improvement here will also help.

And the SQL requests of OpenERP are already quite optimized in v5. For instance, they try as much as possible to handle n records in O(1) or O(log(n)) time. They also have a transactional database cache thanks to their fields.function + store=True/hash of invalidation triggers feature. They also model SQL hierarchies using MPTT (modified preorder tree traversal). So in the end, having a faster Python runtime, whichever it is, will certainly help.
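To make the MPTT idea concrete, here is a minimal sketch in plain Python. It is not OpenERP's actual code: the category tree and the function names are made up for illustration. Each node gets a left and a right number from a depth-first walk, so that all the descendants of a node can be found with one range comparison instead of a recursive lookup.

```python
def number_tree(children, root):
    """Assign (left, right) numbers to every node with a depth-first walk."""
    bounds = {}
    counter = [0]  # mutable counter shared by the nested function

    def visit(node):
        counter[0] += 1
        left = counter[0]
        for child in children.get(node, []):
            visit(child)
        counter[0] += 1
        bounds[node] = (left, counter[0])

    visit(root)
    return bounds


def descendants(bounds, node):
    """Nodes whose interval lies strictly inside the interval of `node`."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


# Hypothetical product category tree: parent -> list of children
children = {
    "All": ["Hardware", "Services"],
    "Hardware": ["Laptops", "Servers"],
}
bounds = number_tree(children, "All")
hardware_children = descendants(bounds, "Hardware")
```

In SQL, the same test becomes a single WHERE clause comparing the two stored columns, which is why a whole subtree can be fetched in one query.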

2) Enterprise support
Speed is one thing, but another thing that Jython would bring to large corporations is easy enterprise integration. In Jython, any Java class/lib can be seamlessly invoked. And Java, thanks to its design-by-committee mantra, has at least all sorts of mature features the enterprise world likes to rely on: good SOAP integration (we recently had some pain with the Python SOAPpy implementation, by contrast), standard ESB's, JCR, JDBC, JMX, BI tools, ETL's...


Google Unladen Swallow is likely to provide the best short and mid term performance:
Still, OpenERP will probably enjoy the best short and mid term performance thanks to Google's commitment to their 'Unladen Swallow' branch of CPython. In an interview, the creators of Unladen Swallow explained that Google has a lot of "legacy" code where the high level plumbing is done in Python while the low level algorithms are done in C++ and invoked via SWIG wrappers. So they said they would certainly look into Jython for long term performance, but they also need short term performance for standard CPython, and that's why they launched the Unladen Swallow project.

Unladen Swallow wants to bring a simple JIT (Just In Time) compiler to Python and do other standard optimizations. They claim they could reach a 5x perf increase by the end of 2009. Those improvements will be merged back into the standard CPython distros. So far they already have a solid 20% perf improvement. They might even try to remove the Python GIL efficiently.

In the long run, even if they rely on LLVM, it is however unlikely that they will build a virtual machine as sophisticated as the Java one (often considered the best VM, especially now that they added support for dynamic languages with the InvokeDynamic JSR). So if Jython then focuses on performance as they announced at PyCon 2009, Jython might win the perf war in the long run. Anyway, the enterprise integration I was talking about would only be achieved on Jython.


OpenERP on Jython might be a reality by Q1 of 2010.
Yes, several pieces of very good news allow me to hope this could be done. Let's enumerate them:
Most of the C library wrappers will be avoided; Tiny, the editor of OpenERP, took that commitment:
- Turbogears for the web-client layer -> done! The full blown Turbogears is no longer required: on the web-client trunk branch (already very stable), Tiny completely removed the Turbogears dependency and now only depends on CherryPy (pure Python). Barring a few Jython bugs, the OpenERP web-client should now run unchanged on Jython.
- mx.DateTime: Tiny took the commitment to remove it within the next few months!
- the minidom XML lib: Tiny took the commitment to replace it with the ElementTree (etree) lib, which is implemented on Jython. This has already been started in community and editor branches:
https://code.launchpad.net/~ajm-tech/openobject-server/server-50-lxml-fields-view-get
https://code.launchpad.net/~openerp-commiter/openobject-server/pap-etree-trunk
- libxml usage will be removed too.
- I investigated a bit around modjy, the standard Jython Java servlet wrapper, and it should be quite easy to put the OpenERP server on modjy using a WSGI wrapper.
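For readers who have never seen one, a WSGI wrapper is just a callable with a fixed signature, which a WSGI host such as modjy can then serve. The sketch below is only an illustration of that contract: `handle_rpc` is a hypothetical stand-in for the OpenERP request dispatcher, not a real OpenERP function.

```python
import io


def handle_rpc(body):
    """Hypothetical stand-in for the OpenERP request dispatcher."""
    return ("received %d bytes" % len(body)).encode("ascii")


def application(environ, start_response):
    """WSGI entry point: read the request body, dispatch it, send the reply."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    payload = handle_rpc(body)
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


# Calling the application by hand with a fake request, as a WSGI host would.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


fake_environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello")}
response = application(fake_environ, fake_start_response)
```

Because the callable knows nothing about the container, the same function could in principle be hosted by a servlet container or by a plain Python WSGI server.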


The best is what is being done at the database/ORM level: OpenERP is moving to SQLAlchemy!
My work on the Jython branch has been to work around the psycopg2 native driver and replace it with zxJDBC, the Jython wrapper exposing JDBC through the Python DB-API 2. Well, the good news is that OpenERP is moving away from psycopg2. They will indeed move all the SQL generation logic to the standard and awesome SQLAlchemy ORM. And SQLAlchemy is Jython compatible out of the box. It's great to see them coming back to standards (SQLAlchemy was not mature yet when OpenERP was already in business).
Tiny already began that work on a dedicated branch:
https://code.launchpad.net/~openerp-commiter/openobject-server/server-sa
and I encourage the community to help them do it right. Also notice that they already used SQLAlchemy for SQL generation in their emerging BI OLAP cube.
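The reason a driver swap is feasible at all is that psycopg2 and zxJDBC both follow the Python DB-API 2 (connection, cursor, execute, fetchall). Here is a minimal sketch of code written against that interface only. The standard sqlite3 module is used purely as a stand-in so the example runs anywhere, and the `res_partner` rows are invented for the demo.

```python
import sqlite3


def list_partner_names(connection):
    """Run a query using only DB-API 2 calls (cursor/execute/fetchall)."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT name FROM res_partner ORDER BY name")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


# sqlite3 is only a stand-in: on CPython the connection would come from
# psycopg2, on Jython from zxJDBC. The function above would not change.
connection = sqlite3.connect(":memory:")
setup = connection.cursor()
setup.execute("CREATE TABLE res_partner (id INTEGER PRIMARY KEY, name TEXT)")
# Note: the '?' placeholder style is driver specific (psycopg2 uses '%s').
setup.executemany("INSERT INTO res_partner (name) VALUES (?)",
                  [("Tiny",), ("Smile",)])
connection.commit()
setup.close()

names = list_partner_names(connection)
```

The placeholder style is exactly the kind of driver difference that an SQL generation layer like SQLAlchemy hides.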

SQLAlchemy is not only good news for Jython support. It also means that the OpenERP domain (SQL filtering) logic is going to become more subtle and powerful. Currently, that part of OpenERP is not at the level of the best ORM's around, such as SQLAlchemy or Hibernate. Even ActiveRecord (Rails) does a smarter job with joins. Still, among the mature ERP's it is probably the best available, years ahead of Compiere or Openbravo with their millions of lines of pure PL/SQL legacy code (Openbravo recently added Hibernate support in their 2.50 platform, but their business code remains millions of PL/SQL code lines lurking in XML CDATA statements and will still take years to be migrated to Java, if that is ever achieved).

Oh and yes, that SQLAlchemy transition will support database independence, as there are SQLAlchemy adapters for most DBMS's on the market. In particular, MySQL and Oracle will be supported. That's great because it will strengthen the OpenERP community. In some organizations Oracle is not optional (and it can arguably provide marginally superior perf), while MySQL has a much larger community penetration than PostgreSQL.

Unlike most large software projects, OpenERP technical quality improves slowly but keeps improving
With all that refactoring being done, OpenERP belongs to that rare kind of complex enterprise project where the code improves over time, rather than sliding desperately into chaos as businessmen add layers of crap over crap as the commercial deals are made. Also, beyond the ORM transition and the web-client refactoring (Turbogears removed and templates changed to Mako), Tiny also rewrote their reporting engine and just allowed the Mako engine (a powerful standard).

My vision of software engineering is indeed pretty much a thermodynamic one: code quality has to improve over time, much like a living system should not generate entropy within its scope. Design mistakes can very easily be made along the road, and once tons of business code are built on non-optimal models, you are all screwed up: any energy you invest in trying to correct one part of the system will actually result in a larger energy waste, pretty much as if you tried to cool your home by leaving your fridge door open: all you would actually do is make it warmer.

Why this claim? Because in business driven software, it's always easier in the short term (the only business scope, as the current economic crisis teaches us) to make money supporting your existing customer base. Meaning that once mistakes are made, it's always too expensive - and thus never attempted - to re-think the core abstractions. Instead, those companies tend to make several abstractions cohabit and prefer to invest in building chaotic non-abstracted code instead of refactoring the concepts that could be abstracted in the core platform. And once you have millions of lines of chaotic code, you are never going to factor them back into clean/intelligible/maintainable concepts. That's pretty much how the money has been wasted in today's proprietary ERP's and even in a few claimed open ones. So they might have lots of working features appealing to collapsing traditional industries, but they are facing exploding maintenance costs and can't adapt further to the newly emerging businesses the way the few viable OSS ERP's will.


So overall, by early 2010, I think we should be able to see OpenERP also distributed as a war archive that you'll deploy in one click on your enterprise application server, such as Glassfish. This might shift OpenERP adoption from small SME's to large organizations, which is always good because it will fuel more heavy engineering inside the project. And in the end, even the small businesses will benefit from the resulting quality/usability improvement. Still, I don't expect extra perf from Jython before Q2 2010 at least.


OK, enough said about Jython and Python runtimes. All that enthusiasm should also not overshadow my very urgent plea to Tiny to increase their quality (fewer bugs, fewer regressions). An efficient way to achieve this will be to take the test-first approach way more seriously than they have so far. They just committed to this too at their May community meeting, so we will see. Given what they did in the recent past (transition to international, transition to an open distributed forge, English documentation, transition to an editor business model, performance fixes...), I hope they will manage that challenge too.

Tuesday, March 03, 2009

magento-openerp-synchro module: we moved to Launchpad!

Sure Google Code Hosting is great,

But we are now seeing too much interest in that connector popping up all over the world to continue the development in a centralized way. Also, we have been contacted by Jordi Esteve, the primary external OpenERP contributor (according to Launchpad stats), who wanted to jump in and join efforts, so he also asked us to move the contribution process forward.

That's why we have the pleasure to announce that we migrated our development trunk from Google Code SVN to Bazaar on Launchpad, meaning we now use the distributed development platform advised by the OpenERP community. Also notice that we took care of migrating the previous SVN commits.

That was also the occasion to split the development in two branches:
  • a 4.2 compatible maintenance branch. That one is for the conservative folks who can't afford migrating to OpenERP v5: https://code.launchpad.net/~openerp-commiter/openobject-addons/4.2-extra-addons
  • a 5.x branch where all the new interesting stuff is expected to happen: https://code.launchpad.net/~openerp-commiter/openobject-addons/trunk-extra-addons
In both cases, the OpenERP addon module is now called magento_openerp_synchro.
The last SVN version (revision #27) has been ported as bzr revision #3581.1.9 on v5 branch and rev #148 in 4.2 branch. Later on both branches got a fix for issue http://code.google.com/p/magento-openerp-smile-synchro/issues/detail?id=31
v5 branch got even more attention and code clean up.

So you are very much welcome to jump in and contribute to whatever branch you need. Still, before adding any serious extra feature, we would like to insist that some refactoring should be undertaken BEFORE bloating the code.

Especially:
  1. Too much code is lying inside the OpenERP wizard layers. That was a bad design choice, making it hard for additional third party modules to extend the connector behavior and add custom features the way OpenERP allows it (thanks to its extensive OOP design). Instead, that code should be moved INSIDE THE OBJECT LAYER. That shouldn't be too hard: it's only about extracting methods, putting them in the appropriate objects (sale order, product...) and calling those methods from the wizards instead. For instance, we deployed a derived version of that connector for a customer along with the sale_supplier_direct_delivery module, and we had to hardcode the connector to make it account for the direct delivery case. With the new design, things should work more seamlessly. See the following tracker: http://code.google.com/p/magento-openerp-smile-synchro/issues/detail?id=32
  2. The sale order push feature (from Magento) has not been integrated properly yet and is still polluting the code. It has been a wonderful contribution by Charles Galpin, but we had no time to integrate it the way we wanted to, and it has not been tested extensively. Instead, we would like to remove the push code from Magento and make Magento call OpenERP and tell it to pull the given sale order, reusing its standard sale order import code. Finally, that design should be able to deal with possible network or OpenERP failures and flag failing pushes inside Magento for later processing. Not a top priority, but something you should be aware of. See the following tracker: http://code.google.com/p/magento-openerp-smile-synchro/issues/detail?id=33
  3. Finally, we insist that we haven't yet taken care of exposing ALL the OpenERP XML/RPC server webservices the standard Magento way. One of the consequences is that importing sale orders is currently not secure unless you block external connections by IP at, say, the Apache level. A much better way would be to properly expose our custom extra webservices the Magento way, see the following tracker: http://code.google.com/p/magento-openerp-smile-synchro/issues/detail?id=6
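The refactoring asked for in point 1 can be sketched with plain Python classes. This is not the OpenERP osv API: every class, method and field name below is hypothetical. The point is only to show why logic living in the object layer can be extended by a third party module, while logic buried in a wizard cannot.

```python
class SaleOrder(object):
    """Object layer: owns the import logic so other modules can extend it."""

    def import_from_magento(self, magento_order):
        return self.prepare_order_values(magento_order)

    def prepare_order_values(self, magento_order):
        return {
            "name": "mag_%s" % magento_order["increment_id"],
            "amount_total": magento_order["grand_total"],
        }


class DirectDeliverySaleOrder(SaleOrder):
    """Third party extension: overrides one step without touching the wizard."""

    def prepare_order_values(self, magento_order):
        values = super(DirectDeliverySaleOrder, self).prepare_order_values(
            magento_order)
        values["direct_delivery"] = True
        return values


class ImportWizard(object):
    """Wizard layer: only orchestrates and delegates to the object layer."""

    def __init__(self, sale_order):
        self.sale_order = sale_order

    def run(self, magento_orders):
        return [self.sale_order.import_from_magento(o) for o in magento_orders]


# Made-up Magento order data for the demo
orders = [{"increment_id": "100000001", "grand_total": 42.0}]
plain = ImportWizard(SaleOrder()).run(orders)
extended = ImportWizard(DirectDeliverySaleOrder()).run(orders)
```

The wizard code is identical in both runs; only the object it delegates to differs, which is what would have spared us the hardcoding for the direct delivery case.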

Meanwhile, we welcome Jordi Esteve as a core contributor + project member. We know that things have been a bit slow with Smile and the connector recently (we are flooded by non-Magento OpenERP demand), but we really hope to help move forward as much as we can. We really hope all those inputs will streamline the installation process + natively supported features while minimizing the required development skills. Enjoy!

Sunday, March 01, 2009

New planet for Magento - OpenERP synchro module

Hi, I'm Raphaël Valyi, one of the main authors of the synchro module between OpenERP and Magento. Some 10 months ago, after completing an extensive study about open source ERP's, I convinced Smile, my employer, to support that synchro module allowing Magento and OpenERP to work together. Some two months later, thanks to Smile and the tireless work of Sylvain Pamart, we got that connector in pretty good shape already.

Over the time, we improved it and tried to integrate community contributions as much as possible.

Since then, our OpenERP schedule at Smile got totally loaded with customer contracts, leaving us little time for the community even if Smile is still committed to support the effort in theory (my boss agrees on a 2 man-days per month effort so far, as long as we have no higher priority, and that's the catch). So unfortunately, it means that the connector is always lagging behind direct commercial activities.

Anyway, as I do personally see interest in pushing that connector forward, I'm taking some of my spare time to give you updates about the connector in this blog, under the magento-openerp-synchro tag, so they get exported to our brand new connector planet mashup powered by Yahoo Pipes: http://pipes.yahoo.com/pipes/pipe.info?_id=7mpa3ckG3hGCuJSWPxJ3AQ

Ideally, I'll let any contributor (including Smile.fr) post in that planet when they contribute in the future. So the corporate work supported by Smile.fr will hence probably be published both here and on the corporate blog, that is http://blog.smile.fr/le-blog-des-consultants.

So I hope this planet will be a fair place to give updates about the connector while accommodating corporate, personal and community contributions. Feel also free to comment in the respective planet blogs.

Finally note that this planet is managed as a Yahoo Pipes aggregator wrapped inside a Google Gadget eventually. You can contact me to have your blog (+specific tag) included in that planet if you wish.

Raphaël Valyi

Sunday, February 01, 2009

OpenERP running on Java (Jython) - ROUND 1

Hi folks,

Looking for an open source ERP that doesn't suck? Well, there is at least one: OpenERP. Like it or not, it's Python based. For most things this is a perfect fit, but let's face it, Python has less enterprise acceptance, and the lower layers of that ERP could also have been powered by a powerful virtual machine. And since building a new ERP can't really take less than 3 years, the situation will hardly evolve any time soon, so we'd better deal with it: I bet my money OpenERP will now take the market by storm; I don't see anything stopping it from taking the lead, at least for the next 5 years.

So I wanted to show the world that it was possible to back OpenERP by the Java platform, much like it's now being admitted that Java is the best bed for Ruby on Rails applications thanks to JRuby. I built a proof of concept. Now I hope some will get the message and join efforts to make it happen. I'll hardly finish that journey in my spare time, but hopefully some people will start to realize how important this is and how close we are, and will help cover the extra miles to get it running in production on Java. Of course, OpenERP doesn't need to run on Java to be the best ERP already; it is just that it could be even better. And if you want to help out, I'll be there too.

What about the claimed Java based ERP's?

I've been a Java programmer myself since around 2003, and even created my first open source project on Sourceforge back in 2004 (EmSim http://sourceforge.net/projects/emsim/); later on I used those skills in a variety of advanced projects when working at Amadeus and Smile.fr. But, like it or not, after deep investigation, my conclusion is that the best open source ERP is coded in Python. I've looked deep inside Java based, or claimed Java based, ERP's such as Ofbiz, Neogia, JFire, Compiere, Openbravo and Adempiere, but those were too limited in my opinion. Basically, the trouble is that those 'Java' based ERP's don't rely on true object oriented modeling from the ground up, some of them being even more SQL based than really Java based. I could explain why in great detail, but this is beyond the scope of this post.

So my conclusion was that even if I needed to learn Python, I would get the work done faster with OpenERP than with any of those existing Java based ERP's. My experiments proved that I was right. With OpenERP, extending the relational model and the forms and making it fit my needs was faster than with any of the other ERP's I tried, even while being a Python noob. One year later, with a few successful implementations behind me, I'm recommending it more than ever.


But wouldn't it be great to port OpenERP on the Java platform?

The benefits of porting OpenERP on Java are beyond the scope of this post. Still, here is a short list:

  1. Noticeable speed up. Even if the Python layer peaks at 40% of the CPU load under high concurrent loads, getting twice as fast would be noticeable. If Jython gets as optimized as JRuby, which is 2 to 5 times faster than C Ruby, we will get there.
  2. Real multithreading and less memory use under huge loads (CPython scales using processes that share nothing across runtimes, because it has a giant lock in its interpreter, much like PHP or Ruby). The expected benefit there is much like what benchmarks show for heavily loaded JRuby on Rails applications.
  3. One click installation on lots of platforms (imagine how easy it will be to sell to large companies when you can say it's only a .war you put on your Tomcat server).
  4. Penetration in large companies where Java tends to rule more than Python
  5. Easier and solid cross database support thanks to the unified JDBC API.
  6. Larger community for OpenERP. OpenERP is certainly the open source ERP with the largest active open source community. But just imagine how large it could get once you manage to connect with the Java community, telling them: hey, we are somewhat Java based too. If you know Java, you'll be able to leverage those skills to get the job done in OpenERP. Again, JRuby proved here that connecting communities is possible.
  7. Possibly larger funding and exposure for OpenERP. Openbravo certainly got the large funding it had by playing on its Java image, which has a broad acceptance in the enterprise world.
  8. And the best of all: cross language implementation. By leveraging Jython, you would be able to call the best existing Java libs (there are some good ones like JDBC, JCR, JMX, JMS, ESB's, SOAP...), but also externalize to Java some code pieces where speed is critical, meaning an easier way to reach C speed (a warmed up Java 6 runs at C speed and sometimes faster) than writing C bindings.
  9. Jython will probably end up implementing the MOP (Meta Object Protocol) that makes it possible to share OBJECTS between JVM based languages such as JRuby, Javascript, Scala, Clojure, Groovy... Meaning that more people could adapt OpenERP to meet their needs using the language they already master. That means you could code OpenERP modules in Ruby for instance, and that's not a small thing, while preserving backward compatibility. Of course, published modules would have to keep being closely controlled, but at least one-shot verticalization modules could offer more options.
  10. Because it's fun and we can do it.

What does it take?
Well, Jython is really alive again. Notice that Microsoft hired the original Jython creator to create IronPython on the .NET platform. Next, in 2008, Sun Microsystems hired the two Jython leads, Ted Leung and Frank Wierzbicki, to make Python a first class citizen on the Java platform again. In particular, JRuby recently proved that the JVM is able to run dynamic languages faster than their native C interpreters, which don't come with such a sophisticated virtual machine. So in theory the future is bright. A Python 2.5 compliant Jython is expected by February, and the current trunk largely reflects it already.

Still, the trouble comes with native libraries. Since Python tends to be somewhat slow, Pythonistas tend to back lots of Python libraries with C extensions to provide extra speed. The trouble is that the Jython interpreter, sandboxed in its virtual machine environment, can't invoke the same C extensions, mostly because it doesn't work the same way at the lowest levels.

In some places, OpenERP uses such libraries. Some of them are psycopg2, mx.DateTime, libxml2, libxslt. Fortunately there aren't too many overall, which is why it shouldn't be too hard to have a full blown OpenERP running on the JVM. The common strategy here is to set up a wrapper over existing Java libraries in place of those C extensions. Also notice that in the near future, Jython might also support ctypes, a standard way of calling C libraries from Python. This is largely because Jython is benefiting here from the fantastic JRuby work of Wayne Meissner around JFFI (Java Foreign Function Interface). Other synergies exist, like the Da Vinci Machine or the invokedynamic bytecode instruction, which would help the JVM speed up dynamic languages while optimizing memory usage.



Current status:

After some two days of heavy hacking, I'm getting:

  • The OpenERP server starting, powered by the latest Jython Java based interpreter, backed by the regular JDBC Postgres driver over a standard Java JNDI database connection pool, instead of psycopg2.
  • It answers most of the webservice calls, be it from the clients or from the webservice API. The largest thing that isn't working here is the datetime operations that use the mx.DateTime library, which isn't fully supported yet.
  • Also, views don't work yet (the fields_view_get method). This is due to the currently limited XML and XPath support. Everything needed is available on the Java platform, but it should be wrapped properly to fake the Python API OpenERP is expecting there.



Next stages:

  • I guess that faking the required XPath and XML API would be great. That shouldn't be that hard given that OpenERP only uses a few things in those API's. We should only need to wrap them over Java libraries such as Xalan.
  • As for the mx.DateTime trouble, the OpenERP team said they are ready to move away from it. Indeed, mx.DateTime has a bug with dates before 1970. That's why the Tryton fork moved away from it, proving the thing is possible (they only use it marginally for parsing, but an alternative can be found). As advised by Jim Baker from Jython, it appears that the standard Python datetime, datetime.timedelta and the http://labix.org/python-dateutil library might be used instead. So aside from helping get rid of mx.DateTime, we probably won't have much work here.
  • Once all that is working, the Java based OpenERP server will already be in pretty good shape to be used in production.
  • Of course, ideally we would package the whole thing within a standard J2EE war package and serve it with a standard servlet container such as Tomcat or Glassfish. The idea there is to write a servlet connector that will use standard Java servlets instead of the Python SimpleXMLRPCServer. Then we would route HTTP requests to a pool of Jython runtimes and the appropriate OpenERP server layers, much like others do with Django on Jython or the Glassfish JRuby on Rails gem supported by Sun Microsystems.
  • Once performance starts to prove better than CPython (the Jython and JRuby guys are expecting this to happen in 2009 already) and once large companies show interest in deploying OpenERP over their existing Java stack, maybe the OpenERP team will start to support it officially and provide alternative Java based layers to increase performance even more or provide extra features (like SOAP, JMS async messaging, an ESB like Mule or ServiceMix, a REST layer, JMX remote monitoring, OpenTerracotta clustering...)
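As a rough illustration of the mx.DateTime replacement mentioned in the list above, here is a minimal sketch using only the standard datetime module, which is pure Python on Jython. The `add_days` helper and the date format are assumptions made for this example, not OpenERP code; the dates were chosen to straddle 1970 on purpose.

```python
from datetime import datetime, timedelta

# Assumed string format for dates exchanged with the server
DATE_FORMAT = "%Y-%m-%d"


def add_days(date_string, days):
    """Parse a date string, shift it by a number of days, format it back."""
    parsed = datetime.strptime(date_string, DATE_FORMAT)
    return (parsed + timedelta(days=days)).date().isoformat()


# A date before 1970 is handled like any other date.
due_date = add_days("1969-12-25", 30)
old_date = add_days("1950-02-27", 2)
```

For fuzzy parsing of free-form date strings, the python-dateutil library linked above would be the natural complement.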



Steps to follow to test the current Jython powered version:


Requirements:

  1. you should know how to start a regular OpenERP v5 server
  2. you should have an existing OpenERP v5 database to test against
  3. you should have Java installed (1.6 advised)


First you have to grab a recent Jython interpreter. Make sure you have a recent version of Java installed (1.6 advised), then get a fresh version of Jython and build it:


$ svn co https://jython.svn.sourceforge.net/svnroot/jython/trunk/jython/ jython-dev
$ cd jython-dev
$ ant

now, put the jython-dev/dist/bin/jython command in your path.
now if you type

$ jython

that should bring you a Python 2.5, Java based command line interpreter.

Now grab my public branch of the OpenERP server:

$ bzr branch lp:~rvalyi/openobject-server/trunk-java-jython


As for the addons, you can use your regular OpenERP v5 addons, or grab them from here:

$ bzr branch lp:openobject-addons


Now open trunk-java-jython/bin/tools/config.py and make sure you properly set up the database connection params and your addons path location.
Maybe it's better to leave db_name set to False so it will load all the features lazily when requested.


you can already see what happens if you:

$ cd trunk-java-jython/bin
$ jython openerp-server.py


You'll probably have errors because we didn't properly set up some required libraries.
So it's time to copy paste a few libraries from our regular Python path to our Jython path:
you can copy/paste the following directories:
pychart, pytz, reportlab
from
/usr/lib/python2.5/site-packages/ (or wherever your Python sys.path points)
to [...]/jython-dev/dist/Lib

There is one more catch though. Currently, there is a bug in Jython preventing the reportlab lib from loading fully. This is because the font listings in pdfbase/_fontdata.py are just too large to fit in a single Java method under the JVM spec. The Jython folks are aware of that bug but it's low priority: see http://bugs.jython.org/issue527524
Meanwhile, helped by the fellow Charles "Headius" Nutter from the JRuby Sun team, who faced similar issues implementing JRuby, I patched the file to make reportlab work in Jython. It only wraps the large list instantiations and doesn't remove any feature. I submitted it to the reportlab developers and hope to see it included while waiting for a Jython fix.
So in any case, for now, you should just take the _fontdata.py file provided in the trunk-java-jython branch and copy it in place of the reportlab/pdfbase/_fontdata.py file of your Jython reportlab installation.
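For the curious, the kind of wrapping described above can be sketched as follows. This is not the actual patch: the function names are made up and the numbers are placeholders, not real font metrics. The idea is that Jython compiles a module body into a Java method, and the JVM caps the bytecode of one method at 64 KB, so a huge literal is split across several small functions and reassembled at import time.

```python
# Each helper holds one slice of what used to be a single huge list literal,
# so each one compiles to its own (small) Java method under Jython.
def _widths_part_1():
    return [278, 278, 355, 556]  # placeholder values


def _widths_part_2():
    return [556, 889, 667, 191]  # placeholder values


def build_widths():
    """Reassemble the full list from its parts, in order."""
    widths = []
    for part in (_widths_part_1, _widths_part_2):
        widths.extend(part())
    return widths


# Module level code now only contains a short call instead of a giant literal.
widths = build_widths()
```

The resulting list is identical to the original literal, which is why no feature is lost.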

Now you should be able to start your OpenERP server on Jython. Look at the following logs from my machine. I then connected over XML/RPC with my GTK client (in that development version I'm printing all the requests; you could remove those prints of course):

rvalyi@rvalyi-laptop:~/DEV/openobject_trunk/openobject-server-jython/bin$ jython openerp-server.py
[lun. févr. 02 2009 02:41:22] INFO:server:version - 5.0.0
[lun. févr. 02 2009 02:41:22] INFO:server:addons_path - /home/rvalyi/DEV/openobject_trunk/openobject-addons
[lun. févr. 02 2009 02:41:22] INFO:server:database hostname - localhost
[lun. févr. 02 2009 02:41:22] INFO:server:database port - 5432
[lun. févr. 02 2009 02:41:22] INFO:server:database user - openerp
[lun. févr. 02 2009 02:41:22] INFO:objects:initialising distributed objects services
WARNING; Python Imaging not installed, you can use only .JPG pictures !
[lun. févr. 02 2009 02:41:32] INFO:web-services:starting XML-RPC services, port 8069
[lun. févr. 02 2009 02:41:32] INFO:web-services:starting NET-RPC service, port 8070
[lun. févr. 02 2009 02:41:32] INFO:web-services:the server is running, waiting for connections...
list
()
-------------
[lun. févr. 02 2009 02:41:40] INFO:dbpool:Connecting to template1
executing the following DB query:
select datname from pg_database where datdba=(select usesysid from pg_user where usename='openerp') and datname not in ('template0', 'template1', 'postgres') order by datname
query passed!
[lun. févr. 02 2009 02:41:40] INFO:dbpool:Closing all connections to template1
['jython', 'openerp']
server_version
()
-------------
5.0.0
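The `list` request visible in the log above can also be sent without the GTK client, from any Python prompt. The sketch below assumes the server exposes its database service under the /xmlrpc/db path on the XML-RPC port shown in the log (8069); treat that path as an assumption to check against your own server. The helper names are made up for this example.

```python
try:
    from xmlrpc.client import ServerProxy  # Python 3
except ImportError:
    from xmlrpclib import ServerProxy  # Python 2 and Jython 2.5


def service_url(host, port, service):
    """Build the URL of one XML-RPC service (path layout is an assumption)."""
    return "http://%s:%d/xmlrpc/%s" % (host, port, service)


def list_databases(host="localhost", port=8069):
    """Ask the server for its database names (needs a running server)."""
    proxy = ServerProxy(service_url(host, port, "db"))
    return proxy.list()


# Building the URL does not need a server, so it is safe to run anywhere.
db_url = service_url("localhost", 8069, "db")
```

With the Jython powered server from the log running, calling `list_databases()` should return the same list the GTK client received.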



So the GTK client can connect to the Jython powered server and list the available databases for instance.
But if I try to really connect to the database and go further, I unfortunately get an error:

[...]
SELECT ir_act_window_group_rel.gid,ir_act_window_group_rel.act_id FROM ir_act_window_group_rel , res_groups WHERE ir_act_window_group_rel.act_id in (1) AND ir_act_window_group_rel.gid = res_groups.id order by res_groups.name offset '0'
query passed!
{'view_id': (1, u'ir.ui.menu.tree'), 'src_model': False, 'view_ids': [], 'context': u'{}', 'view_type': u'tree', 'auto_refresh': 0, 'usage': u'menu', 'res_model': u'ir.ui.menu', 'domain': u"[('parent_id', '=', False)]", 'name': u'Menu', 'id': 1, 'view_mode': u'tree,form', 'target': u'current', 'views': [(1, u'tree'), (False, u'form')], 'limit': 80, 'type': u'ir.actions.act_window', 'groups_id': []}
execute
('jython', 1, 'admin', 'ir.ui.view', 'read', [1], ['model', 'type'], {'tz': False, 'active_ids': [], 'lang': 'en_US', 'active_id': False})
-------------
executing the following DB query:
SELECT "model","type",id FROM "ir_ui_view" WHERE id IN (1) ORDER BY priority
query passed!
[{'id': 1, 'model': u'ir.ui.menu', 'type': u'tree'}]
execute
('jython', 1, 'admin', 'ir.ui.menu', 'fields_view_get', 1, 'tree', {'tz': False, 'active_ids': [], 'lang': 'en_US', 'active_id': False})
-------------
executing the following DB query:
SELECT arch,name,field_parent,id,type,inherit_id FROM ir_ui_view WHERE id='1' and model='ir.ui.menu'
query passed!
executing the following DB query:
select arch,id from ir_ui_view where inherit_id='1' and model='ir.ui.menu' order by priority
query passed!
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[01]: Traceback (most recent call last):
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[02]: File "/home/rvalyi/DEV/openobject_trunk/openobject-server-jython/bin/osv/osv.py", line 60, in wrapper
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[03]: except orm.except_orm, inst:
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[04]: File "/home/rvalyi/DEV/openobject_trunk/openobject-server-jython/bin/osv/osv.py", line 120, in execute
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[05]: cr.commit()
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[06]: File "/home/rvalyi/DEV/openobject_trunk/openobject-server-jython/bin/osv/osv.py", line 112, in execute_cr
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[07]: File "/home/rvalyi/DEV/openobject_trunk/openobject-server-jython/bin/osv/orm.py", line 1082, in fields_view_get
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[08]: doc = dom.minidom.parseString(encode(result['arch']))
[lun. févr. 02 2009 02:44:30] ERROR:web-services:[09]: NameError: global name 'dom' is not defined




So yes, not everything works yet. In particular the view layer doesn't work, because we didn't provide appropriate wrappers for the xml/dom and xml/xpath Python modules used by OpenERP.
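To make the failing step concrete, here is a minimal sketch of what line 1082 of orm.py needs: a DOM parser reachable from Python. It assumes the pure-Python `xml.dom.minidom` from the standard library is usable on the runtime, which I haven't verified on Jython here. The `arch` string and the `parse_arch` helper are made up for illustration and are not OpenERP code.

```python
import xml.dom.minidom

# Hypothetical view architecture, standing in for result['arch'] in fields_view_get.
arch = '<tree string="Menu"><field name="name"/></tree>'


def parse_arch(arch_xml):
    """Parse a view definition with the standard library DOM parser."""
    # parseString works on encoded bytes, much like encode(result['arch']) in the traceback.
    if not isinstance(arch_xml, bytes):
        arch_xml = arch_xml.encode('utf-8')
    return xml.dom.minidom.parseString(arch_xml)


doc = parse_arch(arch)
# Collect the field names declared in the view.
field_names = [node.getAttribute('name') for node in doc.getElementsByTagName('field')]
print(field_names)
```

This only covers the DOM half; the XPath calls used for view inheritance would still need a replacement of their own.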

The same goes for the web client (eTiny): it can list the databases but won't display any view properly yet.

Still, the more interesting part is that the web services, which only rely on the model and controller layers, already work quite well. See for yourself: in a standard Python or Jython console, I can for instance read all the info related to a given product (here with id 1):

>>> import xmlrpclib
>>> sock = xmlrpclib.ServerProxy('http://localhost:8069/xmlrpc/object')
>>> sock.execute("my_test_base", 1, "admin", 'product.product', 'read', [1])
[{'warranty': False, 'property_stock_procurement': [5, 'Procurements'], 'supply_method': 'buy', 'code': False, 'list_price': 38.25, 'expected_margin_rate': 0.0, 'pricelist_purchase': 'Default Purchase Pricelist (0.00) : 25.50\n', 'incoming_qty': 0.0, 'weight_net': False, 'standard_price': 25.5, 'cost_method': 'standard', 'active': True, 'price_extra': 0.0, 'mes_type': 'fixed', 'uom_id': [1, 'PCE'], 'uos_id': False, 'ean13': False, 'default_code': False, 'type': 'service', 'property_account_income': False, 'qty_available': 0.0, 'sales_gap': 0.0, 'id': 1, 'expected_margin': 0.0, 'uos_coeff': 1.0, 'virtual_available': 0.0, 'seller_delay': 1, 'total_cost': 0.0, 'purchase_ok': True, 'date_from': '2009-01-01', 'property_stock_account_output': False, 'track_outgoing': False, 'company_id': [1, 'Espace Loggia'], 'product_tmpl_id': [1, 'Onsite Senior Intervention'], 'state': False, 'loc_rack': False, 'pricelist_sale': 'Public Pricelist (0.00) : 38.25\n', 'uom_po_id': [1, 'PCE'], 'price_margin': 1.0, 'price': 0.0, 'property_stock_inventory': [4, 'Inventory loss'], 'loc_case': False, 'sale_avg_price': 0.0, 'description': False, 'track_incoming': False, 'property_stock_production': [6, 'Production'], 'purchase_avg_price': 0.0, 'weight': False, 'supplier_taxes_id': [], 'volume': False, 'normal_cost': 0.0, 'outgoing_qty': 0.0, 'dimension_type_ids': False, 'date_to': '2009-12-31', 'procure_method': 'make_to_stock', 'sale_num_invoiced': 0.0, 'variants': '', 'partner_ref': 'Onsite Senior Intervention', 'loc_row': False, 'purchase_num_invoiced': 0.0, 'sale_ok': True, 'rental': False, 'packaging': [], 'sale_delay': 7.0, 'name': 'Onsite Senior Intervention', 'total_margin_rate': 0.0, 'description_sale': False, 'property_account_expense': False, 'categ_id': [8, 'All products / Sellable / Services / Onsite Intervention'], 'invoice_state': 'open_paid', 'property_stock_account_input': False, 'track_production': False, 'sale_expected': 0.0, 'lst_price': 38.25, 'taxes_id': [], 
'dimension_value_ids': [], 'produce_delay': 1.0, 'seller_ids': [], 'description_purchase': False, 'turnover': 0.0, 'purchase_gap': 0.0, 'product_manager': False, 'total_margin': 0.0}]
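The call above returns every field of the product. As the server log earlier shows, the GTK client passes a list of field names after the ids to `read`, which keeps the answer short. Here is a minimal sketch of what such a request looks like on the wire. It runs without a server because it only builds and re-parses the XML-RPC payload. The `build_read_call` helper and the chosen field names are my own illustration, not part of OpenERP.

```python
try:
    import xmlrpclib  # Python 2 / Jython 2.x
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3


def build_read_call(db, uid, password, model, ids, fields):
    """Return the XML-RPC request body for an 'execute' call doing a 'read'."""
    params = (db, uid, password, model, 'read', ids, fields)
    return xmlrpclib.dumps(params, methodname='execute')


# Same database, user and product id as in the console session above.
payload = build_read_call('my_test_base', 1, 'admin',
                          'product.product', [1], ['name', 'list_price'])

# Parse the payload back to check what the server would receive.
params, method = xmlrpclib.loads(payload)
print(method, params)
```

Against a running server, the equivalent call would be `sock.execute("my_test_base", 1, "admin", 'product.product', 'read', [1], ['name', 'list_price'])`.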




OK, enough said for now. I'll try to post updates on the dedicated Launchpad blueprint: https://blueprints.launchpad.net/openobject-server/+spec/jython-support-as-jython-improves
Don't hesitate to contact me if you want to push this work further.

Raphaël Valyi.

Saturday, January 24, 2009

Java 7 build 44 improves jar build speed a lot indeed

Hi,

According to the changelog of JDK 7 build 44, the speed of the jar task has been improved.

Indeed, I can confirm this when building JRuby, which is quite a large project:

Before, Java 7 build 43:
>ant jar-complete
[...]
BUILD SUCCESSFUL
Total time: 1m30 seconds


Now, Java 7 build 44:
>ant jar-complete
[...]
BUILD SUCCESSFUL
Total time: 56 seconds


Quite enjoyable indeed: the build went from 90 seconds down to 56, which is about 38% less time.

Friday, July 11, 2008

OLAP component: Mondrian+JPivot or Flex OLAPDataGrid ?

Recently I spent a few hours digging into OLAP cube components (a special kind of table used in Business Intelligence to analyse large, multi-dimensional data sets). I wanted to produce a nice demo with an OLAP cube plugged into the database of OpenERP (by far the best open source ERP).

I actually saw a guy demoing such an OLAP Flex component along with Openbravo ERP (it looked nice, but no details were provided): http://opensourceerpguru.com/
Anyway, I just wanted to achieve the same thing, but this time using OpenERP (I feel much more comfortable with its elegant and efficient architecture, not to mention the features and the business model) and eventually JRuby on Rails, my favourite Swiss Army knife, to pull the data into the OLAP viewer.

So at first I was stunned by the OLAP Flex component (called OLAPDataGrid). It really looks nicer than the old-fashioned Mondrian JPivot, so I decided to give it a try. It was my first time with Flex, and my first time back in half open source crap in a long while. So I had to follow the whole Adobe crap flow: create a fucking account, read all their commercial crap, agree to whatever license they have, and finally download the stuff and start playing with it. Then I noticed that all my nice FlexBuilder OLAP samples came with a "Flex Data Visualization Trial" watermark. OK, time to remember: FlexBuilder is not open source yet. So let's go with the Flex SDK instead (I had heard that the SDK was open source, or sort of): back to the Adobe legacy crap, download again and try again.

Then I tried to compile my mxml component with the following command line:

flex_sdk_3/bin$ ./mxmlc /home/rvalyi/DEV/olap_test/src/olap.mxml
Loading configuration file /home/rvalyi/Desktop/flex_sdk_3/frameworks/flex-config.xml
/home/rvalyi/DEV/olap_test/src/olap.mxml(162): Error: Could not resolve to a component implementation.

id="myMXMLCube"

/home/rvalyi/DEV/olap_test/src/olap.mxml(195): Error: Could not resolve to a component implementation.


WTF ???

I googled the error message and found the official answer from an Adobe employee on a forum here: http://www.codeverge.net/item.aspx?item=101509

The OLAPDatagrid component is available only in the Flex Builder Professional
version. For the first problem, the one where you are building a Flex + LCDS
2.5.1 project, the answer is that the LCDS 2.5.1 doesn't include the
OLAPDatagrid component and that is way you are getting the errors on runtime.


You f****** b*st*rds! You got me! So that's how I lost a few hours, trapped by Adobe's half open source policy. Go to hell!

Mondrian or Flex OLAPDataGrid?
Well, back to Mondrian and JPivot...

OK, for sure Mondrian and JPivot aren't optimal, and to me they feel like a bloated, non HTTP compliant piece of code, but hey, it's free and it just works. So until Tiny.be releases their awesome open source "TinyBI framework" (in October?) I'll stick with it. I'll hardly try Flex and Flash again, I promise.

Finally, I should say that I really don't know how that Flex component would deal with a large database anyway, as it seems to be an in-memory, client-side OLAP solution only. Even if that were to change, I'm not sure I would feel comfortable feeding it the right pieces of data in the form the OLAP component requires. And none of the existing samples come with drill-down, slicer and rotation widgets, so I'm not sure how easily one can interact with the cube.

May this post save you some time.

Sunday, January 20, 2008

Orkut social network community profiling with GreaseMonkey and Rails, round one.

Hi,

I wanted to figure out some statistics about some Orkut (Google's equivalent of Myspace or Facebook) social communities. I want to know the mean age, the gender ratio and other things within given communities. And actually I'm getting a lot more, as you'll see. The method I'm showing is very simple and can be reused for other purposes. It's automatic enough to grab profiles 15 at a time and build a good sample of the community, but not enough to grab ALL the profiles. Anyway, I guess Google would kick out crawlers trying to grab all the profiles.

The challenge in collecting this data is that you need to be logged in to crawl Orkut communities. Only then can you grab information from the HTML pages, provided you manage to handle the navigation properly. I didn't really manage to automate the whole process using the Hpricot HTML parser.

Instead I went for a semi-automatic combination of GreaseMonkey scripts to grab the profiles and a Rails server to store the data. Now I only need to browse every members page of an Orkut community to get all its member profiles stored in my database. I also have a Rails action that exports the data as CSV so I can open it in a spreadsheet editor.


The first thing to do is to install the GreaseMonkey Firefox plugin.
Then install the following user script, named orkut_crawler.user.js:

// ==UserScript==
// @name Orkut_crawler
// @namespace http://livetribune.org/
// @description grab orkut properties
// @include *
// @exclude http://diveintogreasemonkey.org/*
// @exclude http://www.diveintogreasemonkey.org/*
// ==/UserScript==

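// Scripts to inject into every visited page, served by our local Rails app.
var scripts = [
  'http://localhost:3000/javascripts/orkut_crawler.js'
];
// A plain indexed loop: for...in on an array would also walk any properties added to it.
for (var i = 0; i < scripts.length; i++) {
  var script = document.createElement('script');
  script.src = scripts[i];
  document.getElementsByTagName('head')[0].appendChild(script);
}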


What we are doing here is injecting a script served by our Rails app into the page. That script bundles the Prototype JavaScript library, which makes it easier to grab information using CSS selectors. I tried to paste the Prototype library directly into the user script, but it didn't work for some unknown reason. Anyway, this solution works.

You can now make sure the script gets activated when you visit an Orkut page.

OK, now we need to set up the actual orkut_crawler script as well as the Rails backend.
Build a simple Rails app using the ">rails orkut_crawler" command line.
You can use any database. I used the default SQLite DB in my case.

Then create the database with the ">rake db:create" command inside the orkut_crawler directory.

Now, let's make a simple persistent model to store our data:
"> ruby script/generate model item"
Now edit the db/migrate/001_create_items.rb migration file and write:

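class CreateItems < ActiveRecord::Migration
  # Columns used by the controller below; the column types are an assumption.
  def self.up
    create_table :items do |t|
      t.string :uid
      t.text :properties
      t.timestamps
    end
  end

  def self.down
    drop_table :items
  end
end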

Let's create the table: ">rake db:migrate"

Now we need to create the orkut_crawler JavaScript file that will be loaded by our GreaseMonkey script. Make a new file called public/javascripts/orkut_crawler.js

Inside that file, first paste the Prototype JavaScript library (prototype.js), which you'll find in public/javascripts. Then write this code at the end of the file:


/* Crawler */

// Inject a script tag; used here as a cross-domain GET back to our Rails app.
function appendNewScript(src_url, id) {
  var headID = document.getElementsByTagName("head")[0];
  var newScript = document.createElement('script');
  newScript.id = id || 'snap';
  newScript.type = 'text/javascript';
  newScript.src = src_url;
  headID.appendChild(newScript);
}

// Collected "&key=value" pairs for the current profile page.
var params = "";

// Turn one profile row (label cell + value cell) into a query-string pair.
function addTuple(line) {
  try {
    var key = line.childNodes[1].innerHTML;
    var value = line.childNodes[3].innerHTML;
    // Regexes with the g flag replace every occurrence, not only the first one.
    key = key.replace(/[ \/]/g, "_").replace(/:/g, "");
    value = value.replace(/[\\"]/g, "").replace(/[;:]/g, " ");
    // Encode both parts so that characters such as & or = cannot break the URL.
    params += "&" + encodeURIComponent(key) + "=" + encodeURIComponent(value);
    console.log(key, value);
  } catch (e) {} // rows without the expected cells are skipped
}

// Replace a member entry with an iframe showing that member's profile, after a delay.
function loadProfileLater(item, delay) {
  setTimeout(function () {
    item.innerHTML = "<" + "iframe src='" + item.childNodes[1].href + "'/>";
  }, delay);
}

var tab;

if (document.location.href.indexOf('CommMembers.aspx') > 0) { // it's a community page
  tab = $$('.listitem');
  for (var i = 0; i < tab.length; i++) {
    loadProfileLater(tab[i], 100 * i); // one profile every 100 ms
  }
} else if (document.location.href.indexOf('Profile.aspx') > 0) { // it's a profile
  tab = $$('.listlight');
  for (var i = 0; i < tab.length; i++) { addTuple(tab[i]); }
  tab = $$('.listdark');
  for (var i = 0; i < tab.length; i++) { addTuple(tab[i]); }

  params = document.location.search + params;

  appendNewScript("http://localhost:3000/data/new" + params); // send the params back to our Rails app!
}



This code grabs the profile when you browse an Orkut profile page. If you are browsing a community page instead, it opens the profiles of all the members listed on that page, and thus grabs those profiles too.

Finally, we need to write a Rails controller that will persist the data (the 'new' action), render a global csv file ('index' action) and even tell how many profiles we have ('size' action). So edit app/controllers/data_controller.rb this way:

require 'cgi'

class DataController < ApplicationController
  # Store (or update) one profile sent as query parameters by the crawler script.
  def new
    params.delete 'action'
    params.delete 'controller'
    params.each_key { |key| params[key] = CGI.escape(params[key]) }
    puts params.inspect
    existing = Item.find_by_uid(params[:uid])
    item = existing || Item.new
    item.uid = params[:uid]
    item.properties = params.inspect
    if item.save
      render :text => 'UPDATED' and return if existing
      render :text => 'SAVED!'
    else
      render :text => 'ERROR!'
    end
  end

  # Number of profiles collected so far.
  def size
    render :text => Item.count.to_s
  end

  # Export all profiles as one "; " separated text file, header line first.
  def index
    items = Item.find(:all)
    # eval turns the stored inspect string back into a Hash
    # (fine for a local experiment, not for untrusted input).
    maps = items.map { |item| eval(item.properties) }
    all_props = maps.map { |map| map.keys }.flatten.uniq

    res = all_props.join("; ") + "\n"
    maps.each do |map|
      values = all_props.map do |key|
        # Keep each profile on one line and keep ";" free for the separator.
        CGI.unescape(map[key].to_s).gsub(/[\r\n]/, " ").gsub(";", " - ")
      end
      res += values.join("; ") + "\n"
    end
    render :text => res
  end
end



OK, we are done.
Now make sure our Rails server is running:
"> ruby script/server"

You can also track what is going on:
"tail -f log/development.log"

Now visit some Orkut community pages matching this URL pattern:
http://www.orkut.com/CommMembers.aspx?cmm=10087467&tab=0&na=3&nst=151&nid=0


As you'll see, instead of the normal member icons, we are now loading the member profiles inside iframes. Meanwhile, you can check in your terminal that our Rails server is storing all the profiles.

Finally, look at the profiles you collected.
See how many profiles you have collected: http://localhost:3000/data/size
Now let's export the data as CSV (this can take a while depending on how much data you stored).
In your terminal, execute:
">wget http://localhost:3000/data"
Now rename the data file to data.csv and open it in OpenOffice, for instance.
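If you'd rather read the export with a script than with a spreadsheet, here is a minimal Python 3 sketch. It relies on the format produced by the 'index' action: one header line, then one line per profile, with fields separated by "; ". The sample data and the column names (uid, gender, age) are made up for illustration; the real columns depend on the labels found on the Orkut profile pages.

```python
import csv
import io

# Made-up sample with the same shape as the export: '; ' separated, header first.
sample = u"uid; gender; age\n123; male; 27\n456; female; 31\n"


def load_profiles(stream):
    """Read the '; ' separated export into a list of dicts keyed by header."""
    reader = csv.DictReader(stream, delimiter=';', skipinitialspace=True,
                            quoting=csv.QUOTE_NONE)
    return list(reader)


profiles = load_profiles(io.StringIO(sample))

# Keep only the rows where the (hypothetical) age column is a plain number.
ages = [int(p['age']) for p in profiles if p['age'] and p['age'].strip().isdigit()]
mean_age = sum(ages) / float(len(ages))
print(len(profiles), mean_age)
```

For the real file, replace the `io.StringIO(sample)` argument with `open('data.csv', newline='')`.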



Ok, you are done. Now you can start making some statistics, but that's for round two!