These are my notes on computer science and technology. Everything is written with ABSOLUTELY NO WARRANTY of fitness for any purpose. Of course, feel free to comment on anything.

Thursday, July 23, 2009

Conversion of CR+LF to LF

The following converts all files, including those in subdirectories,
except for those starting with a dot (directories starting with a dot are skipped too).
This works if dos2unix is available, of course; otherwise use sed.

find . -type f ! -path '*/.*' -exec dos2unix '{}' \;

I needed it to convert the content of a git repository (that's
why I skipped the . files: I didn't want .git/* files to be included).
Of course you can write better matching criteria, but this worked
fine for my case.

Thursday, July 16, 2009

Tiny implementation of a suffix array in Ruby

class SuffixArray

  attr_reader :suf, :string

  def initialize(string)
    @string = string
    # indices of all suffixes, sorted by the suffix starting at each index
    @suf = (0..string.size-1).sort_by{|i|@string[i..-1]}
  end

end
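
A quick way to try it out. The sketch below repeats the class so that it runs on its own, prints the suffix array of "banana" and adds a substring search by binary search. The substring? helper is my own hypothetical addition, not part of the class above, and Array#bsearch needs Ruby 2.0 or later.

```ruby
class SuffixArray
  attr_reader :suf, :string

  def initialize(string)
    @string = string
    @suf = (0...string.size).sort_by { |i| @string[i..-1] }
  end

  # Hypothetical helper: true if pattern occurs somewhere in the string.
  # Binary search works because the suffixes are sorted, so their
  # prefixes of length pattern.size are sorted as well.
  def substring?(pattern)
    i = suf.bsearch { |start| string[start, pattern.size] >= pattern }
    !i.nil? && string[i, pattern.size] == pattern
  end
end

sa = SuffixArray.new("banana")
p sa.suf                 # => [5, 3, 1, 0, 4, 2]
p sa.substring?("nan")   # => true
p sa.substring?("nab")   # => false
```

Sorting whole suffixes like this is fine for short strings; for long texts it gets slow, because every comparison may look at a large part of the string.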

Monday, April 20, 2009

Count distinct values in all columns of a table

The following rake task shows the number of distinct values in each column of a given table in the database. I wrote it to get statistics on which columns are really used.
namespace :db do
  desc "Count distinct values for each non empty column of a table (TABLE=xxx) "+
       "optionally using a condition (WHERE=\"yyy\")"
  task :count_distincts => :environment do
    raise "Specify option TABLE=<table_name>" unless ENV["TABLE"]
    puts "Distinct values in table #{ENV['TABLE']}:"
    puts "(condition: where #{ENV['WHERE']})" if ENV['WHERE']
    c = ActiveRecord::Base.connection
    # "describe" with a "Field" column is MySQL-style; other DBs need another query
    columns = c.select_all("describe #{ENV['TABLE']};").map{|a|a["Field"]}
    columns.each do |col|
      sql = "select count(distinct #{col}) as c from #{ENV['TABLE']}"
      sql << " where #{ENV['WHERE']}" if ENV['WHERE']
      n = c.select_one(sql)['c']
      # the adapter may return the count as a string or as an integer
      puts "#{col}: #{n}" unless n.to_i == 0
      STDOUT.flush
    end
  end
end

Thursday, April 16, 2009

assert.h

Assertions: you state your expectations at a certain point in the code, and this way you catch bugs that would otherwise be very difficult to trace.

In C use the standard library macro assert. You must #include <assert.h>, then write assert(expression); in the code. If the expression evaluates to 0 (false), the program aborts with an error message printed on stderr that tells you which assertion failed and where it is (file/line).

It is also possible to turn assertions off by adding a #define NDEBUG before the #include <assert.h>.

Tuesday, April 14, 2009

Why main is int and not void.

The standard says so. And it actually makes a difference: here is a detailed discussion about this topic: http://users.aber.ac.uk/auj/voidmain.shtml

Why do you have to pass scanf the address of the variable to write in?

When I was first learning C, I remember asking myself this... I would have preferred it to return the scanned values, something like:

myVar = scanf("%s") /* don't do this :) */

but of course, in this case (1) you could assign only one variable, and (2) you would lose the return value (the number of items read)...

Here is some discussion of it: ...

Turn colors on in vi/vim

If the color syntax highlighting is off, you can turn it on by editing (or creating) ~/.vimrc, adding the following line:

:syn on

Monday, April 13, 2009

square root

When I have some time, I will have a look at this algorithm to calculate the square root of a number:
    int sqrt(int num) {
        int op = num;
        int res = 0;
        int one = 1 << 14; // The second-to-top bit is set: 1L<<30 for long

        // "one" starts at the highest power of four <= the argument.
        while (one > op)
            one >>= 2;

        while (one != 0) {
            if (op >= res + one) {
                op -= res + one;
                res += one << 1;
            }
            res >>= 1;
            one >>= 2;
        }
        return res;
    }

(from Wikipedia)
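
To play with the algorithm, here is a minimal Ruby port of the C code above. It is only a sketch: the name isqrt is my own choice, and because Ruby integers have no fixed width, the starting value of "one" is computed from the argument instead of being fixed at 1 << 14. The loop at the end checks the defining property of an integer square root (r*r <= n < (r+1)*(r+1)) for small inputs.

```ruby
# Integer square root, computed bit by bit (port of the C code above).
def isqrt(num)
  raise ArgumentError, "negative argument" if num < 0

  op  = num
  res = 0
  one = 1
  # Move "one" up to the highest power of four <= the argument.
  one <<= 2 while (one << 2) <= op

  while one != 0
    if op >= res + one
      op  -= res + one
      res += one << 1
    end
    res >>= 1
    one >>= 2
  end
  res
end

# Check the result against the definition for small inputs.
(0..1000).each do |n|
  r = isqrt(n)
  raise "wrong result for #{n}" unless r * r <= n && (r + 1) * (r + 1) > n
end

p isqrt(15)   # => 3
p isqrt(16)   # => 4
```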

Saturday, April 11, 2009

XSLT

XSLT: take an XML doc and, through a sort of "stylesheet", do something with the data (produce another XML doc, or perhaps an XHTML doc). I don't know this language but I guess it's rather useful (although I have no use for it at the moment).

related stuff: libxml, libxml2, libxslt, xsltproc

Links:
- specifications@w3: http://www.w3.org/TR/xslt


PragProg media: Expression Engine Techniques

I often have a look at the list of media of "PragProg" (http://www.pragprog.com/categories/all?sort=pubdate). The newest title is the screencast series "Expression Engine Techniques".

Well, I definitely don't need it, but I'm always curious, so here is the info I collected (mainly from Wiki) about the topic:

ExpressionEngine (http://expressionengine.com/) is a CMS developed by EllisLab. There is a free version and two paid ones. A "2.0" is expected in 2009 and will be based on CodeIgniter, which is a PHP framework.


capistrano using git

The cap deploy:update task, when you are using git, runs the following git command:

git checkout -q -b deploy 3fe75somehash.....

This means:

- creates a new branch named "deploy" (what happens if there is already one called like that?)
- the source of the branch is the commit identified by the given hash
- -q option is "quiet mode"

rsync and symbolic links

some possible behaviours of rsync (from the man page):

(1) SKIP:
default case => symlinks are simply not followed

(2) COPY THE LINK:
"symlinks are recreated with the same target on the destination"
rsync --links
also: rsync --archive implies --links

(3) FOLLOW THE LINK
rsync --copy-links

safe/unsafe:

Relevant for case 3 is the difference between "safe" and "unsafe" links, which I did not understand well.
This is what is written: "An example where this might be used is a web site mirror that wishes ensure the rsync module they copy does not include symbolic links to /etc/passwd in the public section of the site. Using --copy-unsafe-links will cause any links to be copied as the file they point to on the destination. Using --safe-links will cause unsafe links to be ommitted altogether."

Always back up on a dedicated partition or disk!

My home dir on a certain server was backed up by another company. Under my home I created a soft link to another location (several GB of data). What I didn't know: their backup script was rsync-based *with* the option --copy-unsafe-links. Rsync followed the link and backed up tons of stuff, clogging the backup hard disk. They used the same hard disk and the same partition for another function as well: the email server. So that soft link disrupted the email server! At first I thought I was to blame, but on second thought I understood that this is not 100% true: they should *never* have run the mail server on the same partition where they backed up that stuff!


Ruby and GUIs

Is there a good ruby GUI toolkit?

I want to write a simple app, with a simple DB backend, just to keep some notes. As I want it to run on my laptop, I don't want any webserver running, so no Rails app (and also because I want to do some GUI programming: for years I have been working only on console or web apps).

I started my Google quest to answer this question earlier this morning, and I have not got far yet. I am no expert in GUIs.

Some disorganized thoughts:

- Java of course is a good choice, isn't it? Yes, I want to learn Java again (I learned it 10 years ago and have not used it for a long time, so I guess my knowledge is totally out of date). But right now I just want to write a small app in that simple, lovely Ruby language.

- I guess most Win apps are based on "native" widgets. For that maybe you need VS or the like, and at the moment I am in no mood to be a MS fan. And I want something cross-platform.

- There are libraries, and like always in IT a lot of names and acronyms, just to confuse stupid newbies like me. Qt is one, I think, then there are others (I think some GUI library flame war is also the reason why Gnome and KDE are 2 different desktops, isn't it?). So I guess you use one of those with some good Ruby bindings and you are good to go. Right?

- I read about Shoes: very simple, maybe a good idea, but probably just for beginners. I don't know if I want to spend my time on it. It looks like something just for kids learning programming?

Friday, November 7, 2008

Thursday, November 6, 2008

app helper in the rails console

The app helper is cool. You can use it to simulate requests to your app: 

e.g.

app.get "/" 

Friday, August 22, 2008

Hide attributes in Rails / ActiveRecord

Let's assume you have a table "entities" that was created by this Rails migration code:
create_table :entities do |t|
  t.string :type
  t.string :title
  t.string :first_name
  t.string :last_name
  t.binary :logo
end
Person and Organisation will be subclasses of Entity (which of course is a subclass of ActiveRecord::Base), using single table inheritance.

If people have no logo and organisations have no title, first name and last name, maybe we would like to hide these columns from their respective classes. Of course, you may say that in this case you shouldn't use STI. But if you really want to... ActiveRecord does not provide hiding functionality. All Person and Organisation instances will see all attributes of Entity. This behaviour can be changed as follows.

When a new object is created (e.g. Person.new), this is a real instance of Person. The list of attributes comes in this case from the public class method columns of ActiveRecord::Base. When records are fetched using a finder method, however (e.g. Person.first), ActiveRecord goes another way and calls the private class method instantiate of ActiveRecord::Base, which takes a hash of attributes/values as argument; this hash in turn comes from the results of the SQL query. So to hide attributes you have to hide them both in columns and in instantiate, and the trick is done.

Monday, August 18, 2008

Default values in Ruby blocks

The Ruby 1.8 parser does not allow default values to be set in blocks, unlike in method signatures. This is of course valid Ruby:
def method(a, b=0)
  #...
end
but this isn't:
Proc.new {|a, b=0| } ### syntax error!
See e.g. this discussion in Ruby forum.

However it is possible to simulate the behaviour, creating a de facto signature with default values. For example, let's say I want a proc accepting the same parameters as a method defined as def a(b, c=1):
lambda do |*args| # simulated signature: |b, c=1|
  b, c = args[0], args[1] || 1
end
This has the disadvantage that it does not validate the number of arguments, so let's add some validation code to the block:
lambda do |*args| # simulated signature: |b, c=1|
  b, c = args[0], args[1] || 1
  # validate number of arguments:
  err = "wrong number of arguments"
  if args.size > 2
    raise ArgumentError, "#{err} (#{args.size} for 2)"
  elsif args.size == 0
    raise ArgumentError, "#{err} (0 for 1)"
  end
end
Now the block behaves as if it had the signature |b, c=1|. (One difference remains: an explicit nil or false passed as second argument is replaced by the default too.)
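
For reference, Ruby 1.9 and later accept default values in block parameters directly, so the workaround is only needed on 1.8. A minimal sketch, assuming Ruby 1.9 or later; the name pr is just for illustration:

```ruby
# Ruby 1.9+: default values are allowed in block parameters.
pr = lambda { |b, c = 1| [b, c] }

p pr.call(:x)      # => [:x, 1]
p pr.call(:x, 2)   # => [:x, 2]

# A lambda also checks the number of arguments, like a method does.
begin
  pr.call
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Note that this only holds for lambdas: a plain Proc.new block does not raise on a wrong number of arguments, it fills missing ones with nil (or the default) and drops extra ones.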

Sunday, August 17, 2008

ActiveScaffold and LOB sorting in Oracle

If you are using ActiveScaffold and an Oracle DB, you will notice that you can't natively sort on LOB columns. If you click on the table header for a LOB column, you will get an error message. A way to fix this is to sort using the substr function and pass it to config.columns[].sort_by :sql.

To make it simpler, I solved the matter by adding this method to ActiveScaffold::DataStructures::Column:
def sort_as_lob(length = 50, offset = 1)
  sort_by :sql => "dbms_lob.substr(#{name},"+
                  "#{length},"+
                  "#{offset})"
end
which I use on LOB columns by declaring in the active scaffold config block:
active_scaffold do |config|
  config.columns[:my_lob].sort_as_lob
end

Saturday, August 16, 2008

sort_by :sql and polymorphic associations

Assume you have a polymorphic association in Rails.
class MyObject < ActiveRecord::Base
  belongs_to :another_object, :polymorphic => true
end
Well, "another object" can now be in any other table of the DB, and in the my_objects table there are two columns (assuming you followed the conventions) named another_object_id and another_object_type, the first one containing the ID of the object, the second the model name.

Now let's say I have a table with all "my objects" and I want to sort it according to a specific column in "another object" (for simplicity let us assume every possible "other object" has a column called "label").

There are two ways to do this. The first one is loading all "my objects" instances, then for each one making a query, according to the association type, to find the other objects, then letting Ruby sort by label. This is of course not optimized. Through ActiveRecord's 'eager loading' feature, which I think is by now also available for polymorphic associations, it is probably possible to find a better way. However I wanted a single query, so I did it in SQL with a case expression; case was introduced in the standard, if I am not wrong, in SQL-92. I think that most SQL DBs comply, SQLite probably excluded (I tested only on Oracle).

In my case another_objects can only be of a few types, so I did it this way, in the controller code:
#
# e.g. types = %w[Cat Dog Mouse]
#
def sort_by_sql(types)
  sql = "(case another_object_type"
  types.each do |type|
    # double quotes are needed for #{...} to be interpolated;
    # the type name must be a quoted string literal in the SQL
    sql << " when '#{type}'" <<
           " then (select label" <<
           " from #{type.tableize} t" <<
           " where t.id = another_object_id)"
  end
  sql << " end)"
end
private :sort_by_sql
You can pass the result of sort_by_sql() as the :order option of the find method (:order => sort_by_sql(...)) to sort by the label column of the polymorphic association with only one query. This was noticeably faster in my case.
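
To see what the generated SQL looks like, here is a standalone sketch that runs without Rails. The naive_table_name helper is a hypothetical stand-in for ActiveSupport's String#tableize and only handles simple names with a regular plural; the rest follows the idea described above.

```ruby
# Hypothetical stand-in for String#tableize ("Cat" -> "cats").
# It only lowercases and appends "s", so it is wrong for e.g. "Mouse".
def naive_table_name(type)
  type.downcase + "s"
end

# Builds one "when ... then (subquery)" branch per type.
def sort_by_sql(types)
  whens = types.map do |type|
    "when '#{type}' then (select label from #{naive_table_name(type)} t " \
      "where t.id = another_object_id)"
  end
  "(case another_object_type #{whens.join(' ')} end)"
end

puts sort_by_sql(%w[Cat Dog])
```

Since the type names are interpolated straight into the SQL, they should come from a fixed list in the code and never from user input.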

Actually the need for this came up because I wanted to sort by a polymorphic association in an ActiveScaffold based controller, and in this case I am not sure I could have specified eager loading without too much effort (probably by overriding the finder method of the list). So I just wrote sort_by_sql() as a class method and used it in the config block of my active scaffold:
AllowedTypes = %w[Cat Dog Mouse]
def self.sort_by_sql(types)
  #...the code up here...#
end
active_scaffold :my_objects do |config|
  # ...
  # pass the array itself: the method takes a single argument
  config.columns[:another_object].sort_by :sql =>
    sort_by_sql(AllowedTypes)
  # ...
end
That worked fine for me.

Thursday, August 14, 2008

Execute a Rake task in another

It is easy to make one task dependent on another, for example:

task :one => [:two, :three]

executes tasks two and three, then one (I still have to test that the order is really this). But how do you execute task :two in the middle of the code of another task?

Here is the solution:

desc "This task executes task two in its code!"
task :one do

  # ... do something

  ENV['PAR1'] = 'xxx'
  ENV['PAR2'] = 'yyy'
  Rake::Task[ "two" ].execute

  # ... do something

end

The ENV assignments and the execute call have a similar effect to running in your shell:

rake two PAR1=xxx PAR2=yyy

Yeah, Rake is a really easy and cool tool for every scripting need...
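
The order of the prerequisites, and the difference between invoke and execute, can be checked with a small script run with plain ruby (no Rakefile). This is only a sketch: the task bodies and the log array are made up for the test, and it assumes the rake library is installed, as it is in a standard Ruby installation.

```ruby
require 'rake'
include Rake::DSL   # ensure the task DSL is available in a plain script

log = []

task(:two)   { log << "two (PAR1=#{ENV['PAR1']})" }
task(:three) { log << "three" }
task(:one => [:two, :three]) { log << "one" }

# invoke runs the prerequisites first, in the order listed, then the task.
ENV['PAR1'] = 'xxx'
Rake::Task[:one].invoke
p log        # => ["two (PAR1=xxx)", "three", "one"]

# execute runs only the task body again: prerequisites are not run,
# and it does not matter that the task was already invoked.
Rake::Task[:two].execute
p log.last   # => "two (PAR1=xxx)"
```

So execute is the right call inside another task when you want the body to run every time; invoke would run task two at most once per rake run.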

Fake migrations for rails < 2.0.2

The following rake task can be used to update the migration pointer without actually migrating, in Rails databases for Rails versions under 2.0.2. This is sometimes useful if you applied something manually or want to skip some migration anyway.

Usage examples:

rake db:pretend:migrate
rake db:pretend:migrate VERSION=20080730181045
rake db:pretend:rollback

namespace(:db) do
  namespace(:pretend) do
    desc "Pretend the database migrated 1 step or to VERSION=<nn>"
    task :migrate => :environment do
      c = ActiveRecord::Base.connection
      if ENV['VERSION']
        version = ENV['VERSION'].to_i
      else
        version = c.select_one("select version from schema_info")['version'].to_i + 1
      end
      c.execute("update schema_info set version = #{version}")
      puts "Current version: #{version}"
    end
    desc "Pretend that the last migration did not happen"
    task :rollback => :environment do
      c = ActiveRecord::Base.connection
      version = c.select_one("select version from schema_info")['version'].to_i - 1
      c.execute("update schema_info set version = #{version}")
      puts "Current version: #{version}"
    end
  end
end

oracle adapter and rails migrations

The Oracle adapter for ActiveRecord has a bug that prevents creating a dump of the schema using rake. The following code is from http://dev.rubyonrails.org/ticket/10415 and corrects this problem.
require 'active_record/connection_adapters/oracle_adapter'
module ActiveRecord
  module ConnectionAdapters
    class OracleAdapter
      # Returns an array of arrays containing the field values.
      # Order is the same as that returned by #columns.
      def select_rows(sql, name = nil)
        result = select(sql, name)
        result.map{ |v| v.values}
      end
    end
  end
end

Wednesday, August 13, 2008

A color picker

Looking for a color picker for a website I am preparing, I found many commercial tools, and others that are open source (unobfuscated js) but not free-licensed.

An open source, free-licensed ('it’s dual licenced under Creative Commons & GPL') color picker is the Colorjack color picker: I like version 1.0.4 more than 2.0.

I saw that the code in Colorjack incorporates a function $() which I would have to rename to avoid conflicts with the Prototype framework... (although I didn't actually test it) so I just looked further and found Scripteka, which is a collection of Prototype extensions. Currently there are links to 119 projects.

At least two of them are related to color picking: John Dyer's Colorpicker, released in 2007 under an MIT-style license, and Jeremy Jongsma's GPL-licensed Control.ColorPicker, released in April 2008.

Not all the projects linked by Scripteka are free-licensed: for example Cooltips is not.

Through a comment in John Dyer's blog I came to the nogray color picker, which is based on another JavaScript framework (mooTools); I didn't find its licensing terms.

It's a pity none of them is available as a Rails plugin, maybe with some nice helper methods...

Sunday, July 13, 2008

Online lectures

An Italian university (Consorzio Nettuno) is based on recorded lectures in several languages, unfortunately not online (or only for their students) but broadcast on two channels of a European satellite. I currently have no satellite dish... that's a pity...

Anyway, on the web there are some resources with lectures and other videotaped material better than music videos and jokes. Here are some examples I found, grouped by language:

Thursday, October 11, 2007

bash startup

In interactive mode bash executes some startup scripts. The first thing to know is whether it is a login shell or not:

* login shell: (executes more things)
(1) general settings for all users are in: /etc/profile
(2) personal settings may be in: (only the first readable one is executed)
~/.bash_profile
~/.bash_login
~/.profile
(3) before logout: ~/.bash_logout

* non-login:
>bash => ~/.bashrc
>bash --norc => nothing
>bash --rcfile filename => specify another rc file

Wednesday, April 11, 2007

Learning C: params from the command line


I want to read two parameters from the command line: a string and a number. The number should be a positive integer smaller than 100.
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    char *string;
    long int number;
    /* are there really 2 parameters? */
    if (argc != 3) exit(EXIT_FAILURE);
    /* catch them */
    string = argv[1];
    number = strtol(argv[2], NULL, 10);
    /* the number must be positive and smaller than 100 */
    if (number < 1 || number > 99) exit(EXIT_FAILURE);
    /* use them */
    printf("String parameter: %s\n", string);
    printf("Number parameter+1: %ld\n", number + 1);
    return EXIT_SUCCESS;
}

The number of parameters (as int) and the parameters themselves (as an array of char*) are passed to "main".
Everybody writes main as int main (int argc, char *argv[]). I guess it's possible to give more descriptive names to the arguments of main, like int main (int argument_counter, char *arguments[]), but it's probably out of fashion - if you write C you should look serious after all.

The classical beginner's note: keep in mind that argv[0] is the program name (which is useful, for example, for syntax error messages), so the counter is always one more than you intuitively expect, and you find the first real parameter in argv[1].

Parameter type: as the parameters are always strings, you must convert them using a conversion function like atoi or strtol. strtol is the better choice, because atoi has no way to report errors. Note that the format must match the type: %ld for a long int.

About Me

Hamburg, Hamburg, Germany
Former molecular biologist and web developer (Rails) and currently research scientist in bioinformatics.