This are my notes in the fields of computer science and technology. Everything is written with ABSOLUTE NO WARRANTY of fitness for any purpose. Of course, feel free to comment anything.

Monday, April 20, 2009

Count distinct values in all columns of a table

The following rake task will show the number of different values of each column of a given table in the database. I wrote it to see statistics on which columns are really used.
namespace :db do
desc "Count distinct values for each non empty column of a table (TABLE=xxx) "+
"optionally using a condition (WHERE=\"yyy\")"
task :count_distincts => :environment do
raise "Specify option TABLE=<table_name>" unless ENV["TABLE"]
puts "Distinct values in table #{ENV['TABLE']}:"
puts "(condition: where #{ENV['WHERE']})" if ENV['WHERE']
c = ActiveRecord::Base.connection
columns = c.select_all("describe #{ENV['TABLE']};").map{|a|a["Field"]}
columns.each do |col|
sql = "select count(distinct #{col}) as c from #{ENV['TABLE']}"
sql << " where #{ENV['WHERE']}" if ENV['WHERE']
n = c.select_one(sql)['c']
puts "#{col}: #{n}" unless n == "0"
STDOUT.flush
end
end
end

About Me

My photo
Hamburg, Hamburg, Germany
Former molecular biologist and web developer (Rails) and currently research scientist in bioinformatics.