namespace :db do
desc "Count distinct values for each non empty column of a table (TABLE=xxx) "+
"optionally using a condition (WHERE=\"yyy\")"
task :count_distincts => :environment do
raise "Specify option TABLE=<table_name>" unless ENV["TABLE"]
puts "Distinct values in table #{ENV['TABLE']}:"
puts "(condition: where #{ENV['WHERE']})" if ENV['WHERE']
c = ActiveRecord::Base.connection
columns = c.select_all("describe #{ENV['TABLE']};").map{|a|a["Field"]}
columns.each do |col|
sql = "select count(distinct #{col}) as c from #{ENV['TABLE']}"
sql << " where #{ENV['WHERE']}" if ENV['WHERE']
n = c.select_one(sql)['c']
puts "#{col}: #{n}" unless n == "0"
STDOUT.flush
end
end
end
This are my notes in the fields of computer science and technology. Everything is written with ABSOLUTE NO WARRANTY of fitness for any purpose. Of course, feel free to comment anything.
Monday, April 20, 2009
Count distinct values in all columns of a table
The following rake task will show the number of different values of each column of a given table in the database. I wrote it to see statistics on which columns are really used.
Subscribe to:
Posts (Atom)
Labels
software installed on my laptop
(13)
windows software
(12)
personal settings
(10)
linux software
(8)
rails
(7)
shell rc file
(6)
linux basics
(5)
C basics
(4)
color the shell
(4)
ruby
(4)
connection to remote servers
(3)
git
(3)
mutimedia software
(3)
rake tasks
(3)
shell
(3)
active scaffold
(2)
backup
(2)
disk analysis
(2)
less
(2)
oracle
(2)
rails migrations
(2)
rails on oracle
(2)
sql
(2)
ssh
(2)
symbolic links
(2)
top open source projects
(2)
virtualbox
(2)
GUI programming
(1)
MIT
(1)
active record
(1)
algorithms
(1)
applications
(1)
apt
(1)
attr_hidden
(1)
bash
(1)
capistrano
(1)
cms tools
(1)
color picker
(1)
common errors
(1)
cooltips
(1)
data recovery
(1)
data structures
(1)
dvd43
(1)
endline chars
(1)
expression engine
(1)
ftp clients
(1)
gimp
(1)
grep
(1)
handbrake
(1)
hard disk failure
(1)
image conversion
(1)
image editing
(1)
inkscape
(1)
java software
(1)
javascript
(1)
jdiskreport
(1)
linux compact guides
(1)
ls
(1)
online lectures
(1)
oracle lob
(1)
photo manipulation
(1)
php
(1)
polymorphic associations
(1)
prototype
(1)
putty
(1)
rails plugins
(1)
rake
(1)
rsync
(1)
ruby blocks
(1)
scanf
(1)
scite
(1)
script
(1)
scripteka
(1)
sftp clients
(1)
sort_by :sql
(1)
suffix array
(1)
synctoy
(1)
tcsh
(1)
timms
(1)
unlocker
(1)
vector graphics
(1)
vi
(1)
video lectures
(1)
vlc media player
(1)
windows alpha blender
(1)
winscp
(1)
xslt xml
(1)
About Me
- Giorgio Gonnella
- Hamburg, Hamburg, Germany
- Former molecular biologist and web developer (Rails) and currently research scientist in bioinformatics.