Monday, June 29. 2009
Some observations on using the search APIs for the 3 major search engines: Google, Bing and Yahoo. By far the best I've found is Bing, followed by Yahoo, and lastly Google.
To perform a search for Ireland:
For Bing and Yahoo you need to sign up for an API key, which only takes 2 minutes. Google doesn't need an API key. All can return JSON formatted search results (also XML), however each has a proprietary format. All search engines have removed limits on the number of queries you can submit. Bing and Yahoo don't place limits on the number of results that can be returned from a single query, however Google limits you to 64 results for a general search (other searches are more limited). The 'rsz' parameter for Google can be small (4 results) or large (8 results). To retrieve more results from any of them you can apply an offset, which is the last parameter for each. Overall the search APIs have improved massively over the past year. All that's really missing is a unified search syntax and result set.
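To give a feel for how the offset works with Google's limits, here is a minimal Python sketch. The function name `page_offsets` is made up for illustration, and it assumes the offset is a zero-based index of the first result on each page; check each API's documentation, since that detail may differ between engines.

```python
def page_offsets(total_results, page_size):
    """Return the start offsets needed to page through total_results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total_results, page_size))


# Google's general search: at most 64 results, 8 per request with rsz=large.
for offset in page_offsets(64, 8):
    print("request results starting at offset", offset)
```

With the large result size that works out to 8 requests, and with the small size (4 results) it would take 16 requests to reach the same 64 results.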
Tuesday, June 16. 2009
Using data I collected from Reddit for RedditTrends I found some interesting spikes in the number of votes submitted articles receive.

This graph shows the sum total of UP votes received per day by submitted links.

This graph shows the sum total of DOWN votes received per day by submitted links.
As you can see, the graphs are normally quite static, but have huge spikes every now and again, which are massively out of kilter with the normal everyday average.

This graph shows the total score achieved, which should equal UP votes minus DOWN votes. On a positive note, users of Reddit are twice as likely to UP vote a link as to DOWN vote it.

When this is compared to the total number of comments, it's clear that the spike around January is a natural one. It seems to correspond with Obama's election and inauguration.
The huge spike on 24th of April was effectively a revolt on Reddit over the number of duplicate stories. Dozens of links got vast numbers of UP and DOWN votes, mostly with the same title.
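For anyone wanting to try something similar on their own data, the sketch below shows one way the daily score and the spikes could be computed in Python. It is not the code behind the graphs: the function names, the "5 times the median day" threshold and the sample numbers are all assumptions made up for illustration.

```python
from statistics import median


def daily_scores(up_votes, down_votes):
    """Score per day, taken as UP votes minus DOWN votes."""
    return {day: up_votes[day] - down_votes.get(day, 0) for day in up_votes}


def find_spikes(daily_totals, factor=5.0):
    """Return the days whose total exceeds factor times the median day."""
    typical = median(daily_totals.values())
    return sorted(day for day, total in daily_totals.items() if total > factor * typical)


# Made-up numbers purely for illustration, not the real RedditTrends data.
up = {"2009-04-22": 1000, "2009-04-23": 1100, "2009-04-24": 9000, "2009-04-25": 1050}
down = {"2009-04-22": 500, "2009-04-23": 520, "2009-04-24": 8000, "2009-04-25": 510}

print(daily_scores(up, down))
print(find_spikes(up))
print(find_spikes(down))
```

The median is used as the "normal" day rather than the mean, so that a single enormous day does not drag the baseline up and hide itself.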
Sunday, May 3. 2009
As an extension to PredictReddit I've created RedditTrends.com. It's kind of like Google Trends, except for Reddit submissions. It's just an early work in progress but looks interesting and produces pretty graphs.
Pandemic
Obama
Mexico
Graphs are produced using Flot (jQuery). The backend runs the Zend Framework (PHP) and MySQL.
Saturday, May 2. 2009
On Reddit there are a huge number of links submitted, however few ever get enough votes to make it to the front page of the site, where most Reddit users will see them. A submission's success depends greatly on its title, however you only have one shot to get it right.
This is where PredictReddit comes in. It allows you to test out your proposed title, giving you an estimate of the number of votes it is likely to get. So you can fine tune it before you submit it to Reddit. It uses past submissions to predict future votes. This of course assumes that the Reddit community is interested in similar recurring topics (and it seems to be).
There can be some confusion over the results it gives back. For example, you might type in a title that you know got a high number of votes and get a low number of votes back. You might assume that PredictReddit is broken, however in reality, if a story is very successful, you often find numerous other submissions trying to piggyback on its success (but they fail to get many votes). This gives a low predicted number of votes. Best to play around with it yourself and try it out. And remember it's just for fun.
It works by using a k-Nearest Neighbours algorithm. It was written in PHP using the Zend Framework. It uses MySQL for data storage. Data is pulled from Reddit using their JSON interface.
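To illustrate the idea (and the piggyback effect described above), here is a small k-Nearest Neighbours sketch in Python. It is not the PredictReddit code, which is written in PHP. The word-overlap (Jaccard) similarity, the plain average of the neighbours' votes, the function names and the sample history are all assumptions chosen to keep the example short.

```python
import re


def tokens(title):
    """Lower-case word set for a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Similarity between two token sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_votes(title, past, k=3):
    """Average the votes of the k past submissions most similar to title."""
    query = tokens(title)
    ranked = sorted(past, key=lambda item: jaccard(query, tokens(item[0])), reverse=True)
    nearest = ranked[:k]
    if not nearest:
        return 0.0
    return sum(votes for _, votes in nearest) / len(nearest)


# Made-up history: one hit and several copycat submissions that flopped.
past = [
    ("Obama wins the election", 2000),
    ("Obama wins election", 3),
    ("Obama wins the election!", 5),
    ("Obama has won the election", 2),
    ("Cute cat picture", 150),
]

print(predict_votes("Obama wins the election", past, k=3))
```

Even though the exact title got 2000 votes in this made-up history, the prediction comes out far lower, because two of its three nearest neighbours are copycats that got almost no votes.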
Thursday, April 16. 2009
For those of you who are interested, here's my PhD thesis:
Adaptive Scheduling in Heterogeneous Distributed Computing Systems.
Abstract
The main focus of this research is in the area of adaptive scheduling for heterogeneous
distributed systems. Given an unreliable, non-dedicated set of
processing and communication resources, a scheduler is required to allocate
tasks to processors. No information about the state of the system, which
can vary over time, or the tasks to be processed, is known in advance and
thus must be estimated dynamically. Current schedulers do not adequately
address this dynamism. To address this, a property estimation method is
presented, which utilizes a k-Nearest Neighbours algorithm, a smoothed average
and an analytical benchmark. These estimated properties are then
used by two different scheduling techniques, which make less restrictive assumptions
than the current state-of-the-art methods. A multi-heuristic evolutionary
method utilizes a genetic algorithm and eight simple heuristics to
efficiently allocate tasks to processors. A deterministic method utilizes the
error inherent in estimating the properties of the system and the execution
time of tasks, to allocate tasks to processors. The algorithms have been
implemented on a real-world heterogeneous distributed system with up to
150 processors. A set of real-world problems from the areas of cryptography,
bioinformatics, and biomedical engineering were used as a test set to measure
the effectiveness of the scheduling algorithms. Experiments have shown that
both methods achieve better efficiency than other state-of-the-art heuristic
algorithms. Finally, a low memory distributed reconstruction application for
large digital holograms is presented, which has significantly increased the size
of holograms that can be reconstructed, over the previous state-of-the-art.
Thursday, October 30. 2008
Journal Publications
- Lukas Ahrenberg, Andrew J. Page, Bryan M. Hennelly, John B. McDonald, and Thomas J. Naughton, Using commodity graphics hardware for real-time digital hologram view-reconstruction, Journal of Display Technology, vol. 5, no. 1, 2009.
- Andrew J. Page, Thomas M. Keane and Thomas J. Naughton, Scheduling in a dynamic heterogeneous distributed system using estimation error, Journal of Parallel and Distributed Computing, Volume 68, Issue 11, November 2008, 1452-1462.
- Andrew J. Page, Lukas Ahrenberg, Thomas J. Naughton, Low memory distributed reconstruction of large digital holograms, Optics Express 16, 1990-1995 (2008).
- Thomas M. Keane, Andrew J. Page, Thomas J. Naughton, S.A.A. Travers, J.O. McInerney, Building large phylogenetic trees on coarse-grained parallel machines, Algorithmica, Springer, vol. 45, no. 3, pp. 285-300, July 2006.
- Andrew J. Page, Thomas J. Naughton, Framework for task scheduling in heterogeneous distributed computing using genetic algorithms, Artificial Intelligence Review, Volume 24, Numbers 3-4, November 2005, Pages: 415 - 429, Springer.
Peer Reviewed Conference Papers
- Andrew J. Page, Shirley Coyle, Thomas M. Keane, Thomas J. Naughton, Charles Markham and Tomas Ward, Distributed Monte Carlo Simulation of Light Transportation in Tissue, 8th International Workshop on Java for Parallel and Distributed Computing, proceedings of the 20th International Parallel & Distributed Processing Symposium, Rhodes, Greece, April 2006. IEEE Computer Society.
- Thomas M. Keane, Andrew J. Page, James O. McInerney, Thomas J. Naughton, A high-throughput bioinformatics distributed computing platform, Bioinformatics and its Medical Applications Special Track, The 18th IEEE International Symposium on Computer-Based Medical Systems, pp. 377-382, Dublin, Ireland, June 2005.
- Andrew J. Page, Thomas J. Naughton, Dynamic task scheduling using genetic algorithms for heterogeneous distributed computing, 8th International Workshop on Nature Inspired Distributed Computing, proceedings of the 19th International Parallel & Distributed Processing Symposium, pp. 189a.1-189a.8, Denver, Colorado, USA, April 2005. IEEE Computer Society.
- Andrew J. Page, Thomas M. Keane, Thomas J. Naughton, Bioinformatics on a Heterogeneous Java Distributed System, 7th International Workshop on Java for Parallel and Distributed Computing, proceedings of the 19th International Parallel & Distributed Processing Symposium, pp. 184a.1-184a.4, Denver, Colorado, USA, April 2005. IEEE Computer Society.
- Andrew J. Page, Thomas J. Naughton, Framework for task scheduling in heterogeneous distributed computing using genetic algorithms, 15th Artificial Intelligence and Cognitive Science Conference, eds. Lorraine McGinty and Brian Crean, pp. 137-146, September 8th - 10th 2004, Castlebar, Ireland. ISBN 1-902277-89-9.
- Andrew J. Page, Thomas Keane, Thomas J. Naughton, Adaptive Scheduling Across a Distributed Computation Platform, Third International Symposium on Parallel and Distributed Computing, ed. John P. Morrisson, pp. 141-149, July 2004, Cork, Ireland. ISBN 0-7695-2210-6, IEEE Computer Society.
- Andrew Page, Thomas Keane, Richard Allen, Thomas J. Naughton, John Waldron, Multi-tiered distributed computing platform, 2nd International Conference on the Principles and Practice of Programming in Java, pp. 191-194, Kilkenny City, Ireland, June 2003. ISBN 0-9544145-1-9.
- Thomas M. Keane, Andrew Page, Thomas J. Naughton, Simon A.A. Travers, James O. McInerney, Grace P. McCormack, "Heterogeneous distributed computing," IFIP Working Group 8.6 Conference on IT Innovation for Adaptability and Competitiveness, Leixlip, Ireland, 30 May - 2 June 2004.
Wednesday, July 9. 2008
I've updated Baruch's LaTeX thesis template. Enjoy.
Tuesday, May 20. 2008
Opera Mini - Easy to use web browser. It should be the first thing you install.
Fring - Universal instant messenger and VOIP application. It works with MSN, Gtalk, Yahoo, Skype, Twitter etc. It's very easy to use, and allows you to stay connected all the time. This alone is a killer app.
Gmail - If you use Gmail, get the mobile application. It's fast and neat. You can't send attachments, however.
Google Maps - Get satellite photos and maps on your phone. Can pinpoint your approximate location (to the nearest cell tower).
Train timetables - Stripped down interface to the Irish Rail website, without all the bloat.
Wednesday, November 14. 2007
The operator for NOT equals in Matlab is ~= rather than !=
So logical NOT is ~ in Matlab, compared to ! in Java/C/Perl etc.
Tuesday, November 13. 2007
Bigulo has performed more than 5 million searches since September 2006 (14 months). It's still going strong, with a consistently high level of traffic (and a very low bounce rate). On average each visitor looks at 5 pages per visit.
Wednesday, October 24. 2007
I googled for something from Politics.ie today and came across a Chinese domain leecher. They had set up http://poitics-ie.hostsoft.us (now blocked) and were pointing its DNS at the Politics.ie server. Thus they were leeching off our PageRank (and content) and succeeded in getting 1200 URLs into the Google index (with our content). They then either sell the domain (with a temporarily high PageRank) or replace the pages with ads and steal some of our referrals. Anyone using that domain now gets a 403 error.
Thursday, September 27. 2007
We went for a hike last weekend. It's great what can be done with GPS.
Wednesday, September 19. 2007
In Linux, this command will find all .tex files in your home dir, tar and gzip them, then save them to a backup folder. The name of the file is the current time (in seconds since the Unix epoch). Just pop it into cron, and away you go: automatic daily backups.
tar -zcf /home/andrew/backup/`date +%s`.tx.tgz `locate ".tex" | grep andrew` >/dev/null 2>&1
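If you would rather not depend on the locate database being up to date, the same idea can be sketched in Python using only the standard library. This is an illustration rather than a drop-in replacement: the function name `backup_tex` is made up, and it walks the home directory itself instead of querying locate.

```python
import tarfile
import time
from pathlib import Path


def backup_tex(home, backup_dir):
    """Archive every *.tex file under home into backup_dir/<unix time>.tx.tgz."""
    home = Path(home).resolve()
    backup_dir = Path(backup_dir).resolve()
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Name the archive after the current Unix time, like `date +%s` above.
    archive = backup_dir / f"{int(time.time())}.tx.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(home.rglob("*.tex")):
            # Only regular files, and nothing that already lives in the backup folder.
            if path.is_file() and backup_dir not in path.parents:
                tar.add(path, arcname=path.relative_to(home).as_posix())
    return archive
```

Calling `backup_tex("/home/andrew", "/home/andrew/backup")` from a small script run by cron would give the same kind of daily archive, with paths stored relative to the home directory.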
Wednesday, August 29. 2007
Here's a bit of useless but interesting info for you. This complex, which looks very nondescript from the ground, is where Ireland prints and distributes money. From Google Maps you can see the unusual roads around it and pillboxes (presumably for machine guns). Every now and again you'll see heavily armed convoys of ordinary articulated trucks heading into it.
Saturday, August 4. 2007
The housing market appears to be going down the tubes, but it's difficult to see that this is actually the case. So I decided to visualise all of the houses currently for sale in Athy, Co. Kildare on a timeline, based on when they were first put up for sale on daft.ie.
I picked Athy because it's advertised as 1 hour from Dublin (well, the border of Co. Dublin), and since it's so far out it's very much at risk in an unstable housing market. As you can see, there are loads and loads of houses that have been for sale for an extended period of time.
Timeline of houses for sale in Athy

I've only tested it in Firefox, so if you're using IE, tough luck.