Archive for July, 2007

tdroza

Playing with Python

For something I was playing around with at work, I wanted to be able to retrieve an RSS feed, parse it and post the title/description fields to another website at timed intervals. These days I only really write in Java and JavaScript, but Java seemed like such a longhand way to achieve this. I probably could have written a shell script, but it’s been such a long time since I wrote one that I’d have been starting from scratch, so I decided to take a look at Python… and so far I’m impressed. Very impressed.

From start to finish this probably only took a couple of hours, and that includes referring back to the API docs for almost every line I wrote. The code below fetches a feed and extracts the title field from the items. Each time it finds a new item, it adds the item’s guid to a text file so that it can ignore items that have already been processed. I’m sure this can be tidied up lots, but for a first attempt I’m pretty happy (for simplicity I’ve removed the code that posts the items):

import urllib
from threading import Timer
from xml.dom import minidom

def retrieveXml(url):
    #get the feed
    f = urllib.urlopen(url)
    xmldoc = minidom.parse(f)
    f.close()

    # read the history (assumes the file already exists)
    historyFile = open('./history.dat','r')
    history = historyFile.read()
    historyFile.close()
    found = False
    # iterate through each item in the feed
    items = xmldoc.getElementsByTagName('item')
    for item in items:
        title = item.getElementsByTagName('title')[0].firstChild.toxml()
        guid = item.getElementsByTagName('guid')[0].firstChild.toxml()
        # if the current item isn't in the history, then use that
        if history.find(guid) < 0:
            found = True
            break
    # if we found a new entry while iterating over the feed...
    if found:
        historyFile = open('./history.dat','a')
        historyFile.write(guid + "\n")
        historyFile.close()
        # for now just print the title to screen
        print title
    t = Timer(10.0, retrieveXml, [url])
    t.start()

retrieveXml('http://f1.gpupdate.net/en/xml/rss/1.xml')
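The parse-and-skip step described above can also be sketched on its own, without the network fetch or the timer. This is only a rough sketch, assuming Python 3 (the listing above uses Python 2 idioms such as `print title` and `urllib.urlopen`). The helper names `load_history`, `text_of`, `first_new_item` and `record`, and the `SAMPLE_FEED` data, are made up for illustration and are not part of the original script. It also swaps the substring search on the history text for a set of guids, and tolerates a missing history file.

```python
import os
from xml.dom import minidom

# Hypothetical sample data standing in for a fetched feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>First story</title><guid>guid-1</guid></item>
<item><title>Second story</title><guid>guid-2</guid></item>
</channel></rss>"""


def load_history(path):
    """Return the set of guids already processed (empty if the file is missing)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as history_file:
        return {line.strip() for line in history_file if line.strip()}


def text_of(item, tag):
    """Return the text of the first <tag> inside item, or None if absent or empty."""
    nodes = item.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def first_new_item(feed_xml, seen):
    """Return (guid, title) for the first item whose guid is not in seen, else None."""
    xmldoc = minidom.parseString(feed_xml)
    for item in xmldoc.getElementsByTagName("item"):
        guid = text_of(item, "guid")
        if guid is not None and guid not in seen:
            return guid, text_of(item, "title")
    return None


def record(path, guid):
    """Append a processed guid to the history file, one guid per line."""
    with open(path, "a", encoding="utf-8") as history_file:
        history_file.write(guid + "\n")


if __name__ == "__main__":
    # In-memory demo: pretend guid-1 has already been processed.
    print(first_new_item(SAMPLE_FEED, {"guid-1"}))
```

Comparing whole guids in a set avoids a false match when one guid happens to be a substring of another, which `history.find(guid)` on the raw file contents would not catch.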
tdroza

Why have I decided to do this?

Is it poor form to begin a new blog with a question? I hope not, because that’s now two. To answer the first one, I guess there are several reasons. Briefly (and in no particular order):

  • I’ve done (and am doing) lots of “stuff” on the web. I think a lot of the current stuff is pretty interesting and I hope others might too, but there’s no central place to bring it all together. Hopefully this site can do that.
  • Now just seems like the right time.
  • I’ve sat through 106 episodes of Diggnation and thought I ought to use the GoDaddy offer code.
  • For the last few years, I’ve hosted my own web pages on a hacked NSLU2 running off my home broadband line. I realised that the content I was posting was becoming more blog-like, but without the advantages of a proper blog (like tagging, comments and RSS), and that hosting plans are so cheap now that it makes a lot more sense to start blogging properly.

So, here it is, let’s see how it goes. I don’t intend for this blog to have a particular theme so it will be a mixture of all the things I’m into and will probably include social stuff, rants, tech, photos and music.
