Fuller Web Development

Archive for June 2008

Python Performance Part 2 Redux: Split & Reduce Large Strings for 'A Href' Hypertext

split_2.py
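def get_value(a):
    # Each chunk starts with the quoted URL: skip the opening quote, stop before the closing one.
    return a[1:a.find(">")-1]

# [1:] drops the text before the first link, which is not a URL.
hrefs = list(map(get_value, open("hypertext.html", "r").read().split("<a href=")[1:]))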
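To make the slicing easier to follow, here is a minimal sketch of the same split-and-slice idea run against a short in-memory string instead of the 48MB file. The `sample` markup and the `extract_hrefs` name are made up for illustration, and the sketch assumes every link is written exactly as `<a href="...">` with double quotes and no other attributes.

```python
# Hypothetical sample markup; the original script reads hypertext.html instead.
sample = '<p>Intro</p><a href="http://world.org">World</a> and <a href="/tree">Tree</a>'

def get_value(chunk):
    # A chunk looks like '"http://world.org">World</a> and ':
    # index 0 is the opening quote, and the closing quote sits just before '>'.
    return chunk[1:chunk.find(">") - 1]

def extract_hrefs(html):
    # The first piece of the split precedes any link, so it is skipped.
    return [get_value(chunk) for chunk in html.split("<a href=")[1:]]

print(extract_hrefs(sample))  # ['http://world.org', '/tree']
```

If a tag carries extra attributes or uses single quotes, the slice would return the wrong text, so this only suits input whose link format is known in advance.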

Timing Comparison: ~3.2x Faster (real time)
Note: hypertext.html is 48MB.
braydon@bgf:~/python_tests/extract$ time python split.py

real 0m1.263s
user 0m1.112s
sys 0m0.156s

braydon@bgf:~/python_tests/extract$ time python split_2.py

real 0m0.392s
user 0m0.268s
sys 0m0.120s
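As a quick check on the size of the improvement, the two "real" times above can be turned into a speedup factor and a percentage of run time saved. This is only arithmetic on the reported numbers; the variable names are mine.

```python
# Wall-clock ("real") times copied from the two runs above, in seconds.
split_real = 1.263
split_2_real = 0.392

speedup = split_real / split_2_real        # how many times faster split_2.py ran
reduction = 1 - split_2_real / split_real  # fraction of the run time saved

print("speedup: %.2fx" % speedup)                 # speedup: 3.22x
print("time saved: %.0f%%" % (reduction * 100))   # time saved: 69%
```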

split.py
Previously, I had found [...]

Python Performance Part 2: Parsing Large Strings for 'A Href' Hypertext

Goal
Write a fast Python script that will take a large HTML string and reduce it to a list of all of the hyperlinks it contains, such as ["http://world.org","/tree"].
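Before tuning for speed, it can help to have a slow but dependable baseline to compare results against. The sketch below is not one of the attempts from this post: it uses the standard library's `html.parser` (Python 3) to collect `href` values, and the `HrefCollector` and `reference_hrefs` names and the `sample` string are hypothetical. I would expect it to be much slower than string splitting on a large file, though I have not timed it.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def reference_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs

# Hypothetical input; note the second link has an extra attribute before href.
sample = '<p><a href="http://world.org">World</a> <a class="nav" href="/tree">Tree</a></p>'
print(reference_hrefs(sample))  # ['http://world.org', '/tree']
```

Comparing a faster script's output with this list on a small file is one way to confirm that a speedup has not changed the result.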
Attempt 1: Self-Recursion
f = open('hypertext_sm.html','r')
ahrefs = []
count = []
def find_ahref(h):
    a = h.find("<a href=")
    if a != -1:
[...]

Python Performance Part 1: Transforming Large Lists into Separate Smaller Lists

Goal
Write a fast Python script that will take a large list and break it up into smaller sub-lists of a set size, such as transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]].
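To pin down what the output should look like, here is a plain slicing sketch of the goal. It is not one of the timed attempts in this post, just a simple baseline; the `chunk` name is hypothetical.

```python
def chunk(items, size):
    # Step through the list in strides of `size` and slice out each sub-list.
    # The last sub-list is shorter when len(items) is not a multiple of size.
    return [items[i:i + size] for i in range(0, len(items), size)]

letters = ["a", "b", "c", "d", "e", "f"]
print(chunk(letters, 2))  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```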
Attempt 1: Map/Reduce (0.93s)
# import a list of 247,213 integers
from oids import oids

def pre(a):
    return (list(), a, 0)

def make_sets(a, b):
    set_size = 8
[...]