split_2.py
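def get_value(a):
    # chunk looks like '"URL">text...'; strip the quotes around URL
    return a[1:a.find(">") - 1]
with open("hypertext.html", "r") as f:
    # chunk 0 is the text before the first link, so skip it
    hrefs = list(map(get_value, f.read().split("<a href=")[1:]))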
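The one-liner above can be wrapped as a function over an in-memory string, which makes it easy to test. This is a sketch assuming Python 3 and that every link is written exactly as `<a href="URL">`, with double quotes and nothing between the closing quote and `>`. Single-quoted, unquoted, or multi-attribute anchors will be sliced incorrectly. The name `extract_hrefs` is hypothetical.

```python
def extract_hrefs(html):
    """Return the href values of every '<a href="...">' in html.

    Assumes each anchor is written exactly as <a href="URL"> (double quotes,
    nothing between the closing quote and '>').
    """
    hrefs = []
    # Everything before the first '<a href=' is not a link, so skip chunk 0.
    for chunk in html.split("<a href=")[1:]:
        end = chunk.find(">")
        if end == -1:
            continue  # malformed tail with no closing '>'; ignore it
        # chunk[0] is the opening quote and chunk[end - 1] is the closing quote.
        hrefs.append(chunk[1:end - 1])
    return hrefs


page = '<p>Hi</p><a href="http://world.org">World</a> <a href="/tree">Tree</a>'
print(extract_hrefs(page))  # ['http://world.org', '/tree']
```

The speed comes from `str.split` doing one pass in C; the trade-off is that it trusts the markup to follow a single fixed pattern.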
Timing Comparison: ~3.2x Faster (real time 1.263s vs. 0.392s)
Note: hypertext.html is 48MB.
braydon@bgf:~/python_tests/extract$ time python split.py
real 0m1.263s
user 0m1.112s
sys 0m0.156s
braydon@bgf:~/python_tests/extract$ time python split_2.py
real 0m0.392s
user 0m0.268s
sys 0m0.120s
split.py
Previously, I had found [...]
Goal
Write a fast Python script that will take a large HTML string and reduce it to a list of all the hyperlinks in it; for example, ["http://world.org", "/tree"].
Attempt 1: Self-Recursion
f = open('hypertext_sm.html', 'r')
ahrefs = []
count = []
def find_ahref(h):
    a = h.find("<a href=")
    if a != -1:
        [...]
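The body of `find_ahref` is elided above, so its exact logic isn't shown here. One practical concern with a self-recursive scan is that CPython's default recursion limit is 1000 frames (see `sys.getrecursionlimit()`), so recursing once per link would raise `RecursionError` on a page with more links than that. Below is a minimal iterative sketch of the same `str.find` idea, assuming double-quoted `href` values; `find_ahrefs_iter` is a hypothetical name, not the author's function.

```python
def find_ahrefs_iter(h):
    """Collect double-quoted href values by scanning h with str.find.

    Loops instead of recursing, so the number of links is not bounded
    by the interpreter's recursion limit.
    """
    marker = '<a href="'
    ahrefs = []
    pos = h.find(marker)
    while pos != -1:
        start = pos + len(marker)   # first character of the URL
        end = h.find('"', start)    # closing quote of the href value
        if end == -1:
            break                   # unterminated attribute; stop scanning
        ahrefs.append(h[start:end])
        pos = h.find(marker, end)   # resume searching after this link
    return ahrefs


html = '<a href="http://world.org">W</a><a href="/tree">T</a>'
print(find_ahrefs_iter(html))  # ['http://world.org', '/tree']
```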
Goal
Write a fast Python script that will take a large list and break it into smaller sub-lists of a fixed size; for example, transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]] with a size of 2.
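For reference, the target transformation itself can be expressed with plain slicing. This sketch is not one of the attempts that follow; `chunk` is a hypothetical helper, and the last sub-list is simply shorter when the list length is not a multiple of the size.

```python
def chunk(items, size):
    """Split items into consecutive sub-lists of length size (last may be shorter)."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # range() steps through the start index of each chunk.
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunk(["a", "b", "c", "d", "e", "f"], 2))  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```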
Attempt 1: Map/Reduce (0.93s)
# import a list of 247,213 integers
from oids import oids
def pre(a):
    return (list(), a, 0)
def make_sets(a, b):
    set_size = 8
    [...]
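The rest of `make_sets` is elided, so how the author combined the tuples produced by `pre` isn't shown. As a hedged illustration of the general fold idea only, here is a sketch that groups a list into fixed-size sub-lists with `functools.reduce`. It uses a reduce step alone with no separate map step, and `reduce_chunks` and its accumulator layout are assumptions, not the author's code; the default `set_size` of 8 mirrors the value in `make_sets`.

```python
from functools import reduce


def reduce_chunks(items, set_size=8):
    """Group items into sub-lists of set_size using a left fold."""

    def step(groups, item):
        # Start a new group when there is none yet or the last one is full.
        if not groups or len(groups[-1]) == set_size:
            groups.append([])
        groups[-1].append(item)
        return groups  # the same list object is threaded through the fold

    return reduce(step, items, [])


print(reduce_chunks(list(range(20))))
```

Because each step appends in place, the fold does linear work overall; a version that returned a fresh copy of the accumulator on every step would do quadratic work on a list of 247,213 items.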
© Fuller Web Development.