<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Fuller Web Development &#187; Python</title>
	<atom:link href="http://braydon.com/blog/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://braydon.com/blog</link>
	<description>JavaScript, PHP, and Python Web Development by Braydon Fuller</description>
	<lastBuildDate>Fri, 06 Aug 2010 08:05:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Ticket Lottery Application for Mochilla</title>
		<link>http://braydon.com/blog/2010/06/ticket-email-application-for-mochilla/</link>
		<comments>http://braydon.com/blog/2010/06/ticket-email-application-for-mochilla/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 06:59:57 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[Eagle Rock]]></category>
		<category><![CDATA[Mochilla]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=266</guid>
		<description><![CDATA[I just finished and launched a &#8220;razor blade&#8221; application for Mochilla to randomly choose subscribers to win tickets for the VTech/Mochilla Arthur Verocai Show this next Friday the 18th. It was a quick turnaround of three days, and it integrates with the existing software behind their website; more details about that to follow. [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished and launched a &#8220;razor blade&#8221; application for Mochilla to randomly choose subscribers to win tickets for the VTech/Mochilla Arthur Verocai Show this next Friday the 18th. It was a quick turnaround of three days, and it integrates with the existing software behind their website; more details about that to follow. <a href="http://mochilla.com/lottery">Please go sign up for a chance to win a pair of tickets!</a> Your chance will be over in 24 hours.</p>
<p><a href="http://mochilla.com/lottery"><img src="http://braydon.com/blog/wp-content/uploads/2010/06/2238-e1276232151630.jpg" alt="" title="2238" width="500" height="796" class="alignnone size-full wp-image-265" /></a></p>
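<p>One simple way to randomly choose subscribers is to sample them without replacement, so that nobody can win twice. Here is a minimal Python 3 sketch that assumes the subscribers are just a list of email addresses. The pick_winners function and the addresses are hypothetical and are not the actual Mochilla code:</p>
<pre>import random

def pick_winners(subscribers, count):
    """Draw count distinct winners from a list of subscriber emails (hypothetical helper)."""
    # dict.fromkeys drops duplicate sign-ups while keeping their order
    unique = list(dict.fromkeys(subscribers))
    # SystemRandom draws from the operating system's randomness source;
    # sample() picks without replacement, so nobody can win twice
    return random.SystemRandom().sample(unique, count)

winners = pick_winners(["ann@example.com", "bo@example.com",
                        "ann@example.com", "cy@example.com"], 2)
</pre>
<p>sample() raises ValueError if count is larger than the number of unique subscribers.</p>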
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2010/06/ticket-email-application-for-mochilla/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Events Section on Mochilla.com</title>
		<link>http://braydon.com/blog/2010/03/new-events-section-on-mochilla-com/</link>
		<comments>http://braydon.com/blog/2010/03/new-events-section-on-mochilla-com/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 02:29:02 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[Eagle Rock]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Mochilla]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=152</guid>
		<description><![CDATA[We just put up a new events section for Mochilla.com. I added a &#8216;date&#8217; column to the existing documents MySQL table, along with an interface for adding and editing events in the browser. It took about a day and a half to complete.
]]></description>
			<content:encoded><![CDATA[<p>We just put up a new events section for <a href="http://mochilla.com/events">Mochilla.com</a>. I added a &#8216;date&#8217; column to the existing documents MySQL table, along with an interface for adding and editing events in the browser. It took about a day and a half to complete.</p>
<div id="attachment_153" class="wp-caption alignnone" style="width: 510px"><a title="Events at Mochilla.com" href="http://mochilla.com/events"><img class="size-full wp-image-153" title="Mochilla Events Page" src="http://braydon.com/blog/wp-content/uploads/2010/03/Screenshot-e1268187969732.png" alt="" width="500" height="297" /></a><p class="wp-caption-text">Screenshot of the new Mochilla.com events page.</p></div>
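<p>The schema change described above can be sketched with Python 3&#8217;s built-in sqlite3 module standing in for MySQL. In MySQL, the new column would more likely be a DATE or DATETIME. The cut-down table and the event rows are hypothetical:</p>
<pre>import sqlite3

conn = sqlite3.connect(":memory:")
# hypothetical, cut-down version of the existing documents table
conn.execute("CREATE TABLE documents (oid INTEGER PRIMARY KEY, name TEXT)")
# the change: one extra column on the table that already exists
conn.execute("ALTER TABLE documents ADD COLUMN date TEXT")
conn.executemany(
    "INSERT INTO documents (name, date) VALUES (?, ?)",
    [("Show B", "2010-04-02"), ("About page", None), ("Show A", "2010-03-19")],
)
# events are simply the documents that have a date; ISO-formatted
# date strings sort in chronological order
events = conn.execute(
    "SELECT name, date FROM documents WHERE date IS NOT NULL ORDER BY date"
).fetchall()
</pre>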
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2010/03/new-events-section-on-mochilla-com/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Memory Optimizations for Mochilla.com</title>
		<link>http://braydon.com/blog/2010/03/memory-optimizations-for-mochilla-com/</link>
		<comments>http://braydon.com/blog/2010/03/memory-optimizations-for-mochilla-com/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 02:10:25 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Eagle Rock]]></category>
		<category><![CDATA[Mochilla]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=118</guid>
		<description><![CDATA[I just spent the last two days optimizing the Python backend for the Mochilla.com website. We are running the website on a 256MB VPS at Slicehost. The site has been getting many more hits over the last week because of the new Timeless video Mochilla is releasing, and several people have been embedding the videos [...]]]></description>
			<content:encoded><![CDATA[<p>I just spent the last two days optimizing the Python backend for the Mochilla.com website, which runs on a 256MB VPS at Slicehost. The site has been getting many more hits over the last week because of the new Timeless video Mochilla is releasing, and several people have been embedding the videos on their own websites. All this was causing the server to lock up and need restarting a couple of times a day.</p>
<p>So I trimmed down the outdated code that was running the site. The result is a dramatic increase in speed and a reduction in the amount of memory the site requires. I&#8217;ve done away with using classes for the various objects, and the data is stored in dictionaries instead. I&#8217;m also using straight SQL to get data from the database rather than an expression language or an ORM, which would just add another layer of complexity. I&#8217;ve also consolidated several tables that really should have been one, since all of that information is queried at once anyway. It&#8217;s a small change with a large effect: the website is now much lighter weight and easier to manage.</p>
<p>Browsing is now faster, and the server is more stable under heavier traffic. Take a look: <a href="http://mochilla.com">Mochilla.com</a></p>
<p>The database now has this schema:</p>
<pre><code>
mysql> show tables;
+--------------------+
| Tables_in_mochilla |
+--------------------+
| documents          |
| templates          |
| uri                |
+--------------------+
3 rows in set (0.06 sec)

mysql> describe documents;
+------------+---------------+------+-----+---------+----------------+
| Field      | Type          | Null | Key | Default | Extra          |
+------------+---------------+------+-----+---------+----------------+
| oid        | int(11)       | NO   | PRI | NULL    | auto_increment |
| modified   | datetime      | YES  |     | NULL    |                |
| created    | datetime      | YES  |     | NULL    |                |
| parent_oid | int(11)       | YES  |     | NULL    |                |
| html       | varchar(1000) | YES  |     | NULL    |                |
| image      | varchar(1000) | YES  |     | NULL    |                |
| video      | varchar(1000) | YES  |     | NULL    |                |
| audio      | varchar(1000) | YES  |     | NULL    |                |
| uri        | varchar(100)  | YES  |     | NULL    |                |
| name       | varchar(100)  | YES  |     | NULL    |                |
| weight     | int(11)       | YES  |     | NULL    |                |
+------------+---------------+------+-----+---------+----------------+
11 rows in set (0.46 sec)

mysql> describe templates;
+----------+---------------+------+-----+---------+----------------+
| Field    | Type          | Null | Key | Default | Extra          |
+----------+---------------+------+-----+---------+----------------+
| vid      | int(11)       | NO   | PRI | NULL    | auto_increment |
| name     | varchar(100)  | YES  |     | NULL    |                |
| source   | varchar(255)  | YES  |     | NULL    |                |
| parent   | int(11)       | YES  |     | NULL    |                |
| children | varchar(1000) | YES  |     | NULL    |                |
+----------+---------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> describe uri;
+----------+---------------+------+-----+---------+-------+
| Field    | Type          | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+-------+
| location | varchar(1000) | NO   | PRI | NULL    |       |
| oid      | int(11)       | YES  |     | NULL    |       |
| vid      | int(11)       | YES  | MUL | NULL    |       |
| children | varchar(1000) | YES  |     | NULL    |       |
+----------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

</code></pre>
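<p>Here is a minimal Python 3 sketch of the approach described above: plain SQL, with each row kept as a dictionary instead of an instance of a class. The standard library&#8217;s sqlite3 module stands in for the MySQL driver, and fetch_rows and the sample rows are hypothetical:</p>
<pre>import sqlite3

def fetch_rows(conn, sql, params=()):
    """Run a plain SQL query and return each row as a dict keyed by column name."""
    cur = conn.execute(sql, params)
    # cursor.description holds one tuple per column; the first item is its name
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
# a few of the columns from the documents table above
conn.execute("CREATE TABLE documents (oid INTEGER PRIMARY KEY, uri TEXT, name TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO documents (uri, name, weight) VALUES (?, ?, ?)",
    [("events", "Events", 2), ("timeless", "Timeless", 1)],
)
docs = fetch_rows(conn, "SELECT oid, uri, name FROM documents ORDER BY weight")
</pre>
<p>Python DB-API drivers for MySQL, such as MySQLdb, also fill in cursor.description, so the dictionary-building step should carry over. They use %s rather than ? as the parameter placeholder, though.</p>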
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2010/03/memory-optimizations-for-mochilla-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mochilla Website Update</title>
		<link>http://braydon.com/blog/2010/02/mochilla-website-update/</link>
		<comments>http://braydon.com/blog/2010/02/mochilla-website-update/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 04:50:05 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Websites]]></category>
		<category><![CDATA[Eagle Rock]]></category>
		<category><![CDATA[Mochilla]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=50</guid>
		<description><![CDATA[We just put up a new landing page and product page for the Timeless Concert Series&#8217; new box set, to be released soon. It was a quick turnaround of about two days from the initial call to having the new pages up. This work included setting up a development site, modifying an already existing template file, and [...]]]></description>
			<content:encoded><![CDATA[<p>We just put up a new landing page and product page for the Timeless Concert Series&#8217; new box set, to be released soon. It was a quick turnaround of about two days from the initial call to having the new pages up. The work included setting up a development site, modifying the existing template file and style sheets, and making a graphic for the homepage.</p>
<p><a href="http://mochilla.com">MOCHILLA.COM</a></p>
<div id="attachment_51" class="wp-caption alignnone" style="width: 510px"><a href="http://mochilla.com"><img class="size-full wp-image-51 " title="Mochilla Landnig Page" src="http://braydon.com/blog/wp-content/uploads/2010/02/Screenshot-e1266641284216.png" alt="" width="500" height="352" /></a><p class="wp-caption-text">View of the new Mochilla.com landing page.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2010/02/mochilla-website-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zine Plugins</title>
		<link>http://braydon.com/blog/2009/05/zine-plugins/</link>
		<comments>http://braydon.com/blog/2009/05/zine-plugins/#comments</comments>
		<pubDate>Wed, 20 May 2009 04:19:12 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Plugins]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Silverlake]]></category>
		<category><![CDATA[Zine]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=34</guid>
		<description><![CDATA[
Title: Zine Featured Posts Plugin
Client: DUBLAB
Date: February 2009
Languages: Python and JavaScript
Description: A plugin for Zine that lets you select posts to feature and set their order.

]]></description>
			<content:encoded><![CDATA[<div id="attachment_19" class="wp-caption alignnone" style="width: 510px"><a href="http://zine.pocoo.org/"><img class="size-full wp-image-19" title="Zine Plugins" src="http://braydon.com/blog/wp-content/uploads/2010/02/featured_posts.png" alt="" width="500" height="498" /></a><p class="wp-caption-text">Featuring and changing the order of Posts in Zine</p></div>
<ul>
<li>Title: <a href="http://zine.pocoo.org/">Zine</a> Featured Posts Plugin</li>
<li>Client: <a href="http://dublab.com/">DUBLAB</a></li>
<li>Date: February 2009</li>
<li>Languages: <strong>Python</strong> and <strong>JavaScript</strong></li>
<li>Description: A plugin for Zine that lets you select posts to feature and set their order.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2009/05/zine-plugins/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 3: Python 3000 and Transforming Large Lists into Separate Smaller Lists</title>
		<link>http://braydon.com/blog/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/</link>
		<comments>http://braydon.com/blog/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 13:30:00 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=125</guid>
		<description><![CDATA[
Preface
This is a redux of Python Performance Part 1, where the fastest method used the reduce builtin function in Python 2.5. Python 3000 final was released on December 3rd, so I downloaded it and went over some of these scripts again. In Python 3000 the reduce function is no longer a builtin, and has moved [...]]]></description>
			<content:encoded><![CDATA[<div class="text" id="extended">
<h2>Preface</h2>
<p>This is a redux of <a href="/blog/2009/2/11/python-performance-part-1">Python Performance Part 1</a>, where the fastest method used the reduce builtin function in Python 2.5. Python 3000 final (Python 3.0) was released on December 3rd, so I downloaded it and went over some of these scripts again. In Python 3000, reduce is no longer a builtin. It has moved to the functools module. In some general comparisons between Python 2.5 and Python 3000, the latter always seemed to run slightly slower. According to Crys_ in the #python channel, this is due to the new IO system and Unicode identifiers. It was also recommended that I compare my tests with <a href="http://en.wikipedia.org/wiki/List_comprehension">list comprehension</a>, which was new to me.</p>
<h2>List Comprehension</h2>
<pre>from oids import oids as a
c = 3
# step through a in strides of c; the last chunk may be shorter than c
res = [a[x:x+c] for x in range(0, len(a), c)]
</pre>
<h3>Python 3k Times</h3>
<pre>real  0m0.218s
user  0m0.180s
sys   0m0.016s

real  0m0.262s
user  0m0.204s
sys   0m0.032s

real  0m0.287s
user  0m0.244s
sys   0m0.008s
</pre>
<h3>Python 2.5 Times</h3>
<pre>real  0m0.244s
user  0m0.220s
sys   0m0.016s

real  0m0.229s
user  0m0.208s
sys   0m0.020s

real  0m0.251s
user  0m0.236s
sys   0m0.020s
</pre>
<p>
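This makes list comprehension the fastest method so far, beating the previous best time of 0.34s from Part 1. Note that this test splits the list into chunks of 3, while the Part 1 tests used chunks of 8. Even better, there is little difference between Python 2.5 and Python 3000 here.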
</p>
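<p>The times above come from the shell&#8217;s time command, which also counts interpreter start-up and the import of the oids module. The following minimal sketch times just the chunking step with the standard timeit module instead. It is written for Python 3, where reduce has to come from functools. A range of integers stands in for the oids module, and both function names are hypothetical:</p>
<pre>import timeit
from functools import partial, reduce  # reduce is no longer a builtin in Python 3

# hypothetical stand-in for "from oids import oids": 247,213 integers
oids = list(range(247213))

def chunk_comprehension(a, c=8):
    # range(0, len(a), c) yields the start index of every chunk, including
    # a shorter final chunk when len(a) is not a multiple of c
    return [a[i:i + c] for i in range(0, len(a), c)]

def chunk_reduce(a, c=8):
    # Part 1's reduce approach, seeded with an empty list instead of try/except
    def add(acc, item):
        if not acc or len(acc[-1]) == c:
            acc.append([item])
        else:
            acc[-1].append(item)
        return acc
    return reduce(add, a, [])

for fn in (chunk_comprehension, chunk_reduce):
    # best of 3 repeats of 5 calls each, reported as seconds per call
    best = min(timeit.repeat(partial(fn, oids), number=5, repeat=3)) / 5
    print(fn.__name__, round(best, 4))
</pre>
<p>Taking the minimum of several repeats filters out runs that were slowed down by other processes on the machine.</p>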
</div>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Free Art Exhibition Website</title>
		<link>http://braydon.com/blog/2008/08/free-art-exhibition-website/</link>
		<comments>http://braydon.com/blog/2008/08/free-art-exhibition-website/#comments</comments>
		<pubDate>Fri, 08 Aug 2008 00:00:44 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Websites]]></category>
		<category><![CDATA[Creative Commons]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Silverlake]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=29</guid>
		<description><![CDATA[INTOINFINITY.ORG

Title: Into Infinity
Client: Creative Commons &#38; DUBLAB
Date: August 2008
Languages: Python and JavaScript
Description: An online exhibition of audio and visual artwork with many permutations.

]]></description>
			<content:encoded><![CDATA[<p><a href="http://intoinfinity.org">INTOINFINITY.ORG</a></p>
<div id="attachment_18" class="wp-caption alignnone" style="width: 510px"><a href="http://intoinfinity.org"><img class="size-full wp-image-18" title="Into Infinity" src="http://braydon.com/blog/wp-content/uploads/2010/02/into_infinity.png" alt="" width="500" height="525" /></a><p class="wp-caption-text">Administration of artworks via a JSON text file using Emacs through SSH</p></div>
<ul>
<li>Title: <a href="http://intoinfinity.org/">Into Infinity</a></li>
<li>Client: <a href="http://creativecommons.org/">Creative Commons</a> &amp; <a href="http://dublab.com/">DUBLAB</a></li>
<li>Date: August 2008</li>
<li>Languages: <strong>Python</strong> and <strong>JavaScript</strong></li>
<li>Description: An online exhibition of audio and visual artwork with many permutations.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2008/08/free-art-exhibition-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 2 Redux: Split &amp; Reduce Large Strings for &#039;A Href&#039; Hypertext</title>
		<link>http://braydon.com/blog/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/</link>
		<comments>http://braydon.com/blog/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 13:37:21 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=129</guid>
		<description><![CDATA[
split_2.py
def get_value(a):
    return a[1:a.find("&#62;")-1]
hrefs = map(get_value,open("hypertext.html","r").read().split("&#60;a href="))

Timing Comparison: About 3x Faster
Note: hypertext.html is 48MB.
braydon@bgf:~/python_tests/extract$ time python split.py 

real    0m1.263s
user    0m1.112s
sys     0m0.156s

braydon@bgf:~/python_tests/extract$ time python split_2.py 

real    0m0.392s
user    0m0.268s
sys     0m0.120s

split.py
Previously, I had found [...]]]></description>
			<content:encoded><![CDATA[<div class="text" id="extended">
<h2>split_2.py</h2>
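<pre>def get_value(a):
    # a begins with the quoted URL: skip the opening quote, stop before the closing one
    return a[1:a.find("&gt;")-1]
# the first chunk is the text before the first link, so skip it
hrefs = map(get_value,open("hypertext.html","r").read().split("&lt;a href=")[1:])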
</pre>
<h2>Timing Comparison: About 3x Faster</h2>
<p>Note: hypertext.html is 48MB.</p>
<pre>braydon@bgf:~/python_tests/extract$ time python split.py 

real    0m1.263s
user    0m1.112s
sys     0m0.156s

braydon@bgf:~/python_tests/extract$ time python split_2.py 

real    0m0.392s
user    0m0.268s
sys     0m0.120s
</pre>
<h2>split.py</h2>
<p>Previously, I had found that the best solution to my problem was to split() the large string on the &#8220;&gt;&#8221; character and then reduce it to a list of hyperlinks.</p>
<pre>def is_ahref(a,b):
    y = b.find("&lt;a href=")
    if y != -1: a.append(b[y+9:-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

hrefs = preduce(is_ahref,open("hypertext.html","r").read().split("&gt;"),[])
</pre>
<p>There is a better solution. Splitting the text on the &#8220;&gt;&#8221; character is wasteful: there are many &#8220;&gt;&#8221;s in HTML, and most of them do not close a hyperlink. If we split the string on &#8220;&lt;a href=&#8221; instead, every item after the first starts with an href. We then don&#8217;t need to check each item, and we can reduce the list as before.</p>
<pre>def is_ahref(a,b):
    z = b.find("&gt;")
    a.append(b[1:z-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

# the first chunk is the text before the first link, so skip it
fc = open("hypertext.html","r").read().split("&lt;a href=")[1:]
hrefs = preduce(is_ahref,fc,[])
</pre>
<p>However, because the output list has exactly as many items as the input list, we don&#8217;t need reduce() at all. We can simply map() a function over the whole list, turning each item into just its hyperlink.</p>
<h2>split_2.py</h2>
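<pre>def get_value(a):
    # a begins with the quoted URL: skip the opening quote, stop before the closing one
    return a[1:a.find("&gt;")-1]
# the first chunk is the text before the first link, so skip it
hrefs = map(get_value,open("hypertext.html","r").read().split("&lt;a href=")[1:])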
</pre>
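<p>In Python 3, map() returns a lazy iterator rather than a list, so split_2.py needs a small change there. Here is a minimal Python 3 sketch of the same split-and-slice idea. The function names are hypothetical. Like the original, it assumes each link is written exactly as &lt;a href="..."&gt;, with the closing quote directly before the &gt;:</p>
<pre>def get_value(a):
    # a begins with the quoted URL, e.g. '"/tree"' followed by the bracket
    # and link text; drop the opening quote and stop before the closing one
    return a[1:a.find(">") - 1]

def hrefs_from_text(text):
    # "\x3c" is the less-than sign, so this splits on the same
    # opening-tag text as split_2.py
    chunks = text.split("\x3ca href=")
    # chunks[0] is the text before the first link, so skip it; the list
    # comprehension builds the list that map() returned in Python 2
    return [get_value(c) for c in chunks[1:]]

def hrefs_from_file(path):
    with open(path, "r") as f:
        return hrefs_from_text(f.read())
</pre>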
</div>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 2: Parsing Large Strings for &#039;A Href&#039; Hypertext</title>
		<link>http://braydon.com/blog/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/</link>
		<comments>http://braydon.com/blog/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 13:39:46 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=133</guid>
		<description><![CDATA[
Goal
Write a fast Python script that will take a large string and reduce it to a list of all of the hyperlinks in the HTML string, such as ["http://world.org", "/tree"].
Attempt 1: Self-Recursion 
f = open('hypertext_sm.html','r')
ahrefs = []
count = []
def find_ahref(h):
    a = h.find("&#60;a href=")
    if a != -1:
   [...]]]></description>
			<content:encoded><![CDATA[<div class="text" id="extended">
<h2>Goal</h2>
<p>Write a fast Python script that will take a large string and reduce it to a list of all of the hyperlinks in the HTML string, such as ["http://world.org", "/tree"].</p>
<h2>Attempt 1: Self-Recursion </h2>
<pre>f = open('hypertext_sm.html','r')
ahrefs = []
def find_ahref(h):
    a = h.find("&lt;a href=")
    if a != -1:
        a = a+9  # skip past '&lt;a href="' to the start of the URL
        b = a + h[a:].find("&gt;")-1  # index of the closing quote
        ahrefs.append(h[a:b])
        # recurse on a copy of the rest of the string
        find_ahref(h[b:])

find_ahref(f.read())
f.close()
</pre>
<h3>Summary</h3>
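<p>Fairly fast with small strings. With large strings, however, every recursive call keeps its own copy of the rest of the string, so memory runs out and the script fails. Even with enough memory, each call adds a stack frame, so a string with more than about 1,000 links would exceed Python&#8217;s default recursion limit.</p>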
<h2>Attempt 2: Reduce</h2>
<pre>f = open('hypertext_sm.html','r')
def find_ahref(a,b):
    try:
        c = a[1] + str(b)
        x = c.find("&lt;a href=")
        y = c[x:-1].find("&gt;")
        if x != -1 and y != -1:
            a[0].append(c[x+9:x+y-1])
            return (a[0],"")
        else:
            return (a[0],c)
    except IndexError:
        # first call: a is still the first character, not a (list, buffer) pair
        return ([],str(a)+str(b))

hrefs = reduce(find_ahref,f.read())[0]
f.close()
</pre>
<h3>Summary</h3>
<p>Not as fast as the previous attempt with smaller strings, but it does not exhaust memory, and it actually finished parsing the larger string (43MB). However, it calls a Python function once for every character and took nearly 7 minutes to run, so it is hard to call it a solution.</p>
<h2>Attempt 3: While Readline</h2>
<pre>
f = open('hypertext.html','r')

ahrefs = []
def find_ahref(h):
    a = h.find("&lt;a href=")
    if a != -1:
        a = a+9
        b = a + h[a:-1].find("&gt;")-1
        ahrefs.append(h[a:b])
        find_ahref(h[b:-1])

while True:
    line = f.readline()
    if line:
        find_ahref(line)
    else:
        break

f.close()
</pre>
<h3>Summary</h3>
<p>This is pretty fast with both small and large strings, as long as they have many lines. However, if it were handed a very large single line, it would fail just like Attempt 1: Self-Recursion.</p>
<h2>Attempt 4: Map/Reducer Readlines</h2>
<pre>f = open('hypertext.html','r')
def ahref_reducer(a,b):
    try:
        c = a[1] + str(b)
        x = c.find("&lt;a href=")
        y = c[x:-1].find("&gt;")
        if x != -1 and y != -1:
            a[0].append(c[x+9:x+y-1])
            return (a[0],"")
        else:
            return (a[0],c)
    except IndexError:
        # first call: a is still the first character, not a (list, buffer) pair
        return ([],str(a)+str(b))

def get_ahrefs(line):
    return reduce(ahref_reducer,line)[0]

lines = f.readlines()
hrefs = map(get_ahrefs,lines)
f.close()
</pre>
<h3>Summary</h3>
<p>Slower, but it doesn&#8217;t exhaust memory. However, it doesn&#8217;t really meet the goal either. It returns a nested list with one entry per line, many of them empty, rather than a flat list of hyperlinks.</p>
<h2>Attempt 5: Reduce Recurse Readlines</h2>
<pre>f = open('hypertext.html','r')

def find_ahref(z,htxt):
    a = htxt.find("&lt;a href=")
    if a != -1:
        a = a+9
        b = a+htxt[a:-1].find("&gt;")-1
        href = htxt[a:b]
        z.append(href)
        return find_ahref(z,htxt[b:-1])
    else:
        return z

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

lines = f.readlines()
hrefs = preduce(find_ahref,lines,[])

f.close()
</pre>
<h3>Summary</h3>
<p>Fast, although it relies on there being many lines.</p>
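<p>Python&#8217;s reduce() accepts an optional third argument that it uses as the starting value. That does the same job as preduce(), without list.insert(0, ...) shifting every element of a large list. Here is a minimal sketch using this attempt&#8217;s reducer. It is written for Python 3, where reduce lives in functools (in Python 2.5 it is a builtin), and its slices run to the end of the string rather than stopping at -1. get_hrefs is a hypothetical name:</p>
<pre>from functools import reduce  # builtin in Python 2.5, in functools in Python 3

def find_ahref(z, htxt):
    # Attempt 5's reducer: append every href found on one line to z
    a = htxt.find("\x3ca href=")  # "\x3c" is the less-than sign
    if a != -1:
        a = a + 9                            # start of the URL
        b = a + htxt[a:].find(">") - 1       # index of the closing quote
        z.append(htxt[a:b])
        return find_ahref(z, htxt[b:])
    return z

def get_hrefs(lines):
    # the third argument to reduce() is the starting accumulator, so there is
    # no need for preduce() to insert [] at the front of the list
    return reduce(find_ahref, lines, [])
</pre>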
<h2>Attempt 6: Something Completely Different</h2>
<p>Instead of looking for the &#8220;a href&#8221; first, we will look for the closing &#8220;&gt;&#8221; and then search for the &#8220;a href&#8221; up to that point.</p>
<pre>hrefs = []

f = open("hypertext.html","r")
f_str = f.read()
f.close()

while True:
    b = f_str.find("&gt;")
    if b == -1:
        break
    a = f_str[0:b].find("&lt;a href=")
    if a != -1:
        hrefs.append(f_str[a+9:b-1])
    f_str = f_str[b+1:]  # keep everything after this "&gt;"
</pre>
<h3>Summary</h3>
<p>While good in theory, it does not handle large strings well; and by well I mean not at all. It did, however, get the correct hrefs from a smaller string.</p>
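<p>One likely reason large strings are a problem is that f_str = f_str[b+1:...] copies the rest of the string on every pass. The total copying therefore grows roughly with the square of the string&#8217;s length. Below is a minimal Python 3 sketch of the same idea that passes start and end offsets to str.find() instead of slicing, so only the hrefs themselves are copied. The function name is hypothetical, and the sketch has not been timed against the files above:</p>
<pre>def find_hrefs_by_offset(text):
    """Attempt 6's approach (find each closing bracket, then look back
    for the start of a link) using offsets instead of slices."""
    marker = "\x3ca href="  # "\x3c" is the less-than sign
    hrefs = []
    pos = 0
    while True:
        end = text.find(">", pos)
        if end == -1:
            break
        # only search between the previous closing bracket and this one
        start = text.find(marker, pos, end)
        if start != -1:
            # skip the marker and the opening quote; drop the closing quote
            hrefs.append(text[start + len(marker) + 1:end - 1])
        pos = end + 1
    return hrefs
</pre>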
<h2>Attempt 7: Split &amp; (P)Reduce</h2>
<p>Rather than breaking the large string up by line, we will break it up on the &#8220;&gt;&#8221; character.</p>
<pre>def is_ahref(a,b):
    y = b.find("&lt;a href=")
    if y != -1: a.append(b[y+9:-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

hrefs = preduce(is_ahref,open("hypertext.html","r").read().split("&gt;"),[])
</pre>
<h3>Summary</h3>
<p>This is the fastest of them all, slightly faster than Reduce Recurse Readlines, and it can even handle a large string on a single line quickly.</p>
<h2>Conclusion</h2>
<ul>
<li>Self-recursion is not good when a function passes copies of the same large string to itself.
</li>
<li>Concatenating a string one character at a time and checking for “a href” does not exhaust memory, but it is very slow.
</li>
<li>Breaking a large string into smaller ones is faster and doesn’t exhaust memory.
</li>
<li>Reading by line is only one way to break a large string (or file) into smaller strings.
</li>
</ul>
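<p>For comparison, the standard library&#8217;s re module can do the whole scan in one call, with the matching loop running in C. Here is a minimal Python 3 sketch that makes the same assumption as the attempts above: a double-quoted href directly after &#8220;a href=&#8221;. Unlike those attempts, it does not require the &gt; to follow the closing quote immediately. The names are hypothetical, and it has not been timed against these files:</p>
<pre>import re

# "\x3c" is the less-than sign; the group captures everything
# between the double quotes of the href attribute
HREF_RE = re.compile(r'\x3ca href="([^"]*)"')

def regex_hrefs(text):
    # findall returns the captured group of every match, in order
    return HREF_RE.findall(text)
</pre>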
</div>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 1: Transforming Large Lists into Separate Smaller Lists</title>
		<link>http://braydon.com/blog/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/</link>
		<comments>http://braydon.com/blog/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 13:39:26 +0000</pubDate>
		<dc:creator>Braydon Fuller</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://braydon.com/blog/?p=138</guid>
		<description><![CDATA[
Goal
Write a fast Python script that will take a large list and break it up into smaller sub-lists based on a set size; such as transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]].
Attempt 1: Map/Reduce (0.93s)
#import a list of 247,213 integers
from oids import oids

def pre(a):
    return (list(), a, 0)

def make_sets(a,b):
    set_size = 8
 [...]]]></description>
			<content:encoded><![CDATA[<div class="text" id="extended">
<h2>Goal</h2>
<p>Write a fast Python script that will take a large list and break it up into smaller sub-lists based on a set size; such as transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]].</p>
<h2>Attempt 1: Map/Reduce (0.93s)</h2>
<pre>#import a list of 247,213 integers
from oids import oids

def pre(a):
    return (list(), a, 0)

def make_sets(a,b):
    set_size = 8
    if a[2] == set_size - 1 or a[0] == list():
        a[0].append([a[1]])
        return (a[0],b[1],0)
    else:
        a[0][-1].append(a[1])
        return (a[0],b[1],a[2]+1)

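# each item waits in a[1] until the next call, so add one dummy
# item at the end to make sure the last real item gets appended
sets = reduce(make_sets,map(pre,oids) + [pre(None)])[0]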
</pre>
<h3>Times</h3>
<pre>real    0m0.935s
user    0m0.896s
sys     0m0.032s

real    0m0.948s
user    0m0.920s
sys     0m0.028s

real    0m0.929s
user    0m0.900s
sys     0m0.024s
</pre>
<h2>Attempt 2: For-loop (0.41s)</h2>
<pre>
from oids import oids

output = list()
count = 0
set_size = 8
for oid in oids:
    if count == set_size or output == list():
        output.append([oid])
        count = 1  # the new sub-list already holds one item
    else:
        output[-1].append(oid)
        count = count + 1
</pre>
<h3>Times</h3>
<pre>real    0m0.429s
user    0m0.404s
sys     0m0.024s

real    0m0.396s
user    0m0.384s
sys     0m0.012s

real    0m0.410s
user    0m0.396s
sys     0m0.012s
</pre>
<h2>Attempt 3: Map (0.48s)</h2>
<pre>from oids import oids

output = list()
set_size = 8
count = [0]

def break_apart(a):
    if count[-1] == set_size or output == list():
        output.append([a])
        count.append(1)  # the new sub-list already holds one item
    else:
        output[-1].append(a)
        count.append(count[-1] + 1)

map(break_apart,oids)
</pre>
<h3>Timing</h3>
<pre>real    0m0.484s
user    0m0.476s
sys     0m0.012s

real    0m0.483s
user    0m0.464s
sys     0m0.016s

real    0m0.482s
user    0m0.452s
sys     0m0.028s
</pre>
<h2>Attempt 4: Reduce (0.34s)</h2>
<pre>from oids import oids

def seperate(a,b,length=8):
    try:
        if len(a[-1]) == length:
            a.append([b])
            return a
        else:
            a[-1].append(b)
            return a
    except TypeError:
        # first call: a is still the first integer, not a list of sub-lists
        return [[a,b]]

oids = reduce(seperate,oids)
</pre>
<h3>Timing</h3>
<pre>real    0m0.323s
user    0m0.308s
sys     0m0.016s

real    0m0.329s
user    0m0.300s
sys     0m0.028s

real    0m0.353s
user    0m0.332s
sys     0m0.020s
</pre>
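<p>All four attempts above build the complete list of sub-lists in memory at once. Here is a minimal Python 3 sketch of a generator built on itertools.islice that yields the same sub-lists from any iterable, such as rows read from a file or database. The name chunks is hypothetical, and it has not been timed against the attempts above:</p>
<pre>from itertools import islice

def chunks(iterable, size=8):
    """Yield lists of size items; the last list may be shorter."""
    it = iter(iterable)
    while True:
        # islice takes up to size items from wherever the iterator is now
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

pairs = list(chunks(["a", "b", "c", "d", "e", "f"], 2))
# [['a', 'b'], ['c', 'd'], ['e', 'f']]
</pre>
<p>On Python 3.12 and later, itertools.batched(iterable, 8) does the same job but yields tuples instead of lists.</p>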
</div>
]]></content:encoded>
			<wfw:commentRss>http://braydon.com/blog/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
