Our Code: Difference between revisions

From APL_wiki
Jump to navigation Jump to search
Wikiuser (talk | contribs)
No edit summary
Wikiuser (talk | contribs)
No edit summary
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
<p>Writing code for a cluster can be a little bit confusing. &nbsp;Unfortunately, there are some extra steps to be taken before a cluster can be taken advantage of. &nbsp;As a reference, I am personally using python 2.7 and mpich2 as my message passing interface. &nbsp;Let's just jump right into the code.</p>
[[Serial and Parallel Pi Calculation]]
<p>First, let's look at what a serial version of a pi approximation looks like.</p>
 
<pre>"""<br />Approximation of pi for comparison with a parallel version<br />"""<br />from __future__ import division<br />from math import factorial as f<br />import sys<br />import time<br />from decimal import *<br />getcontext().prec = 50<br /><br />def piApprox(n):<br />    """<br />    Where n is the number of terms of the summation<br />    to calculate.<br />    """<br />    pi = 0<br />    for i in xrange(n+1):<br />        sign = 1 if i % 2 == 0 else -1<br />        pi += Decimal(sign * 2 * 3 ** (.5-i)) / Decimal(2 * i + 1)<br />    return pi<br /><br /><br />def main():<br />    n = sys.argv[1]<br />    n = int(n)<br />    start = time.time()<br />    pi = piApprox(n)<br />    elapsed = time.time() - start<br />    print("Estimation of {} terms is: {}.".format(n, pi))<br />    print("Elapsed time: {}.".format(elapsed))<br /><br />if __name__ == '__main__':<br />    main()</pre>
[[Decision Forests]]
<p>This is simple enough, maybe you've even written something similar. &nbsp;I'll go over a few things.</p>
<ul>
<li>If you are unfamiliar with python 2.7, the line at the top -- <code>from __future__ import division</code>&nbsp;-- probably looks strange to you. &nbsp;All this is doing is telling python to compute divisions as floats and to not do integer division (that would really mess up our approximation).</li>
<li><code>sys.argv[1]</code> is telling python to take a look at the 1 index of the commandline arguments (where index 0 is the name of the file you are running)<br />
<ul>
<li>For example, we would run this file like</li>
<li><img alt="piserial commandline" height="55" src="/static/media/uploads/blog/mpi4py_Tutorial/piserial_commandline.png" width="695" /></li>
<li>We can see the elapsed time was 3.26 seconds.</li>
</ul>
</li>
<li>Then we just run the <code>piApprox(n)</code> function and add up our <code>n</code> terms.</li>
</ul>
<p>Alright, now let's take a look at the parallel version.</p>
<pre>from __future__ import division<br />import sys<br />import time<br />from mpi4py import MPI<br />from math import factorial as f<br />from decimal import *<br />getcontext().prec = 50<br /><br /><br />def nthTerm(data):<br />    """<br />    Misleading function name.<br />    Computes the sum of terms in data.<br />    """<br />    total = 0<br />    for i in data:<br />        sign = 1 if i % 2 == 0 else -1<br />        total += Decimal(sign * 2 * 3 ** (.5-i)) / Decimal(2 * i + 1)<br />    return total<br /><br /><br />def main():<br />    comm = MPI.COMM_WORLD<br />    rank = comm.Get_rank()<br />    if rank == 0:<br />        start = time.time()<br />        i = int(sys.argv[1])<br />        size = comm.Get_size()<br />        workers = size - 1<br />        termsplits = []<br />        for x in range(workers):<br />            termsplits.append(xrange(x, i+1, workers))<br />        for y, data in enumerate(termsplits):<br />            dest = y + 1<br />            if dest &gt;= size:<br />                pass<br />            else:<br />                comm.send(data, dest=dest)<br />        pi = 0<br />        for y in range(workers):<br />            source = y + 1<br />            print("Waiting for source:{}".format(source))<br />            pi_appr = comm.recv(source=source)<br />            pi += pi_appr<br />            elapsed = time.time() - start<br />            print(pi)<br />            print("MPI pi approx time: {} seconds".format(elapsed))<br />    else:<br />        data = comm.recv(source=0)<br />        pi_appr = nthTerm(data)<br />        comm.send(pi_appr, dest=0)<br /><br /><br />if __name__ == '__main__':<br />    main()</pre>
<p>In many ways, the two versions look quite similar, and they should. &nbsp;They are computing the same thing after all. &nbsp;However, there are a few differences.</p>
<ul>
<li>At the top, we of course have to do a&nbsp;<code>from mpi4py import MPI</code>, so that we can use our cluster.</li>
</ul>
<p>I think that the important thing to think about and realize is that, this exact program is going to run on all of the processors that we tell it to - more or less independent of each other. &nbsp;So unless we explicitly tell two cores to communicate with eachother, they really don't care, or even know what another processor is doing. &nbsp;They are, in fact, completely separate python instances.</p>
<p>In this case, we will need the cores to communicate in order to get a sum of terms. &nbsp;The extra step that we have to do in this version, versus the serial version, is divy up the work for each of our processors.</p>
<ul>
<li>Each process comes with a&nbsp;<code>rank</code> which we can access through&nbsp;<code>MPI.COMM_WORLD.Get_rank()</code>.</li>
<li>We can use the rank of the processor to divide up the work. &nbsp;Basically we'll set up some conditionals so that a certain process will only run some block of code, if it is the correct rank.</li>
<li>I denoted rank = 0, to be a sort of "master" process. &nbsp;It is what is dividing up all of the work into&nbsp;<code>n</code> pieces. &nbsp;Where <code>n</code> is the number of cores we are going to use. &nbsp;This is generally the sum of the numbers in your machinefile.</li>
</ul>
<p>This is the block where we divide up the work.</p>
<pre>i = int(sys.argv[1])<br />    size = comm.Get_size()<br />    workers = size - 1<br />    termsplits = []<br />    for x in range(workers):<br />        termsplits.append(xrange(x, i+1, workers))</pre>
<ul>
<li>Here, I denote the total cores - 1 to be the number of workers. &nbsp;(Because 1 of them is going to be the master process)</li>
<li>Then, I split the terms into <code>workers</code> lists, where <code>workers</code> is an integer. &nbsp;I also used workers as the step size so that each list would have a similar number of small and large numbers.</li>
</ul>
<p>Next we need to send the lists out to their respective cores.</p>
<pre>for y, data in enumerate(termsplits):<br />    dest = y + 1<br />    if dest &gt;= size:<br />        pass<br />    else:<br />        comm.send(data, dest=dest)</pre>
<ul>
<li>Here, we use comm.send() to send the term lists to each of the processors. &nbsp;The first argument is the data you want to send, and the 2nd argument is the core you want to send it to.</li>
</ul>
<p>Then we need to receive the data into the other cores.</p>
<pre>data = comm.recv(source=0)<br />    pi_appr = nthTerm(data)<br />    comm.send(pi_appr, dest=0)</pre>
<p>Here we are receiving the data from the master core, computing the sum of the terms in our <code>data</code> list, and sending it back to the master core. &nbsp;All that's left is to sum of the data from each of the computing cores.</p>
<pre>for y in range(workers):<br />    source = y + 1<br />    print("Waiting for source:{}".format(source))<br />    pi_appr = comm.recv(source=source)<br />    pi += pi_appr<br />    print(pi)</pre>
<p>And that's pretty much it! &nbsp;For comparison, let's check our timing on the parallel version.</p>
<p><img alt="" height="189" src="/static/media/uploads/blog/mpi4py_Tutorial/piparallel.png" width="641" /></p>
<p>Considerably faster! &nbsp;About 10x with 11 computing cores.</p>

Latest revision as of 09:46, 19 March 2016