Python 3 (64 bit) + scipy on Windows
I tried the original one (x64) first, but it didn't support scipy: I just couldn't install it. As a matter of fact, scipy doesn't provide a pre-compiled build for 64-bit Windows in the first place, so with Python installed that way I figured I had no hope of getting it to work.
So next, I went for another option where I could properly set up my compiler, namely Cygwin.
I have read somewhere that it will work if I install the 32-bit Python instead. But why should I? This is 2017!?
I tried Cygwin … it failed to build scipy
I first installed these packages from the Cygwin installer:
- python3
- python3-devel (you need this if you want to install packages like numpy)
- python3-pip
After installation, Python 3 goes by the name python3. To make it work as the default python, I did some ln -s and some export PATH so that it shows up in the default PATH.
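As a quick sanity check that the freshly linked interpreter really is the 64-bit Python 3 you expect, something along these lines works from inside Python:

import struct
import sys

print(sys.version)                 # should report a 3.x interpreter
print(struct.calcsize("P") * 8)    # pointer size in bits: 64 for a 64-bit build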
I then tried to install numpy, which did install successfully (an arduously long install due to the compilation time).
Then I tried to install scipy, and this time it failed with the following error:
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found
I couldn't find any solution or workaround online … so I paused there!
So, I give conda a try … it seems to work
You don't have to install the full-blown Anaconda to use conda; you can just install the smaller Miniconda and it will do the job.
I think conda has its own way of installing packages; I believe it still leverages pip at some point, but with some modifications. So in short, conda is a front-end for multi-platform, multi-language package installation.
When I installed numpy and scipy, it seemed to me that conda also installed mkl (I don't know much about this; it's a math optimization library for Intel CPUs or something along those lines). But in the end, it did install scipy and numpy successfully! Yay …
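For a quick check of what actually got installed, something like the following should print both versions and, for a conda install, usually lists MKL in numpy's build configuration:

import numpy
import scipy

print(numpy.__version__, scipy.__version__)
numpy.show_config()   # conda's packages typically report MKL as the BLAS/LAPACK backend here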
Python's multiprocessing is slow under Windows
After happily using it for a while … I spotted that multiprocessing.Pool was quite slow, slower than single-core processing! I'm quite sure I'm not new to this; I know how to make it work properly, at least under a Unix-like environment.
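To see the effect for yourself, a minimal sketch along these lines (work and jobs are just a hypothetical, deliberately cheap workload) reproduces the comparison:

import time
from multiprocessing import Pool

def work(x):
    # hypothetical, deliberately cheap task, so per-task overhead dominates the runtime
    return x * x

if __name__ == '__main__':
    jobs = range(100000)

    start = time.perf_counter()
    list(map(work, jobs))
    print('single process :', time.perf_counter() - start)

    start = time.perf_counter()
    with Pool() as pool:
        pool.map(work, jobs)
    print('multiprocessing:', time.perf_counter() - start)

On Windows, with work this cheap, the Pool version can easily come out slower.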
So, I dug deeper into this problem … which turned out to be a problem with Windows itself. conda is not to blame for it; it's a problem between Windows and Python.
As you know, multiprocessing.Pool uses multiple processes to get a speed boost, something threading cannot do in Python (because of the existence of the Global Interpreter Lock). This isn't much of a problem from the user's point of view; the usage of multiprocessing is itself easy enough, as long as you keep your program functional.
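In practice, "functional" also means that whatever you hand to the pool must be picklable: a plain top-level function like the hypothetical double below is fine, while a lambda or a nested closure is not.

from multiprocessing import Pool

def double(x):   # a plain top-level function: picklable, safe to send to worker processes
    return 2 * x

if __name__ == '__main__':
    with Pool() as pool:
        print(pool.map(double, range(5)))        # works: [0, 2, 4, 6, 8]
        # pool.map(lambda x: 2 * x, range(5))    # would fail: lambdas cannot be pickled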
However, it is said that spawning a process under Windows is much, much slower than on Linux, as asked here: http://stackoverflow.com/questions/8775475/python-using-multiprocess-is-slower-than-not-using-it. To me, the argument is quite valid. Windows has no fork(), so multiprocessing has to spawn a fresh interpreter and re-import your module for every worker, and pickle the arguments of every task it dispatches. I have also long heard that Windows favors multi-threading and Linux favors multi-processing, so it is no surprise that each OS optimizes for its own model.
Unfortunately, multi-threading gives no speed boost for compute-intensive work in Python, and multi-processing is not well supported by Windows. That left me with almost zero options; it seemed like quite a deal breaker.
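You can check which start method your interpreter uses:

import multiprocessing as mp

print(mp.get_start_method())   # 'spawn' on Windows, 'fork' by default on Linux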
chunksize might help
As a matter of fact, Windows processes are quite slow to start, and this greatly reduces the viability of Python's multiprocessing.Pool. If talking to a worker process is expensive, shouldn't we hand each worker a larger chunk of work per round-trip instead of one tiny job at a time? This is exactly what the chunksize parameter of pool.imap does: it groups jobs into batches so the per-task overhead is paid once per batch rather than once per job.
from multiprocessing import Pool

if __name__ == '__main__':   # the guard is required on Windows, where workers are spawned
    with Pool() as pool:
        for result in pool.imap(fn, jobs, chunksize=16):
            ...   # consume each result as it arrives
So, in short, you should set chunksize somewhat larger than 1 and see whether that mitigates the problem.
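If you're not sure what value to pick, a rough starting point (close to what Pool.map itself computes when you don't pass a chunksize) is a few chunks per worker; pick_chunksize below is just a hypothetical helper sketching that idea:

import os

def pick_chunksize(n_jobs, n_workers=None, chunks_per_worker=4):
    # hypothetical helper, not part of multiprocessing: aim for a handful of chunks
    # per worker, large enough to amortize overhead, small enough to stay balanced
    workers = n_workers or os.cpu_count() or 1
    return max(1, n_jobs // (workers * chunks_per_worker))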
I will try to use it for the moment … hope I can live with it :D