Installing Scrapy With Pyenv (Linux)

If you don't already know what Scrapy is, this post won't tell you. This post will tell you how to install Scrapy. Here's what this post contains:

1. Installing and working on Python's latest version on Linux
2. Installing Pyenv
3. Installing Scrapy

Installing and working on Python's latest version on Linux

Scrapy requires Python 3.6+. The default Python on a Linux distro is most likely Python 2.7, so to work with Scrapy you need to install a compatible version.

You can install Python with the standard sudo apt install (or your distro's equivalent), or you can download and build it yourself, but using it afterwards is not that simple:

Running projects on the latest Python version on Linux is tough. Why? Because:

1. System-wide Python 2.7

The system-wide Python 2.7 is used by several of the system's own applications. To keep them working as they do, the system Python has to stay exactly as it is, forever.

So, you cannot replace it with any of the latest Python 3 versions.

There should be a way to keep both Python 2.7 and Python 3.6+; be able to use Python 3.6+ for our projects and let the system continue using Python 2.7. 

What makes it complicated to use more than one Python version? The Path.

2. The Path

When you type python --version, your shell looks for a command named python in a specific list of directories and runs the first match, which is the system's Python, probably 2.7. Even if you enter python3 --version, it will probably not be the python3 you just installed.

To see which paths your shell checks in, enter this:

echo $PATH

So even if you have installed the latest Python version, your shell will not run it unless its directory comes before the system one in the path.
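To see the lookup order for yourself, here is a minimal sketch, assuming bash or zsh. The helper names list_path_dirs and find_all are made up for this post; type -a is the shell builtin that lists every match for a command, with the one that actually runs listed first.

```shell
# Print each PATH directory on its own line, in the order the shell searches them.
# Takes an optional PATH-style string; defaults to the current $PATH.
list_path_dirs() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

# List every location where a command is found; the first line is the one that runs.
find_all() {
    type -a "$1" 2>/dev/null || echo "$1: not found in PATH"
}

list_path_dirs
find_all python
find_all python3
```

If the Python you installed shows up second or not at all, that explains why python --version keeps reporting the old one.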

What's the solution?

1. Alias

2. Virtual environments like Pyenv (I used this one)

1. Alias

Adding an alias is a way to point a command name at a specific path. So if you enter the following, you're telling your shell to run the program at the path you've given whenever you enter the self-defined command, "pythonnew":

alias pythonnew=/xyz/path/python3.x.x

Then add this to your .bashrc, or to whichever initialization file your shell uses:

export PATH=/xyz/path:$PATH

Install your new Python in the folder the alias points to, then check the version to see whether this works. I make no guarantees. :)

Also, an alias typed at the prompt only lasts for that terminal session, so you'd have to add it again each time you open the terminal. Aliases can be made permanent by putting the alias line in your shell's initialization file.
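One way to make the alias stick is sketched below. The add_alias helper is hypothetical (it is not part of any tool), and /xyz/path/python3.x.x is the same placeholder path as above, so replace it with your real interpreter. The helper appends the alias line to a startup file only if that exact line is not already there.

```shell
# Append an alias line to a shell startup file unless it is already present.
# Usage: add_alias NAME TARGET RCFILE
add_alias() {
    line="alias $1=$2"
    # -x matches the whole line, -F treats the text literally (no regex)
    grep -qxF "$line" "$3" 2>/dev/null || printf '%s\n' "$line" >> "$3"
}

# Example with the placeholder path; uncomment and adjust before running:
# add_alias pythonnew /xyz/path/python3.x.x ~/.bashrc
```

Open a new terminal (or source the file) afterwards for the alias to take effect.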

2. Virtual environment (Pyenv)

Managing many Python versions or switching between them is complicated. Pyenv lets you create virtual environments that have their own Python version independent of any other Python version installed on your system.

With Pyenv, you can install any number of Python versions. You can create virtual environments for various projects. Each project can live in its own virtual environment, using a Python version independent of the system Python.

This gives you the liberty of choosing the Python version you prefer for a particular project. For now, we're only concerned with a Scrapy project. Next up: installing Pyenv.

Installing Pyenv

Here's a simple guide on how to install Pyenv. I had to deviate from it a little because my shell is Zsh; I discussed that in a separate post, Installing Pyenv with ZSH. If you use bash, the instructions in the guide should suffice.

Install the Python version of your choice

Scrapy requires Python 3.6+. So, once you're done installing Pyenv, you can install any version of your choice. I installed Python 3.9.0.

pyenv install 3.9.0

That should be it. You probably won't need to do anything else, except make this version of Python global.

Why make it global? Assuming you only want to run a Scrapy project, you probably need only one Python version: the one you just installed. And you don't want to keep telling your shell which Python version you want to work in. So make it global with:

pyenv global 3.9.0

Now check the Python version. You should get something like this:

➜  ~ python --version
Python 3.9.0
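If you'd rather have the shell check the 3.6+ requirement for you, here is a rough sketch. The version_at_least function is a hypothetical helper written for this post; it only parses the "Python X.Y.Z" text that python --version prints. The 2>&1 is there because Python 2 prints its version to stderr.

```shell
# Succeeds if a version string such as "Python 3.9.0" is at least MAJOR.MINOR.
# Usage: version_at_least "Python 3.9.0" 3 6
version_at_least() {
    ver=${1#Python }      # drop the leading "Python " if present
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # text before the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Only run the check if a python command exists at all.
if command -v python >/dev/null 2>&1; then
    if version_at_least "$(python --version 2>&1)" 3 6; then
        echo "python is new enough for Scrapy"
    else
        echo "python is too old for Scrapy"
    fi
fi
```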

Create a Virtual Environment for Scrapy

Do this with this command:

pyenv virtualenv 3.9.0 Scrapy

I've named my virtual environment Scrapy.

This command will create a folder named Scrapy in this folder: .pyenv/versions

The versions folder will contain all Python versions you installed using Pyenv and all virtual environments.

Next activate your newly created Scrapy virtual environment with this command:

➜  ~ pyenv local Scrapy
(Scrapy) ➜  ~ 

Your terminal should now enter the Scrapy virtual environment, as shown above. From now on, this environment will activate automatically whenever you are in this directory.
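The reason it sticks is that pyenv local writes the name you gave it into a file called .python-version in the current directory. Here is a small sketch for checking what a directory is pinned to. The pinned_version helper is hypothetical and only reads that one file; it does not look in parent directories the way Pyenv itself does.

```shell
# Show which Pyenv version or environment name is pinned in a directory.
# Usage: pinned_version [DIR]   (defaults to the current directory)
pinned_version() {
    dir=${1:-.}
    if [ -f "$dir/.python-version" ]; then
        cat "$dir/.python-version"
    else
        echo "no .python-version in $dir"
    fi
}

pinned_version
```

After the pyenv local command above, running this in the same directory should print Scrapy.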

Install Scrapy

Enter this command to install Scrapy:

pip install Scrapy
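To confirm the install landed in the environment you expect, a quick check like the following may help. The check_scrapy function is a made-up helper; scrapy version is Scrapy's own subcommand for printing the installed version.

```shell
# Print the installed Scrapy version, or a hint if the command is missing.
check_scrapy() {
    if command -v scrapy >/dev/null 2>&1; then
        scrapy version
    else
        echo "scrapy not found on PATH; is the virtual environment active?"
    fi
}

check_scrapy
```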

That's it. We'll make the crawler in the next post. 
