Using Selenium (Python) on AWS EC2 for web scraping in 3 easy steps
I have been building scrapers and deploying them for my projects for a long time. And if you’ve worked with scrapers, you may already have concluded that Selenium is one of the best choices for this job.
The problem is that Selenium drives a full browser, which makes it a heavy process that uses a lot of your computing resources. If you have a budget computer and you’re scraping a ton of data, you probably won’t be able to do anything else while the scrape is running. It eats RAM and processing power to a huge extent.
This is where the EC2 service from AWS comes in. It gives you a virtual computer with the specs you choose, and you pay for those specs, i.e. the computing power you use. Simple. And if you don’t want to pay, you can use the free-tier-eligible resources.
So moving the scraper to EC2 seems like a rational idea.
But if you google something like “running python selenium on ec2”, the results all go something like this:
1. Installing ChromeDriver
- *some tedious process*
and
2. Installing Google Chrome :
- *some more tedious process*
- *and lots of debugging and head banging*
But that’s not what we are going to do here. I will show you a much simpler alternative.
All you have to do is:
1. Go to the AWS Marketplace subscriptions page (after signing in) and search for “Selenium Webdriver on Headless Ubuntu”.
2. Click Continue to Subscribe.
3. On the next page, launch a new EC2 instance. If you can’t find the launch button there, go to AWS Marketplace, click Manage subscriptions, choose “Selenium Webdriver on Headless Ubuntu”, and click Launch new instance.
Then follow the usual steps to create a new EC2 instance. And voilà!
You now have an EC2 instance up and running, with Selenium built in.
Now, once you have logged in to the EC2 instance, type the following in the terminal:
ls
and you will see two directories, python2 and python3.
Go to the python3 directory and list the example Selenium scripts in it with
cd python3
ls
Now you can run the Chrome example with
python3 sample-chrome.py
or, for Firefox,
python3 sample-firefox.py
Either script will open “https://www.selenium.cloud/doc/samples.html” and take a screenshot of it.
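I am not reproducing the bundled sample scripts here, but a script of this kind might look roughly like the sketch below. It assumes a reasonably recent Selenium with Chrome and ChromeDriver already installed on the instance. The names `build_chrome_args` and `take_screenshot`, the window size, and the output file name are my own and only there for illustration.

```python
def build_chrome_args(width=1280, height=800):
    """Return Chrome flags commonly used on a server without a display."""
    return [
        "--headless",               # run without opening a window
        "--no-sandbox",             # often needed when running as root
        "--disable-dev-shm-usage",  # avoid problems with a small /dev/shm
        f"--window-size={width},{height}",
    ]


def take_screenshot(url, out_path="screenshot.png"):
    """Open url in headless Chrome and save a screenshot to out_path."""
    # Imported here so the helper above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # save_screenshot returns True on success and False on an I/O error.
        return driver.save_screenshot(out_path)
    finally:
        driver.quit()  # always release the browser process
```

Calling `take_screenshot("https://www.selenium.cloud/doc/samples.html")` on the instance should then leave a `screenshot.png` file in the current directory, assuming the browser and driver are set up the way this sketch expects.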
After the script finishes, you can check that the screenshot file was created by typing
ls
in the same directory.
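Since the instance is headless, you can’t simply open the image there. If you want a quick sanity check from Python instead, something like the following sketch could help. It assumes the screenshot was saved with a `.png` extension (I have not checked what file name the sample script uses), and `find_screenshots` is a made-up helper name.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def find_screenshots(directory="."):
    """Return names of .png files in directory that start with the PNG signature."""
    valid = []
    for path in sorted(Path(directory).glob("*.png")):
        with path.open("rb") as fh:
            if fh.read(8) == PNG_SIGNATURE:
                valid.append(path.name)
    return valid
```

An empty list from `find_screenshots()` would suggest that the script did not write an image, or wrote it somewhere else.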
Now you can look at the code in there using
nano sample-chrome.py
and either write a new script or modify this one according to your requirements.
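As an idea of what such a modification could look like, here is a minimal sketch that collects the links on a page instead of taking a screenshot. It is not taken from the sample scripts. `clean_links` and `scrape_links` are hypothetical names, and the Chrome flags are assumptions about what a headless server needs.

```python
def clean_links(pairs):
    """Drop empty hrefs and duplicates from (text, href) pairs, keeping order."""
    seen = set()
    result = []
    for text, href in pairs:
        if not href or href in seen:
            continue
        seen.add(href)
        result.append((text.strip(), href))
    return result


def scrape_links(url):
    """Load url in headless Chrome and return its (text, href) link pairs."""
    # Imported here so clean_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        # Read the values while the browser session is still open.
        pairs = [(a.text, a.get_attribute("href")) for a in anchors]
    finally:
        driver.quit()
    return clean_links(pairs)
```

Keeping the cleanup in a separate plain function means you can test it on your own machine without a browser, and only the `scrape_links` part has to run on the EC2 instance.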