Download all documents with a given extension from a web page using the terminal
I used to have a Python script that scraped a given web page and downloaded all the files with a given extension from it. Today, unable to find that script anymore, I tried to do the same thing using bash only and, interestingly, managed it with a one-liner. Here is how it works:
lynx
Lynx is great at dumping web pages as plain text:
lynx --dump https://hadi.timachi.com/
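The dump is the rendered page text followed by a References section that lists every link with a number. For a page linking to archives, the tail of the dump looks something like this (the file names here are invented for illustration):

References

   1. https://hadi.timachi.com/files/lecture01.rar
   2. https://hadi.timachi.com/files/lecture02.rar
   3. https://hadi.timachi.com/about.html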
grep
We pipe the result to grep, filtering for the target extension:
lynx --dump https://hadi.timachi.com/ | grep rar
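Note that a bare grep for rar matches any line containing that string, including ordinary body text. If the page is noisy, anchoring the pattern to lines that end in the extension is safer; a sketch, assuming the files really end in .rar:

lynx --dump https://hadi.timachi.com/ | grep -i '\.rar$'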
wget
And finally, we download everything with wget, which reads the list of URLs from stdin:
lynx --dump --nonumbers https://hadi.timachi.com/ | grep rar | wget -i -
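Wrapped in a small shell function, the one-liner generalizes to any page and extension. The name fetch_ext is my own invention, and it assumes the extension is a plain string with no regex metacharacters:

# Download every file with a given extension linked from a page.
# Usage: fetch_ext URL EXT, e.g. fetch_ext https://hadi.timachi.com/ rar
fetch_ext() {
    lynx --dump --nonumbers "$1" | grep -i "\.$2\$" | wget -i -
}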
The --nonumbers option removes the numbers that lynx normally prints at the beginning of each link in the References section, so that wget receives bare URLs it can actually fetch.
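As a side note, lynx also has a --listonly flag that makes --dump print only the references list and skip the page text entirely, which cuts down on false grep matches:

lynx --dump --listonly --nonumbers https://hadi.timachi.com/ | grep rar | wget -i -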
And that is all, simple and nice.