Download all documents with a given extension from a web page using the terminal
I used to have a Python script that scraped a given web page and downloaded all the files with a given extension from it. Today, unable to find that script anymore, I tried to do the same thing using bash only and, interestingly, managed it with a one-liner. Here is how it works:
lynx
Lynx is great at dumping web pages as plain text:
lynx --dump https://hadi.timachi.com/
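The dump is the rendered page text followed by a References section that lists every link with a number. For a page linking to archives, the tail of the dump looks something like this (the file names here are invented for illustration):

References

   1. https://hadi.timachi.com/files/lecture01.rar
   2. https://hadi.timachi.com/files/lecture02.rar
   3. https://hadi.timachi.com/about.html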
grep
We pipe the result to grep, filtering for the target extension:
lynx --dump https://hadi.timachi.com/ | grep rar
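Note that a bare grep for rar matches any line containing that string, including ordinary body text. If the page is noisy, anchoring the pattern to lines that end in the extension is safer; a sketch, assuming the files really end in .rar:

lynx --dump https://hadi.timachi.com/ | grep -i '\.rar$'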
wget
And finally, we download everything with wget, which reads the list of URLs from stdin:
lynx --dump --nonumbers https://hadi.timachi.com/ | grep rar | wget -i -
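Wrapped in a small shell function, the one-liner generalizes to any page and extension. The name fetch_ext is my own invention, and it assumes the extension is a plain string with no regex metacharacters:

# Download every file with a given extension linked from a page.
# Usage: fetch_ext URL EXT, e.g. fetch_ext https://hadi.timachi.com/ rar
fetch_ext() {
    lynx --dump --nonumbers "$1" | grep -i "\.$2\$" | wget -i -
}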
The --nonumbers option removes the numbers that lynx normally prints at the beginning of each link in the References section, so that wget receives bare URLs it can actually fetch.
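As a side note, lynx also has a --listonly flag that makes --dump print only the references list and skip the page text entirely, which cuts down on false grep matches:

lynx --dump --listonly --nonumbers https://hadi.timachi.com/ | grep rar | wget -i -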
And that is all, simple and nice.