Scraper-helper
Capture HTTP and HTTPS connections

Scraper-helper is an HTTP proxy that writes everything passing through it to a log file and saves the decoded bodies of HTTP requests and responses to individual files. It also works with HTTPS: it performs a man-in-the-middle attack on SSL so it can decode encrypted connections as well. It also deliberately downgrades the protocols and ciphers used by SSL and TLS so Wireshark can decrypt them. In other words, it strives to make everything a web browser sends and receives visible.
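As a rough illustration of what saving "decoded bodies" involves, the Python sketch below undoes a gzip or deflate Content-Encoding and writes each body to its own numbered file. It is not scraper-helper's code: the function names and the `NNNN-kind.body` file naming are hypothetical, chosen only for this example, and only gzip and deflate are handled.

```python
import gzip
import zlib
from pathlib import Path
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo an HTTP Content-Encoding so the body is readable.

    Only gzip and deflate are handled; other encodings are returned unchanged.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(raw)
    if encoding == "deflate":
        # "deflate" should be zlib-wrapped, but some servers send a raw
        # DEFLATE stream, so retry without the zlib header if that fails.
        try:
            return zlib.decompress(raw)
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    return raw


def save_body(directory: Path, seq: int, kind: str,
              raw: bytes, content_encoding: Optional[str]) -> Path:
    """Write one decoded request or response body to its own file."""
    # Hypothetical naming scheme, e.g. 0007-response.body
    path = directory / f"{seq:04d}-{kind}.body"
    path.write_bytes(decode_body(raw, content_encoding))
    return path
```

For example, `save_body(Path("bodies"), 7, "response", raw, "gzip")` would write the decompressed response to `bodies/0007-response.body`, assuming the `bodies` directory already exists.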

There are many reasons you might want to do this. I use it for scraping web sites. The technique is to capture a web browser's traffic with it, then write your scraping program, then compare the trace from your scraping program to the browser's trace until they look the same.
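A minimal sketch of the comparison step, assuming each trace has already been reduced to a list of request and header lines (scraper-helper's actual trace format is not described here). It uses Python's `difflib` to diff the two traces after dropping headers such as Date and Cookie whose values normally change between runs; the `VOLATILE` list is an assumption you would tune for each site.

```python
import difflib
from typing import Iterable, List

# Headers whose values usually differ from run to run. This list is an
# illustrative assumption, not something scraper-helper defines.
VOLATILE = ("date:", "cookie:", "set-cookie:")


def normalise(trace: Iterable[str]) -> List[str]:
    """Drop blank lines and volatile headers so only meaningful differences remain."""
    kept = []
    for line in trace:
        line = line.rstrip("\r\n")
        if not line or line.lower().startswith(VOLATILE):
            continue
        kept.append(line)
    return kept


def compare_traces(browser: Iterable[str], scraper: Iterable[str]) -> List[str]:
    """Return a unified diff of two traces; an empty list means they look the same."""
    return list(difflib.unified_diff(
        normalise(browser), normalise(scraper),
        fromfile="browser", tofile="scraper", lineterm=""))
```

Comparing a browser trace containing `User-Agent: Firefox` with a scraper trace containing `User-Agent: python-requests` would report those two lines as the difference, while differing Cookie values would be ignored.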

Scraper-helper can also replay recordings it made earlier. In replay mode the HTTP requests sent to the server are read from a trace file written by an earlier run. Scraper-helper reproduces the original intervals between the requests so the playback looks identical to the server, and it can adjust cookies during playback so things like session cookies still work. The recording can then be edited to discover which parts are necessary and which aren't.
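The timing and cookie behaviour described above can be sketched roughly as follows. This is not scraper-helper's implementation: `RecordedRequest`, `rewrite_cookies`, `replay` and the `send` callback are hypothetical, and the sketch assumes each recorded request carries its offset in seconds from the start of the original run. The clock and sleep functions are injectable so the schedule can be checked without real delays.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RecordedRequest:
    """Hypothetical record of one request read back from a trace."""
    offset: float            # seconds after the start of the original run
    headers: Dict[str, str]


def rewrite_cookies(headers: Dict[str, str], jar: Dict[str, str]) -> Dict[str, str]:
    """Swap recorded cookie values for any the server has issued during this replay."""
    cookie = headers.get("Cookie")
    if not cookie:
        return dict(headers)
    parts = []
    for part in cookie.split(";"):
        name, sep, value = part.strip().partition("=")
        parts.append(f"{name}={jar.get(name, value)}" if sep else part.strip())
    rewritten = dict(headers)
    rewritten["Cookie"] = "; ".join(parts)
    return rewritten


def replay(requests: List[RecordedRequest],
           send: Callable[[Dict[str, str]], Dict[str, str]],
           sleep: Callable[[float], None] = time.sleep,
           clock: Callable[[], float] = time.monotonic) -> None:
    """Resend recorded requests, keeping the original spacing between them.

    send() is a hypothetical callback that issues the request and returns
    the cookies the response set, as a name -> value dict.
    """
    jar: Dict[str, str] = {}
    start = clock()
    for req in requests:
        # Wait until the same point, relative to the start, as in the recording.
        delay = req.offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        jar.update(send(rewrite_cookies(req.headers, jar)))
```

Scheduling each request against the start of the replay, rather than sleeping for each recorded gap in turn, stops the time spent sending requests from accumulating as drift.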

Documentation

There is a man page, a change log and a README.

Copyright and License

Scraper-helper is copyright © 2014, 2015, 2019, 2021 Russell Stuart. It is licensed under the GNU Affero General Public License.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The copyright holders grant you an additional permission under Section 7 of the GNU Affero General Public License, version 3, exempting you from the requirement in Section 6 of the GNU General Public License, version 3, to accompany Corresponding Source with Installation Information for the Program or any work based on the Program. You are still required to comply with all other Section 6 requirements to provide Corresponding Source.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

Downloading, Feedback & Contributing

Development for scraper-helper is hosted on SourceForge:

 


Russell Stuart, 2019-Sep-19.