10 tricks to reverse engineer web apps. Web scraping 101

TL;DR
1. Use the ‘Elements’ tab of Chrome Dev Tools to find out which element in the webpage has your data.
2. Avoid getting banned while developing by writing the target site's responses to a file and developing against that file.
3. Use the ‘Network’ tab of Chrome Dev Tools to find out the structure of API calls and mimic an actual browser.
4. Parse HTML in dynamic sites only if there is no other way to do it. Try to figure out the API first.
5. Use regex to extract small pieces of information; don’t parse HTML for it.
6. Don’t overwhelm the API.
7. If you have to send a ton of requests, use proxies and shuffle user agents. (A nifty trick for using proxies is mentioned below.)
8. If data is behind a login page, enter a wrong password once, capture the structure of the login request, and use requests.Session objects to log in.
9. Use a thread pool executor with proxies to send requests in parallel.
10. Try interfacing with the mobile version of the same site if the desktop version is too complicated.
11. Avoid using Selenium. Try to find out how the frontend of the app is talking to the backend. Perform an action on the UI and use the Network tab to capture the series of requests that facilitates the action. ...
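Tips 2 and 5 can be combined in a few lines of Python. The following is only a minimal sketch, not the author's code: the names `cached_fetch`, `extract_price`, and `PRICE_RE`, the `data-price` attribute, and the example URL are all hypothetical, and the fetcher is passed in as a plain callable so the sketch runs without touching the network.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical pattern: assumes the page exposes the value as data-price="...".
PRICE_RE = re.compile(r'data-price="([0-9]+(?:\.[0-9]+)?)"')


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the body for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    cache_file = cache_dir / name
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    body = fetch(url)
    cache_file.write_text(body, encoding="utf-8")
    return body


def extract_price(html: str) -> Optional[str]:
    """Pull one small value out of the page with a regex; None if absent."""
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request; records how often it is called.
        calls.append(url)
        return '<span class="price" data-price="19.99">$19.99</span>'

    with tempfile.TemporaryDirectory() as tmp:
        first = cached_fetch("https://example.com/item/1", fake_fetch, Path(tmp))
        second = cached_fetch("https://example.com/item/1", fake_fetch, Path(tmp))

    print(extract_price(first))  # the value captured by the regex
    print(len(calls))            # number of times the fetcher actually ran
```

In real use, the fetcher could be something like `lambda u: requests.get(u, timeout=10).text`, assuming the `requests` library is installed. Repeated runs during development then read from disk instead of hitting the target site again.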

November 9, 2020 · 22 min · 4664 words · Malay Hazarika