sebastiandaschner blog


How to crawl websites with Selenide and JDK 14+

#automation #java thursday, july 22, 2021

Sometimes we need data that would otherwise have to be fetched manually from a website. As developers, automation is our friend, so instead of collecting all this information by hand we can write a small program that crawls the website for us. I’ve recorded a video in which I fetch some data from my blog and transform it into CSV format, using Selenide and some newer Java features such as Records.


Please be a good citizen and only use such techniques on websites and in situations where you’re allowed to do so, and where your actions don’t disrupt any service.

You can find the code example on GitHub: Selenium Playground

What we’re doing is using Selenide, with its helpful queries and methods, together with Java Records and Streams to map the entries of my blog to a desired output format. The difference from using a web API is that we have to be a bit more creative in how we identify and extract the individual parts, since the data is not necessarily structured for automated consumption.
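To make the mapping step more concrete, here is a minimal sketch of the Records and Streams part only. It is not the code from the video or the repository. The class name `BlogCrawlSketch`, the `Entry` record with its fields, the CSV columns and the assumed date format are all hypothetical. To keep the block self-contained it takes plain strings as input. In a real crawl those strings would come from Selenide, for example by selecting elements with `$$(...)` and reading `getText()` or `getAttribute("href")` on each one. Records are a preview feature in JDK 14 and 15, so without preview flags this needs JDK 16 or later.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BlogCrawlSketch {

    // Hypothetical target shape of one blog entry (field names are assumptions).
    record Entry(String title, String link, LocalDate date) {}

    // Assumed date text as rendered on the page, e.g. "thursday, july 22, 2021".
    private static final DateTimeFormatter PAGE_DATE = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("EEEE, MMMM d, yyyy")
            .toFormatter(Locale.ENGLISH);

    // Turns the raw strings scraped from the page into a typed record.
    static Entry parse(String title, String link, String dateText) {
        return new Entry(title.strip(), link.strip(),
                LocalDate.parse(dateText.strip(), PAGE_DATE));
    }

    // Quotes a CSV field if it contains a comma, a double quote or a newline.
    private static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"") || field.contains("\n");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Maps all entries to CSV lines, preceded by a header row.
    static String toCsv(List<Entry> entries) {
        Stream<String> rows = entries.stream()
                .map(e -> String.join(",", escape(e.title()), escape(e.link()), e.date().toString()));
        return Stream.concat(Stream.of("title,link,date"), rows)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Sample values standing in for text scraped from the page.
        List<Entry> entries = List.of(
                parse(" Records, Streams ", "https://example.com/post", "thursday, july 22, 2021"));
        System.out.println(toCsv(entries));
    }
}
```

Keeping the parsing in one small method means that the "creative" part, such as trimming whitespace or interpreting a human-readable date, stays separate from the output format.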

 
