1K Blog Marathon: Day 39
With UiPath, you can automate repetitive tasks and cut the time a job takes to complete. You can also scrape data from websites and mimic user input.
We can copy and process the data returned by a simple Google search. The page of results that a search returns is called a SERP (search engine results page). Let’s manipulate this data using our bot!
Getting Started with UiPath: Part 1
Getting Started with UiPath: Part 2
What is UiPath: RPA in Layman’s Term
Scraping data from Google’s SERP
If you’ve been following these UiPath tutorials, we are now ready to copy the results page of a Google search opened in Chrome.
1. On the top ribbon, click “Data Scraping”.
2. A pop-up window will appear. Read its instructions for more details.
3. Click “Next”. Wherever you hover your mouse, a semi-transparent blue selector now appears and adjusts its shape to the UI element detected inside the browser.
4. In the organic results of the SERP, select the first result (not the image or ad results, just the organic ones). In our case, that result comes from Wikipedia.
5. After clicking on the first result, scroll down to the last.
6. Click “Next” to select the second element. Preferably, pick the last result as the second element so that the entire list of results is selected.
7. Now you can configure the columns. You will be offered 2 Columns: Text and URL. Check them both.
8. Click “Next”. Now you can see the sample data scraped from the page, with column 1 showing the titles and column 2 showing the links.
9. Click “Finish”.
10. A message box will appear, asking whether the data we are scraping is only on this page or spans multiple pages. For this tutorial, let’s just copy the first page, so click “No”.
11. The data scraping is now recorded. Make sure the Data Scraping sequence is connected to the Open Browser activity; check that an arrow links them.
12. This is the final resulting activity. Click on “Extract Structured Data ‘DIV’ rso”, then look at its properties in the panel on the right.
13. Take note of the “Output” >> DataTable property. The default value is “ExtractDataTable”. This is a generated variable with DataTable as its data type. You can also see this variable by clicking “Variables” at the bottom.
14. Change its scope from “Data Scraping” to “Flowchart”. This makes the variable usable anywhere inside the Flowchart.
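If you are curious what the wizard is doing for you, here is a rough Python sketch of the same idea: walk through the result elements and collect a Text column and a URL column. This is not how UiPath implements it. The `SAMPLE_HTML` snippet, the `ResultParser` class and the `extract_results` function are all made up for illustration, and real SERP markup is far more complex than this stand-in.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for a results page (assumption, not real SERP markup).
SAMPLE_HTML = """
<div id="rso">
  <div class="result"><a href="https://en.wikipedia.org/wiki/Example"><h3>Example - Wikipedia</h3></a></div>
  <div class="result"><a href="https://example.com/"><h3>Example Domain</h3></a></div>
</div>
"""


class ResultParser(HTMLParser):
    """Collects [title, url] rows from <h3> titles nested inside <a href> links."""

    def __init__(self):
        super().__init__()
        self.rows = []           # extracted rows, one [title, url] pair per result
        self._href = None        # href of the <a> we are currently inside
        self._in_title = False   # True while inside an <h3> within that <a>
        self._title_parts = []   # text fragments of the current title

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
        elif tag == "h3" and self._href:
            self._in_title = True
            self._title_parts = []

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.rows.append([title, self._href])
        elif tag == "a":
            self._href = None


def extract_results(html):
    """Return a list of [title, url] rows, like the two columns picked in the wizard."""
    parser = ResultParser()
    parser.feed(html)
    return parser.rows


rows = extract_results(SAMPLE_HTML)
for title, url in rows:
    print(title, "|", url)
```

The `rows` list plays the role of the “ExtractDataTable” variable: one row per result, with the title in the first column and the link in the second.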
Working with an Excel File
Now that we have copied the data, it’s time to display it in an Excel file.
1. First, add an “Excel Application Scope” activity. Just search for “excel” in the activities search box.
2. Drag it into the Flowchart, and again make sure it is connected to the previous activities by an arrow. Double-click it to open it.
3. Once inside the Excel Application Scope activity, type a path/filename in the “Workbook path” text box. Let’s try “Extract_SERP.xlsx”.
*Note: you can also use CSV or any other file extension that MS Excel can open.
4. Add another Activity. Search for “Write Range”, and drag-drop or double-click it to add it inside the scope.
5. In the “Data table” text box, add the DataTable variable from the Data Scraping above. It is called “ExtractDataTable”. Since it’s a variable, you don’t need to wrap it in double quotation marks.
6. Now you can debug the bot! After a successful run, you will find a newly created Excel file in your project directory. Open it and you can see the data extracted from Google’s SERP.
You can now manipulate this data as you wish. You can also apply this approach to any search, or to copying data from any website.
I hope you learned something new from this tutorial. You now know the basics of Data Scraping using UiPath. Thank you and come back tomorrow for another blog post!
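For comparison, here is a minimal Python sketch of what the “Write Range” step does conceptually: take a table of rows and write it to a file that Excel can open. It writes a CSV instead of an .xlsx so that it needs only the standard library. The sample rows, the `write_rows` helper and the “Extract_SERP.csv” filename are assumptions for illustration, not part of the UiPath project.

```python
import csv

# Hypothetical sample rows standing in for the scraped DataTable (assumption).
rows = [
    ["Example - Wikipedia", "https://en.wikipedia.org/wiki/Example"],
    ["Example Domain", "https://example.com/"],
]


def write_rows(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


write_rows("Extract_SERP.csv", ["Title", "URL"], rows)
```

Opening “Extract_SERP.csv” in MS Excel should show the same two columns as the UiPath output, plus the header row added here.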
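As one small example of manipulating the data, the Python sketch below counts how many results come from each website. The sample rows and the `count_domains` helper are made up for illustration; in a real bot you would work on the rows you actually scraped.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical [title, url] rows standing in for the scraped data (assumption).
rows = [
    ["Example - Wikipedia", "https://en.wikipedia.org/wiki/Example"],
    ["Example Domain", "https://example.com/"],
    ["Sample - Wikipedia", "https://en.wikipedia.org/wiki/Sample"],
]


def count_domains(rows):
    """Count how many rows point at each domain, based on the URL column."""
    return Counter(urlparse(url).netloc for _title, url in rows)


for domain, count in count_domains(rows).most_common():
    print(domain, count)
```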
“And that’s one blog, stay hungry!”
“There’s a lot of automation that can happen that isn’t a replacement of humans but of mind-numbing behavior.” - Stewart Butterfield