How to Scrape a website using PHP?

5 years ago
by ganofins

Hey Guys,

Today I will show you how to scrape a website using PHP. To follow along, you need to include the simple_html_dom.php file (the PHP Simple HTML DOM Parser) in your PHP file.

This file contains ready-made functions for parsing a site’s HTML and searching through its tags. Keep in mind that scraping a website without the site’s permission may be considered illegal.

*This post is for educational purposes only.
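
To get a feel for what "parsing the HTML and searching through its tags" means, here is a minimal sketch that uses PHP’s built-in DOMDocument class instead of simple_html_dom. It is a different parser, swapped in here only because it ships with PHP and needs no extra file. The helper name listTagTexts() and the sample HTML are hypothetical:

```php
<?php
// Sketch using PHP's bundled DOM extension, not simple_html_dom.
// listTagTexts() is a hypothetical helper name for this example.
function listTagTexts(string $html, string $tagName): array
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Walk every element with the given tag name and keep its text
    $texts = [];
    foreach ($doc->getElementsByTagName($tagName) as $element) {
        $texts[] = trim($element->textContent);
    }
    return $texts;
}

$page = '<html><body><h2>First</h2><p>Intro</p><h2>Second</h2></body></html>';
print_r(listTagTexts($page, 'h2'));
```

simple_html_dom wraps the same idea (build a tree from the HTML, then search it) in shorter calls, which is why this tutorial uses it.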

First, choose the website and the data on it that you want to scrape. Here I am using AndroidHeadlines.com as an example, and we are going to scrape its latest headlines.

Step 1: First, open and close the PHP tags:

	<?php
	?>

Step 2: Next, include the simple_html_dom.php file in your PHP code, and place that file in the same folder as your script:

	<?php
	require_once("simple_html_dom.php");
	?>

Step 3: Now create a variable that holds the result of file_get_html(). This function downloads the page at the URL you pass inside its parentheses and builds a Document Object Model (DOM) from it:

	<?php
	require_once("simple_html_dom.php");
	$html = file_get_html("https://www.androidheadlines.com/");
	?>
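
file_get_html() both downloads the page and parses it. If you need more control over the download, for example a timeout or a User-Agent header (some sites refuse requests without one), one option is to fetch the HTML yourself and hand the string to simple_html_dom’s str_get_html() function. Below is a minimal sketch of the fetching half using PHP’s standard stream functions; the helper name fetchHtml() and the User-Agent value are assumptions for illustration:

```php
<?php
// Hypothetical helper: download a page's HTML with a timeout and a
// User-Agent header. Returns null on failure instead of false.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            // Assumed example value; replace with your own identifier
            'user_agent' => 'ExampleScraper/1.0 (+https://example.com/contact)',
        ],
    ]);

    // "@" hides the warning PHP emits on failure; the return value is checked instead
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Usage might look like `$body = fetchHtml("https://www.androidheadlines.com/");` followed by `$html = ($body === null) ? false : str_get_html($body);`, so a failed download is caught before any call to find().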

Step 4: Using the $html variable, we can now search the page’s tags. First, let’s find the tag that contains all the latest posts. To search inside $html we use the find() method, which takes a CSS-like selector; the [0] at the end keeps only the first matching element:

	<?php
	require_once("simple_html_dom.php");
	$html = file_get_html("https://www.androidheadlines.com/");
	$headlines = $html->find("div[class=container]")[0];
	?>
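
The selector div[class=container] matches on the class attribute, and it is worth checking how your version of simple_html_dom handles an element with several classes, such as class="container main". For comparison, here is a minimal sketch of the same "first matching element" lookup with PHP’s built-in DOMXPath (a different tool from simple_html_dom), where the behaviour is explicit: the contains(concat(...)) idiom below matches container as one class among several, whereas @class="container" would only match that exact value. The helper name firstTextByClass() is hypothetical:

```php
<?php
// Sketch with PHP's built-in DOMXPath, not simple_html_dom.
// firstTextByClass() is a hypothetical helper; $tag and $class are assumed
// to be plain names (no quotes), since they are pasted into the XPath query.
function firstTextByClass(string $html, string $tag, string $class): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Treat $class as one whitespace-separated token of the class attribute
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );
    $nodes = $xpath->query($query);

    // Similar in spirit to find(...)[0]: first match only, or null if none
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$page = '<div class="container main"><span>Latest posts</span></div>';
var_dump(firstTextByClass($page, 'div', 'container'));
```

Returning null when nothing matches makes a changed page layout easy to detect before you try to search inside the result.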

Step 5: Since we only want the title of each headline, and there are multiple headlines, we create an array to store them all:

	<?php

	require_once("simple_html_dom.php");
	$html = file_get_html("https://www.androidheadlines.com/");
	$headlines = $html->find("div[class=container]")[0];
	$titles = array();

	?>

Step 6: Now find the tags that contain the headline titles. On this page each title sits inside a span tag with the class featured-title. This time, don’t write an index at the end: without an index, find() returns all matching elements as an array, which we can save directly to $titles:

	<?php

	require_once("simple_html_dom.php");
	$html = file_get_html("https://www.androidheadlines.com/");
	$headlines = $html->find("div[class=container]")[0];
	$titles = array();
	$titles = $headlines->find("span[class=featured-title]");

	?>
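
Without an index, find() returns every matching element, and because it is called on $headlines it only searches inside that container. The same idea, sketched with PHP’s built-in DOMXPath instead of simple_html_dom: the second query runs relative to the container node, so matching spans elsewhere on the page are skipped. The function name headlineTitles() and the sample markup are hypothetical; the class names are the ones used in the steps above:

```php
<?php
// Sketch with PHP's built-in DOMXPath (not simple_html_dom): collect the
// text of every matching element inside a container, i.e. "no index".
// headlineTitles() is a hypothetical helper name.
function headlineTitles(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $container = $xpath->query('//div[@class="container"]')->item(0);
    if ($container === null) {
        return []; // container missing: nothing to collect
    }

    $titles = [];
    // The leading "." makes the search relative to $container
    foreach ($xpath->query('.//span[@class="featured-title"]', $container) as $span) {
        $titles[] = trim($span->textContent);
    }
    return $titles;
}

$page = '<div class="sidebar"><span class="featured-title">Elsewhere</span></div>'
      . '<div class="container"><span class="featured-title">One</span>'
      . '<span class="featured-title">Two</span></div>';
print_r(headlineTitles($page));
```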

Step 7: Now print the titles in the $titles array using foreach (or any other loop):

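	<?php
	require_once("simple_html_dom.php");

	// Download the page and build its DOM; file_get_html() returns false on failure
	$html = file_get_html("https://www.androidheadlines.com/");
	if ($html === false) {
		die("Could not load the page.");
	}

	// First <div> whose class attribute matches "container" (null if none)
	$headlines = $html->find("div[class=container]", 0);
	if ($headlines === null) {
		die("Container not found; the site's layout may have changed.");
	}

	$titles = array();
	// Without an index, find() returns every matching <span> as an array
	$titles = $headlines->find("span[class=featured-title]");
	foreach ($titles as $title) {
		// plaintext is the element's text without the surrounding tags
		echo $title->plaintext . "<br>";
	}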
	?>
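
When the titles are printed into a web page, as in Step 7, the text comes from someone else’s site and should be treated as untrusted. A minimal sketch, assuming you already have the titles as plain strings (for example from each element’s plaintext), is to escape them with htmlspecialchars() before adding the <br>. The helper name renderTitles() is hypothetical:

```php
<?php
// Hypothetical helper: turn a list of scraped titles into HTML lines,
// escaping each one so markup from the scraped site is shown as text.
function renderTitles(array $titles): string
{
    $lines = [];
    foreach ($titles as $title) {
        $lines[] = htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '<br>';
    }
    return implode("\n", $lines);
}

echo renderTitles(['Pixel & Galaxy news', '<b>Update</b>']), "\n";
```

Escaping costs nothing for ordinary titles, and it stops a stray < or & in a headline from breaking your page.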

Step 8: Finally, run the script. The scraped headline titles are printed one per line.

I hope you now know how to scrape data from a website.

Feel free to contact me if you have any questions.
