
Tracking content on dynamic sites


Background


This article was conceived and written thanks to the XPoint forum <http://www.internet-technologies.ru/?url=http%3A%2F%2Fxpoint.ru%2F>.


The methods described below are not the author's own invention. While still a complete beginner, he ran into the problem described here himself and could not find any sensible material on how to solve it. The solution was found by reading a dozen articles on related topics, pestering people on the XPoint forum, and a week of nightly battles with PHP.


In writing this article, the author aims to help less experienced people solve the problem described below without wrecking their daily routine.


Note: there is no guarantee that you will get everything working on the first try in half an hour. What is certain is that, before doing something, it never hurts to read how others have done it.

What this article is about


If you have ever built a dynamic site, that is, a site made up not of static HTML pages but of scripts that work with various files and databases, you have certainly run into (or, if not, will run into) these problems:

-         "Correct" caching;

-         "Correct" HTTP headers.


Let me explain the first point. Caching is a mechanism that lets the client (that is, the user at the other end of the connection or, more precisely, the user's browser; the process is invisible to the user) avoid downloading the same files again on every view (for example, the images that make up the site's design, or CSS style sheets). The file is downloaded once, and afterwards the copy saved on the client's computer is used as needed. It works roughly like this: when a file is needed again, the client asks the server whether the file has changed (that is, whether the copy on the client's machine is outdated) and, if it has not, uses the saved copy instead of downloading the file again. In browsers that follow the HTTP specification closely, you can even arrange for no request to be sent at all when a saved copy exists and is known not to have expired yet. The beauty of this is that the client's traffic (the volume of data downloaded from the Internet) goes down, and the user sees that "the site is fast". Naturally, the server's traffic goes down too, which reduces the load on the server and even saves money if you use paid hosting.


The problem is that this mechanism will not work by itself for the pages of a dynamic site; it has to be built (whereas for static pages and images, servers can usually automate the process completely). Building a system that implements "correct caching" is described below, in the Theory and Practice sections.


Now the second point. When a client requests a file from the server over HTTP, it receives, besides the contents of the file (the code in the case of an HTML file, the text in the case of a text file, and so on), the HTTP headers. These are service text fields that are not displayed in the browser but are interpreted by it; for the most part, they tell the client about the requested page.


The ETag header ("entity tag"), for example, assigns each page a unique identifier that stays the same while the page is unmodified and changes when the data on the page changes. The client saves this header, and when a "tagged" page has to be loaded again, the browser can send the server an 'If-None-Match' request. The server must then determine from the ETag value of the client's saved copy whether that copy is outdated and, if it is not, answer with the code '304 Not Modified'; the page will not be downloaded again.
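As a minimal sketch of the server-side ETag check described above, the comparison might look like this in PHP. The function name etag_matches is hypothetical and not part of the article's script; the sketch assumes the label is sent as a quoted MD5 hash and ignores weak validators (W/) and the * wildcard.

```php
<?php
// Hypothetical helper: returns true if the client's If-None-Match header
// contains the current label of the page, i.e. the cached copy is still fresh.
function etag_matches(string $if_none_match, string $current_hash): bool
{
    // The header may carry several labels separated by commas.
    $candidates = preg_split('/\s*,\s*/', trim($if_none_match));
    return in_array('"' . $current_hash . '"', $candidates, true);
}

$hash = md5('page text');
// In a real script the header value would come from $_SERVER['HTTP_IF_NONE_MATCH'].
var_dump(etag_matches('"' . $hash . '"', $hash));        // bool(true)
var_dump(etag_matches('"old", "' . $hash . '"', $hash)); // bool(true)
var_dump(etag_matches('"something-else"', $hash));       // bool(false)
```

When the function returns true, the server can answer with '304 Not Modified' instead of sending the page body.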


The Last-Modified header tells the client the date and time at which the requested page last changed. Using it, the client can, as with ETag, send the server an 'If-Modified-Since' request. The server must then compare the modification date of the client's saved copy with the actual date of the last modification. If they match, the copy in the client's cache is not outdated and does not need to be downloaded again (response code '304 Not Modified'). Last-Modified is also needed for your site to be processed correctly by spiders (a spider is a robot that crawls the web and indexes sites so that they can be found through search engines such as Google), which use the modification date of pages to sort search results by date and to determine how often your site is updated (see, for example, what Yandex writes about it).
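The date comparison behind If-Modified-Since can be sketched as follows. Both helper names (http_date and not_modified_since) are hypothetical, and the current time is passed in as a parameter only to keep the example easy to check.

```php
<?php
// Hypothetical helper: formats a UNIX timestamp as an HTTP date in GMT.
function http_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Hypothetical helper: true if the client's copy is at least as new as the page.
function not_modified_since(string $if_modified_since, int $page_modified, int $now): bool
{
    $client_time = strtotime($if_modified_since);
    // Reject dates that cannot be parsed or that lie in the future.
    if ($client_time === false || $client_time > $now) {
        return false;
    }
    return $client_time >= $page_modified;
}

$modified = 1000000000; // example timestamp of the page's last change
echo http_date($modified), "\n"; // Sun, 09 Sep 2001 01:46:40 GMT
```

In a real script the first argument of not_modified_since would be $_SERVER['HTTP_IF_MODIFIED_SINCE'], and a true result would lead to a '304 Not Modified' response.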


Which of these methods of determining the "freshness" of web pages the client uses (and whether it uses them at all) depends on its capabilities and settings. Ideally, both headers should be sent with every file your server delivers.


There is also the Expires header. It tells the browser for how long the cached copy of a page can be considered fresh, so that the server need not be queried at all during that time. This is convenient for files that you know for certain will not change for the next hour or longer: a page's background image, for example. Unfortunately, not all browsers support it. In the example considered here, Expires is set to ten minutes, which suits most sites whose information is updated not too often (about once an hour).
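A ten-minute Expires value could be built like this. The helper name expires_header is an assumption made for illustration, and the lifetime is a parameter so that other intervals can be tried.

```php
<?php
// Hypothetical helper: builds an Expires header line that lies
// $lifetime seconds after the moment $now (both in seconds).
function expires_header(int $now, int $lifetime): string
{
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $now + $lifetime) . ' GMT';
}

// Ten minutes from the current moment; pass the result to header() in a real script.
echo expires_header(time(), 60 * 10), "\n";
```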


To send "correct" HTTP headers for the pages of your site, you have to determine somehow when those pages are modified. The difficulty is that, unlike static HTML files, where the modification date of the file and of its contents are one and the same, dynamic pages can change their contents depending on external factors defined by the developer (time of day, the user's request, data imported into the database) without any change to the script file. In other words, the modification date of the file and that of the information it sends to the client may well differ. Another problem follows from the first: dynamic sites often put a poll, a "joke of the day", or a banner rotator on every page, and its code changes on every load. So you have to look not at the whole generated page but only at the part that carries the main information: the text of an article, a price list, and so on.


At first glance this all looks complicated and unclear, but there is no need to be frightened, because the system considered below is fairly simple. It is harder to understand what we need to do than how to do it.



The theory


To solve the problems described above, we need to:

1.         "Catch" the part of the page's content that we are tracking;

2.         Compare the result with what we got last time; for this we use a database, which stores the information about the pages;

3.         If the data has changed, update the information about the page in the database;

4.         Send the client HTTP headers that depend on its request.
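Steps 2 and 3 of this list can be sketched in a few lines. In this illustration a plain PHP array stands in for the database, and the function name track_content is hypothetical; the point is only to show how a hash of the tracked content decides whether the stored modification time changes.

```php
<?php
// Hypothetical sketch of steps 2 and 3: an array stands in for the database.
// Returns the time of the last change of the page's tracked content.
function track_content(array &$store, string $page_id, string $content, int $now): int
{
    $hash = md5($content);
    if (!isset($store[$page_id]) || $store[$page_id]['hash'] !== $hash) {
        // New page or changed content: remember the new hash and time.
        $store[$page_id] = ['hash' => $hash, 'modified' => $now];
    }
    return $store[$page_id]['modified'];
}

$store = [];
echo track_content($store, '/news.php', 'first text', 100), "\n";  // 100
echo track_content($store, '/news.php', 'first text', 200), "\n";  // 100 (unchanged)
echo track_content($store, '/news.php', 'second text', 300), "\n"; // 300 (changed)
```

The returned time is what would later be reported to the client in the Last-Modified header.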


This can be done in different ways. We will look at a system written in PHP, since the language is popular, fairly easy to understand, and supported by most hosting providers, including free ones. We will store the data in a MySQL database for the same reasons.


So, what do we have on the site? Dynamic pages, that is, PHP scripts. How does this "dynamics" work? If your site is small, you most likely have a script for each page: index.php for the main page, news.php for the news page, and so on. If your site has outgrown the "home page" stage and has a complex structure, a forum or a user area, and is built on databases, then probably a single script such as showpage.php generates hundreds of conceptually different pages (for example, one script generates all the pages of a forum, but the pages are different and each has to be tracked separately). The first case is easier to examine, although once you understand the essence of the proposed system you will be able to integrate it into a site of the second kind without much trouble. So we will consider the first case.
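For the second case, each generated page needs its own identifier. One possible way to derive it, shown here only as a sketch, is to combine the script path with the query parameters that select the page. The function name make_page_id and the parameter names are hypothetical, and replacing every other character with an underscore can in principle make two different inputs collide.

```php
<?php
// Hypothetical helper: builds a page identifier from the script path and the
// query parameters that select the page. Only Latin letters, digits and
// underscores remain in the result.
function make_page_id(string $script, array $params): string
{
    ksort($params); // the same parameters in any order give the same id
    $raw = $script;
    foreach ($params as $name => $value) {
        $raw .= '_' . $name . '_' . $value;
    }
    return preg_replace('/[^A-Za-z0-9_]/', '_', $raw);
}

echo make_page_id('/showpage.php', ['topic' => '15', 'forum' => '3']), "\n";
// _showpage_php_forum_3_topic_15
```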


What should be left out when "catching" the content? Banners, counters, and everything that belongs to the menu. This is not a hard rule, though: you decide how much of the information produced by your scripts is used to determine whether they have been updated.


If the "necessary" information and the "unnecessary" add-ons are output by different functions, you can simply collect all the "necessary" output in one variable. Something like this:



<?php

// This is only a fragment of a hypothetical PHP file.
// It has no meaning of its own and serves only as an example.

include 'settings.inc.php'; // Load settings, modules, classes, etc.
print page_header(); // Output the page header: we do not need it;
print banners(); // Show the banners: likewise;

print $contentmonitoring_var = main_info(); // Output the "main" information;
// This is exactly what we need: we store it in $contentmonitoring_var

print $temp = info2(); // Some more necessary information;
$contentmonitoring_var .= $temp; // append it to the same variable

print something_other(); // Not what we need;
print page_footer(); // Again not what we need...

?>


If your site is too bulky and complex for that kind of change, there is a simpler way out: output buffering, a standard PHP feature. (When output buffering is active, everything the script generates is not sent to the client but is saved in an internal buffer.) Whatever ends up in the buffer can be placed in a variable, handled like an ordinary string, and then sent to the client. All you then have to do is find in the code the beginning and the end of the piece of output that should be checked, and mark that piece with HTML comments. To do this, insert the following line before the output of the information we are interested in begins:



<!--content-->


And after it, the line



<!--/content-->


Bear in mind only that buffering can slow your scripts down if they output large volumes of data (several megabytes), and that it does not allow output to be sent in portions as the script runs: the data is sent to the client only after the whole script has finished (for small scripts this is no problem).
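The idea of marking the tracked part with comments and pulling it out of the buffer can be sketched as follows. The helper name extract_tracked_content is hypothetical, and preg_match_all is used here purely for readability; it is one of several ways to pick out the marked fragments.

```php
<?php
// Hypothetical helper: returns the concatenated text of all fragments
// enclosed between <!--content--> and <!--/content--> markers.
function extract_tracked_content(string $html): string
{
    preg_match_all('#<!--content-->(.*?)<!--/content-->#is', $html, $matches);
    return implode('', $matches[1]);
}

ob_start(); // start buffering: nothing is sent to the client yet
echo "<div>banner 42</div>\n";
echo "<!--content-->Article text<!--/content-->\n";
echo "<div>counter</div>\n";
$page = ob_get_contents(); // copy of the buffered output; the buffer is kept
ob_end_clean();            // discard the buffer, for this demonstration only

echo extract_tracked_content($page), "\n"; // Article text
```

In a real page the buffer would not be discarded: it is flushed to the client when the script ends, and only the extracted fragment is used to detect changes, so a different banner or counter value does not affect the result.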


Now we have separated the wheat from the chaff, and the information whose content we will monitor sits between the HTML comments. What remains is to turn buffering on, take the data from the buffer at the end of the script, extract the part of the code between the comments, and check it for changes. Based on the result of that check, and on the client's request, the HTTP headers are sent. All this is done by a script that has to be included in all of your scripts. It is done like this: at the beginning of each file, right after the <?php line, insert the code



ob_start(); // Content tracking: start buffering.


And at the end, before the ?> line, the code



include_once('content_monitoring.inc.php'); // Content tracking: include the monitoring script.


The first line starts buffering so that we can work with what the script generates, and the second includes the script that processes whatever the including page has generated. It checks whether the page has been modified and sends the HTTP headers.


That is the theory, in general. I will add that the system can be extended: for example, a visit counter could be added here. But that is beyond the scope of this article.



Practice


Code of the included script:



<?php

// File content_monitoring.inc.php: the content-monitoring system.
// It must be in the same folder as your scripts.

if (!isset($page_id)) $page_id = $_SERVER['PHP_SELF'];
// $page_id is the identifier of the page in the database.
// If you use the scheme "1 script = 1 page", leave everything as is.
// If one script generates many conceptually different pages,
// those scripts must generate $page_id dynamically.
// A custom value should contain only Latin letters, digits and underscores.

$db = ConnectToDatabase(); // Connect to the database.
// ConnectToDatabase() is not defined in this file, because connecting to MySQL
// and handling connection errors are beyond the scope of this article.
// You must write it yourself; this code assumes it returns a mysqli object.

$page_all_contents = ob_get_contents(); // Take the script's output from the buffer.
// Keep only what lies between <!--content--> and <!--/content-->.
$page_main_content = preg_replace('#.*?(<!--content-->(.*?)<!--/content-->|$)#is', '$2', $page_all_contents);
$page_hash = md5($page_main_content); // Hash of the "necessary" part of the page.
$safe_id = $db->real_escape_string($page_id); // Escape the identifier for use in SQL.

// Create the statistics table if it does not exist yet.
$db->query("CREATE TABLE IF NOT EXISTS content_monitoring (cm_id VARCHAR(255) NOT NULL, cm_md5 CHAR(32), cm_modified TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (cm_id))");

// Get the information about the current page from the database.
$result = $db->query("SELECT cm_id, cm_md5, UNIX_TIMESTAMP(cm_modified) AS modified FROM content_monitoring WHERE cm_id = '$safe_id'");
$FILE_INFO = $result ? $result->fetch_row() : null;

if (empty($FILE_INFO)) {
    // No record for this page: create one.
    $db->query("INSERT INTO content_monitoring VALUES ('$safe_id', '$page_hash', NOW())");
    $FILE_INFO = array($page_id, $page_hash, time()); // Modified: now.
}
elseif ($page_hash != $FILE_INFO[1]) {
    // The record exists, but the page has changed: update the record.
    $db->query("UPDATE content_monitoring SET cm_md5 = '$page_hash', cm_modified = NOW() WHERE cm_id = '$safe_id'");
    $FILE_INFO[2] = time(); // Modified: now.
}

$modified = (int) $FILE_INFO[2];
$last_modified = gmdate('D, d M Y H:i:s', $modified) . ' GMT';

// Process the conditional GET.
$has_inm = isset($_SERVER['HTTP_IF_NONE_MATCH']);
$has_ims = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']);
$not_modified = false; // Until proven otherwise, the file is sent in full.

if ($has_inm || $has_ims) {
    // If-None-Match: look for our label in the list of values.
    $etag_ok = true;
    if ($has_inm) {
        $INM = preg_split('/\s*,\s*/', trim($_SERVER['HTTP_IF_NONE_MATCH']));
        $etag_ok = in_array('"' . $page_hash . '"', $INM, true);
    }
    // If-Modified-Since: convert the value to a UNIX timestamp and compare.
    $date_ok = true;
    if ($has_ims) {
        $unix_ims = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
        // A date that cannot be parsed or lies in the future is an invalid conditional GET.
        $date_ok = is_int($unix_ims) && $unix_ims <= time() && $unix_ims >= $modified;
    }
    // When both headers are present, both checks must pass.
    $not_modified = $etag_ok && $date_ok;
}

header('ETag: "' . $page_hash . '"'); // Assign the label.

if ($not_modified) {
    // The copy in the client's cache is not outdated: tell the client so...
    header('HTTP/1.1 304 Not Modified');
    // ...and finish the script without sending the file.
    while (ob_get_level()) ob_end_clean();
    exit;
}

// No conditional GET, an invalid one, or an outdated copy: send the file.
header("Last-Modified: $last_modified"); // Time of the last change.
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 60 * 10) . ' GMT'); // The page stays fresh for 10 minutes.

?>



A few final words


This article, its author's opinion on the problem raised, and the proposed methods of solving it make no claim to universality: it is quite possible (and most likely true) that a simpler and better solution exists for your case. The author did not set out to create something ingenious that would fundamentally change the Internet; that is impossible. He only wanted to show one possible way of making life a little easier for web developers, mostly beginners.


If you have read this article attentively to the end and understood nothing, it means either that you are a complete beginner in this area or that the question raised does not concern you at all (though it may also mean that the article could have been written much better, which is true). In the first case (that is, when the material interests you but is unclear), I can recommend reading more articles and, without fail, manuals. In the second, I can only envy you: you have one problem fewer in your life.


In any case, I will be glad if this has helped someone.