This is the Title of the Book, eMatter Edition
Copyright © 2004 O’Reilly & Associates, Inc. All rights reserved.
Sharing Memory
|
353
value, and so does the memory page it resides on, because the memory is shared in
memory-page units.
Sometimes you have variables that use a lot of memory, and you consider their usage
read-only and expect them to be shared between processes. However, certain operations
that seemingly don’t modify the variable values do modify things internally,
causing the memory to become unshared.
Imagine that you have a 10 MB in-memory database that resides in a single variable,
and you perform various operations on it and want to make sure that the variable is
still shared. For example, if you do some regular expression (regex) matching on this
variable and want to use the pos( ) function, will it make the variable unshared? If
you access the variable once as a numerical value and once as a string value, will
the variable become unshared?
The Apache::Peek module comes to the rescue.
Variable unsharing caused by regular expressions
Let’s write a module called Book::MyShared, shown in Example 10-1, which we will
preload at server startup so that all the variables of this module are initially shared by
all children.
This module declares the package Book::MyShared, loads the Apache::Peek module,
and defines the lexically scoped $readonly variable. In a real application, the
$readonly variable would be very large (perhaps a huge hash data structure), but here
we will use a small variable to simplify the example.
The module also defines three subroutines: match( ), which does simple character
matching; print_pos( ), which prints the current position of the matching engine
inside the string that was last matched; and finally dump( ), which calls the
Apache::Peek module’s Dump( ) function to dump a raw Perl representation of the
$readonly variable.
Now we write a script (Example 10-2) that prints the process ID (PID) and calls all
three functions. The goal is to check whether pos( ) makes the variable dirty and
therefore unshared.
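Example 10-1. Book/MyShared.pm
package Book::MyShared;
use Apache::Peek;
# Stand-in for a large, supposedly read-only data structure
my $readonly = "Chris";
# Simple character match; m//g in scalar context remembers its position
sub match { $readonly =~ /\w/g; }
# Print where the last m//g match on $readonly left off
sub print_pos { print "pos: ",pos($readonly),"\n";}
# Dump the raw Perl representation of $readonly
sub dump { Dump($readonly); }
1;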
,ch10.23775 Page 353 Thursday, November 18, 2004 12:40 PM
354
|
Chapter 10: Improving Performance with Shared Memory and Proper Forking
Before you restart the server, set the following in httpd.conf for easier tracking:
MaxClients 2
You need at least two servers to compare the printouts of the test program; having
more than two can make the comparison harder.
Now open two browser windows and request the script in each one, so that the two
windows report different PIDs and each process has served a different number of
requests for the share_test.pl script.
In the first window you will see something like this:
PID: 27040
pos: 1
SV = PVMG(0x853db20) at 0x8250e8c
REFCNT = 3
FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK)
IV = 0
NV = 0
PV = 0x8271af0 "Chris"\0
CUR = 5
LEN = 6
MAGIC = 0x853dd80
MG_VIRTUAL = &vtbl_mglob
MG_TYPE = 'g'
MG_LEN = 1
And in the second window:
PID: 27041
pos: 2
SV = PVMG(0x853db20) at 0x8250e8c
REFCNT = 3
FLAGS = (PADBUSY,PADMY,SMG,POK,pPOK)
IV = 0
NV = 0
PV = 0x8271af0 "Chris"\0
CUR = 5
LEN = 6
MAGIC = 0x853dd80
MG_VIRTUAL = &vtbl_mglob
MG_TYPE = 'g'
MG_LEN = 2
All the addresses of the supposedly large data structure are the same (0x8250e8c and
0x8271af0), so the variable’s data structure is almost completely shared. The only
difference is in the SV.MAGIC.MG_LEN record, which is not shared. This record tracks
where the last m//g match left off for the given variable (the offset that pos( )
reports), and therefore it cannot be shared. See the perlre manpage for more
information.
If the $readonly variable were a big one, its value would still be shared between the
processes, while only a small part of the variable’s data structure would become
unshared. That nonshared part is almost insignificant because it takes up very little
memory.
Example 10-2. share_test.pl
use Book::MyShared;
print "Content-type: text/plain\n\n";
print "PID: $$\n";
Book::MyShared::match( );
Book::MyShared::print_pos( );
Book::MyShared::dump( );
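A quick way to see the per-variable state that MG_LEN holds is to repeat a scalar-context m//g match outside Apache and watch pos( ). The following standalone script is a minimal sketch, not one of the book’s examples; it reuses the $readonly name only for continuity and assumes nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same short string the Book::MyShared examples use
my $readonly = "Chris";

# Each scalar-context m//g match stores where it left off in the
# variable's 'g' magic; pos() reads that offset back.
my @positions;
for (1 .. 3) {
    $readonly =~ /\w/g;
    push @positions, pos($readonly);
}
print "positions: @positions\n";    # positions: 1 2 3

# Assigning undef to pos() resets the offset, so the next match
# starts from the beginning of the string again.
pos($readonly) = undef;
$readonly =~ /\w/g;
print "after reset: ", pos($readonly), "\n";    # after reset: 1
```

Because that offset lives in the variable’s own magic structure, every child that runs match( ) writes to it, which is why that small piece of the data structure cannot stay shared.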
If you need to compare more than one variable, doing it by hand can be quite
time-consuming and error-prone. It’s better to change the test script to dump the
Perl datatypes into files (e.g., /tmp/dump.$$, where $$ is the PID of the process)
and then use the diff(1) utility to see whether there are any differences. Changing
the dump( ) function to write the information to a file will do the job.
Notice that we use Devel::Peek and not Apache::Peek, so we can easily reroute the
STDERR stream into a file. In our example, when Devel::Peek tries to print to STDERR,
it actually prints to our file. When we are done, we make sure to restore the
original STDERR filehandle.
The resulting code is shown in Example 10-3.
Now we modify our script to use the modified module, as shown in Example 10-4.
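Example 10-3. Book/MyShared2.pm
package Book::MyShared2;
use Devel::Peek;
my $readonly = "Chris";
sub match { $readonly =~ /\w/g; }
sub print_pos { print "pos: ",pos($readonly),"\n";}
sub dump {
    # One dump file per child process, named after its PID
    my $dump_file = "/tmp/dump.$$";
    print "Dumping the data into $dump_file\n";
    # Save a duplicate of STDERR, then point STDERR at the dump file
    open OLDERR, ">&STDERR" or die "Can't dup STDERR: $!";
    open STDERR, ">$dump_file" or die "Can't open $dump_file: $!";
    Dump($readonly);    # Devel::Peek prints to STDERR, i.e., to the file
    close STDERR;
    # Restore the original STDERR and release the saved copy
    open STDERR, ">&OLDERR" or die "Can't restore STDERR: $!";
    close OLDERR;
}
1;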
Example 10-4. share_test2.pl
use Book::MyShared2;
print "Content-type: text/plain\n\n";
print "PID: $$\n";
Book::MyShared2::match( );
Book::MyShared2::print_pos( );
Book::MyShared2::dump( );
Now we can run the script as before (with MaxClients 2). Two dump files will be
created in the /tmp directory. In our test these were /tmp/dump.1224 and
/tmp/dump.1225. When we run diff(1):
panic% diff -u /tmp/dump.1224 /tmp/dump.1225
12c12
- MG_LEN = 1
+ MG_LEN = 2
we see that the two dumps of $readonly differ, just as we observed before when we
compared them manually.
If we think about these results again, we come to the conclusion that we don’t need
two processes to find out whether the variable gets modified (and therefore
unshared). It’s enough to check the data structure twice: before the script is
executed and again afterward. We can modify the Book::MyShared2 module to dump the
variable into a different file after each invocation and then run diff(1) on the
two files.
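The single-process check described above can be sketched as a standalone script: dump the variable to one file, run the code under test, dump it to a second file, and compare. This is an illustrative sketch under a few assumptions: it runs outside Apache, it uses the core Devel::Peek and File::Spec modules, and the helper names dump_to_file( ) and slurp( ) and the dump.before/dump.after file names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
use File::Spec;

# Stand-in for a module's large, supposedly read-only variable
my $readonly = "Chris";

# Write Devel::Peek's dump of $readonly to $file by temporarily
# pointing STDERR at it (the redirection used in Example 10-3).
sub dump_to_file {
    my ($file) = @_;
    open my $saved_err, '>&', \*STDERR or die "Can't dup STDERR: $!";
    open STDERR, '>', $file            or die "Can't open $file: $!";
    Dump($readonly);
    open STDERR, '>&', $saved_err      or die "Can't restore STDERR: $!";
    close $saved_err;
}

# Read a whole file into a string
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!";
    local $/;
    return scalar <$fh>;
}

my $dir    = File::Spec->tmpdir;
my $before = File::Spec->catfile($dir, "dump.before.$$");
my $after  = File::Spec->catfile($dir, "dump.after.$$");

dump_to_file($before);    # snapshot before the code under test runs
$readonly =~ /\w/g;       # the code under test (same as match( ))
dump_to_file($after);     # snapshot afterward

if (slurp($before) eq slurp($after)) {
    print "unchanged: the variable's internals were not touched\n";
} else {
    print "changed: compare with diff -u $before $after\n";
}
```

The two files are left in place so that you can inspect them with diff(1), exactly as with the two-process version.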
Suppose you have some lexically scoped variables (i.e., variables declared with my( ))
in an Apache::Registry script. If you want to watch whether they get changed between
invocations inside one particular process, you can use the Apache::RegistryLexInfo
module. It does exactly that: it takes a snapshot of the padlist before and after the
code execution and shows the difference between the two. This particular module was
written to work with Apache::Registry scripts, so it won’t work for loaded modules.
For any type of variable in modules and scripts, use the technique we described
above.
Another way to ensure that a scalar is read-only, and therefore shareable, is to use
either the constant pragma or the readonly pragma, as shown in Example 10-5. But then
you won’t be able to make calls that alter the variable even a little, such as those
in the example we just showed, because it will be a true constant and you will get a
compile-time error if you try:
Example 10-5. Book/Constant.pm
package Book::Constant;
use constant readonly => "Chris";
sub match { readonly =~ /\w/g; }
sub print_pos { print "pos: ",pos(readonly),"\n";}
1;
panic% perl -c Book/Constant.pm
Can't modify constant item in match position at Book/Constant.pm
line 5, near "readonly)"
Book/Constant.pm had compilation errors.
However, the code shown in Example 10-6 is OK. It doesn’t modify the variable flags
at all.
Example 10-6. Book/Constant1.pm
package Book::Constant1;
use constant readonly => "Chris";
sub match { readonly =~ /\w/g; }
1;
Numerical versus string access to variables
Data can get unshared on read as well, for example, when a numerical variable is
accessed as a string. Example 10-7 shows some code that proves this.
Example 10-7. numerical_vs_string.pl
#!/usr/bin/perl -w
use Devel::Peek;
my $numerical = 10;
my $string = "10";
$|=1;
dump_numerical( );
read_numerical_as_numerical( );
dump_numerical( );
read_numerical_as_string( );
dump_numerical( );
dump_string( );
read_string_as_numerical( );
dump_string( );
read_string_as_string( );
dump_string( );
sub read_numerical_as_numerical {
print "\nReading numerical as numerical: ", int($numerical), "\n";
}
sub read_numerical_as_string {
print "\nReading numerical as string: ", "$numerical", "\n";
}
sub read_string_as_numerical {
print "\nReading string as numerical: ", int($string), "\n";
}
sub read_string_as_string {
print "\nReading string as string: ", "$string", "\n";
}
sub dump_numerical {
print "\nDumping a numerical variable\n";
Dump($numerical);
}
sub dump_string {
print "\nDumping a string variable\n";
Dump($string);
}
The test script defines two lexical variables: a number and a string. Perl doesn’t have
strong data types like C does; Perl’s scalar variables can be accessed as strings and
numbers, and Perl will try to return the equivalent numerical value of the string if it
is accessed as a number, and vice versa. The initial internal representation is based
on the initially assigned value: a numerical value* in the case of $numerical and a
string value† in the case of $string.
* IV, for signed integer value, or a few other possible types for floating-point and
unsigned integer representations.
† PV, for pointer value (SV is already taken by a scalar data type).
The script accesses $numerical as a number and then as a string. The internal
representation is printed before and after each access. The same test is performed
with a variable that was initially defined as a string ($string).
When we run the script, we get the following output:
Dumping a numerical variable
SV = IV(0x80e74c0) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,pIOK)
IV = 10
Reading numerical as numerical: 10
Dumping a numerical variable
SV = PVNV(0x810f960) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,NOK,pIOK,pNOK)
IV = 10
NV = 10
PV = 0
Reading numerical as string: 10
Dumping a numerical variable
SV = PVNV(0x810f960) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,NOK,POK,pIOK,pNOK,pPOK)
IV = 10
NV = 10
PV = 0x80e78b0 "10"\0
CUR = 2
LEN = 28
Dumping a string variable
SV = PV(0x80cb87c) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3
Reading string as numerical: 10
Dumping a string variable
SV = PVNV(0x80e78d0) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
IV = 0
NV = 10
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3
Reading string as string: 10
Dumping a string variable
SV = PVNV(0x80e78d0) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
IV = 0
NV = 10
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3
We know that Perl does the conversion from one type to another on the fly, and
that’s where the variables get modified: during the automatic conversion behind the
scenes. From this simple test you can see that variables may change internally when
accessed in different contexts. Notice that even when a numerical variable is accessed
as a number for the first time, its internals change, as Perl has initialized its PV
and NV fields (the string and floating-point representations) and adjusted the FLAGS
field.
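If you would rather not read Dump( ) output by eye, the core B module can report which internal SV type a variable currently has. The sketch below is an illustration, not part of Example 10-7: the helper sv_type( ) is hypothetical, and the exact type names you see (for example IV before and PVIV or PVNV after) can vary between Perl versions. What matters is that a plain read in a new context changes the type.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Return the internal SV type of the variable passed in, e.g. "IV" or "PVNV".
# $_[0] is an alias for the caller's variable, so \$_[0] refers to it directly.
sub sv_type {
    my $class = ref B::svref_2object(\$_[0]);
    $class =~ s/^B:://;
    return $class;
}

my $numerical = 10;
my $string    = "10";

my $num_before = sv_type($numerical);
my $as_string  = "$numerical";     # read the number as a string
my $num_after  = sv_type($numerical);

my $str_before = sv_type($string);
my $as_number  = $string + 0;      # read the string as a number
my $str_after  = sv_type($string);

print "numerical: $num_before -> $num_after\n";
print "string:    $str_before -> $str_after\n";
```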
From this example you can clearly see that if you want your variables to stay shared
and there is a chance that the same variable will be accessed both as a string and as
a number, you have to access this variable as a number and as a string, as in the
above example, before the fork happens (e.g., in the startup file). This ensures that
the variable will stay shared as long as no one modifies its value. Of course, if some
other variable in the same memory page changes its value, the page will become
unshared anyway.
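Here is one way the advice above might look in practice: a small routine, run from the startup file before the fork, that reads every value once as a number and once as a string. This is a hedged sketch; the %config hash and the prewarm( ) routine are hypothetical, and it only helps if those values are never written to afterward (other data in the same memory pages can still dirty them).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Hypothetical configuration data that the children will only read
my %config = (max_items => 10, title => "Books", ratio => "0.75");

# Touch every value once as a number and once as a string, so that Perl
# caches both representations now (in the parent, before the fork) rather
# than later in each child, where the conversion would dirty the page.
sub prewarm {
    no warnings 'numeric';          # "Books" + 0 would otherwise warn
    for my $value (@_) {            # @_ aliases the caller's values
        my $as_number = $value + 0;
        my $as_string = "$value";
    }
}
prewarm(values %config);

# Report each value's internal SV type after pre-warming
for my $key (sort keys %config) {
    my $type = ref B::svref_2object(\$config{$key});
    print "$key: $type\n";
}
```

Passing values %config works because both values and @_ alias the hash's own scalars, so the conversions are cached in the data the children will inherit.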
Preloading Perl Modules at Server Startup
As we just explained, to get the code-sharing effect, you should preload the code
before the child processes get spawned. The right place to preload modules is at
server startup.
You can use the PerlRequire and PerlModule directives to load commonly used modules
such as CGI.pm and DBI when the server is started. On most systems, server children
will be able to share the code space used by these modules. Just add the following
directives to httpd.conf:
PerlModule CGI
PerlModule DBI
An even better approach is as follows. First, create a separate startup file. In this file
you code in plain Perl, loading modules like this:
use DBI ( );
use Carp ( );
1;
(When a module is loaded, it may export symbols to your package namespace by
default. The empty parentheses () after a module’s name prevent this. Don’t forget
them unless you need some of these symbols in the startup file, which is unlikely.
Skipping the imports will save you a few more kilobytes of memory.)
Next, require( ) this startup file in httpd.conf with the PerlRequire directive,
placing the directive before all the other mod_perl configuration directives:
PerlRequire /path/to/startup.pl
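You can check the effect of the empty parentheses described above with any module that exports by default. The sketch below uses the core POSIX module only because it exports many names (including floor) by default; the two package names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Loading POSIX with its default import list copies many symbols
# (floor, ceil, strftime, ...) into the current package.
package WithDefaultImports;
use POSIX;

# Loading POSIX with an empty list skips import() entirely; the module
# is still compiled and loaded, but nothing is exported.
package WithoutImports;
use POSIX ();

package main;

my $imported     = defined &WithDefaultImports::floor ? 1 : 0;
my $not_imported = defined &WithoutImports::floor     ? 1 : 0;
print "use POSIX;    floor imported: $imported\n";
print "use POSIX (); floor imported: $not_imported\n";
```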
As usual, we provide some numbers to prove the theory. Let’s conduct a memory-
usage test to prove that preloading reduces memory requirements.
To simplify the measurement, we will use only one child process. We will use these
settings in httpd.conf:
MinSpareServers 1
MaxSpareServers 1
StartServers 1
MaxClients 1
MaxRequestsPerChild 100
We are going to use memuse.pl (shown in Example 10-8), an Apache::Registry script
that consists of two parts: the first one loads a bunch of modules (most of which
aren’t going to be used); the second reports the memory size and the shared memory
size used by the single child process that we start, and the difference between the
two, which is the amount of unshared memory.
Example 10-8. memuse.pl
use strict;
# Part 1: load a bunch of modules (most of them aren't used below)
use CGI ( );
use DB_File ( );
use LWP::UserAgent ( );
use Storable ( );
use DBI ( );
use GTop ( );
# Part 2: report this child's total, shared, and unshared memory
my $r = shift;
$r->send_http_header('text/plain');
my $proc_mem = GTop->new->proc_mem($$);
my $size = $proc_mem->size;
my $share = $proc_mem->share;
my $diff = $size - $share;
printf "%10s %10s %10s\n", qw(Size Shared Unshared);
printf "%10d %10d %10d (bytes)\n", $size, $share, $diff;
First we restart the server and execute this script with none of the above modules
preloaded. Here is the result:
      Size     Shared   Unshared
   4706304    2134016    2572288 (bytes)
Now we take the following code:
use strict;
use CGI ( );
use DB_File ( );
use LWP::UserAgent ( );
use Storable ( );
use DBI ( );
use GTop ( );
1;
and copy it into the startup.pl file. The script remains unchanged. We restart the
server (now the modules are preloaded) and execute it again. We get the following
results:
      Size     Shared   Unshared
   4710400    3997696     712704 (bytes)
Let’s put the two results into one table:
Preloading       Size     Shared   Unshared
Yes           4710400    3997696     712704 (bytes)
No            4706304    2134016    2572288 (bytes)
Difference       4096    1863680   -1859584 (bytes)
You can clearly see that when the modules weren’t preloaded, the amount of shared
memory was about 1,864 KB smaller than in the case where the modules were preloaded.
Assuming that you have 256 MB dedicated to the web server, if you didn’t preload the
modules, you could have 103 servers:
268435456 = X * 2572288 + 2134016
X = (268435456 - 2134016) / 2572288 = 103
(Here we have used the formula that we devised earlier in this chapter.)
Now let’s calculate the same thing with the modules preloaded:
268435456 = X * 712704 + 3997696
X = (268435456 - 3997696) / 712704 = 371
You can have about 3.6 times as many servers!
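Both calculations use the same formula, so it may be handy to wrap it in a helper and plug in your own measurements. The max_servers( ) name is hypothetical; the formula is the one used above, with the result rounded down because you cannot run a fraction of a server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Maximum number of children that fit into $total bytes, following
# Total = X * Unshared + Shared  =>  X = (Total - Shared) / Unshared,
# rounded down because you cannot run a fraction of a server.
sub max_servers {
    my ($total, $shared, $unshared) = @_;
    return int(($total - $shared) / $unshared);
}

my $total = 256 * 1024 * 1024;    # 268435456 bytes dedicated to the server

my $without = max_servers($total, 2134016, 2572288);   # not preloaded
my $with    = max_servers($total, 3997696,  712704);   # preloaded
printf "without preloading: %d, with preloading: %d, ratio: %.1f\n",
    $without, $with, $with / $without;
# prints: without preloading: 103, with preloading: 371, ratio: 3.6
```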
Remember, however, that memory pages get dirty, and the amount of shared memory gets
smaller with time. We have presented the ideal case, where the shared memory stays
intact, so in real use the numbers will be a little different. Since you will use
different modules and different code, your processes may well be bigger and the
shared memory smaller, or vice versa. You probably won’t get the same ratio we did,
but the example certainly shows the possibilities.
Preloading Registry Scripts at Server Startup
Suppose you find yourself stuck with self-contained Perl CGI scripts (i.e., all the code
placed in the CGI script itself). You would like to preload modules to benefit from
sharing the code between the children, but you can’t or don’t want to move most of
the stuff into modules. What can you do?
Luckily, you can preload scripts as well. This time the Apache::RegistryLoader module
comes to your aid: Apache::RegistryLoader compiles Apache::Registry scripts at server
startup.
For example, to preload the script /perl/test.pl, which is in fact the file
/home/httpd/perl/test.pl, you would do the following:
use Apache::RegistryLoader ( );
Apache::RegistryLoader->new->handler("/perl/test.pl",
"/home/httpd/perl/test.pl");
You should put this code either in <Perl> sections or in a startup script.
But what if you have a bunch of scripts located under the same directory and you
don’t want to list them one by one? Then the
File::Find module will do most of the
work for you.
The script shown in Example 10-9 walks the directory tree under which all
Apache::Registry scripts are located. For each file with the extension .pl, it calls
the Apache::RegistryLoader::handler( ) method to preload the script in the parent
server. This happens before Apache pre-forks the child processes.
Example 10-9. startup_preload.pl
use File::Find qw(finddepth);
use Apache::RegistryLoader ( );
{
my $scripts_root_dir = "/home/httpd/perl/";