通过phpfpm日志分析请求是CPU/IO密集型

在LNMP架构中,通常都会开启nginx日志记录$upstream_response_time请求响应时间,而忽略到phpfpm日志可以提供更详尽的分析:

; The access log file
; Default: not set
;access.log = log/$pool.access.log

; The access log format.
; The following syntax is allowed
;  %%: the '%' character
;  %C: %CPU used by the request
;      it can accept the following format:
;      - %{user}C for user CPU only
;      - %{system}C for system CPU only
;      - %{total}C  for user + system CPU (default)
;  %d: time taken to serve the request
;      it can accept the following format:
;      - %{seconds}d (default)
;      - %{miliseconds}d
;      - %{mili}d
;      - %{microseconds}d
;      - %{micro}d
;  %e: an environment variable (same as $_ENV or $_SERVER)
;      it must be associated with embraces to specify the name of the env
;      variable. Some exemples:
;      - server specifics like: %{REQUEST_METHOD}e or %{SERVER_PROTOCOL}e
;      - HTTP headers like: %{HTTP_HOST}e or %{HTTP_USER_AGENT}e
;  %f: script filename
;  %l: content-length of the request (for POST request only)
;  %m: request method
;  %M: peak of memory allocated by PHP
;      it can accept the following format:
;      - %{bytes}M (default)
;      - %{kilobytes}M
;      - %{kilo}M
;      - %{megabytes}M
;      - %{mega}M
;  %n: pool name
;  %o: output header
;      it must be associated with embraces to specify the name of the header:
;      - %{Content-Type}o
;      - %{X-Powered-By}o
;      - %{Transfert-Encoding}o
;      - ....
;  %p: PID of the child that serviced the request
;  %P: PID of the parent of the child that serviced the request
;  %q: the query string
;  %Q: the '?' character if query string exists
;  %r: the request URI (without the query string, see %q and %Q)
;  %R: remote IP address
;  %s: status (response code)
;  %t: server time the request was received
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %T: time the log has been written (the request has finished)
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %u: remote user
;
; Default: "%R - %u %t \"%m %r\" %s"
;access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %C%%"

划重点:
%d: 响应时间
%C: CPU处理时间占响应时间的百分比
%M: 峰值内存占用

通常一个请求,我们可以轻易的知道他的总响应时长,但是这个总时长里多少是CPU,多少是iowait是不知道的,有如下测试脚本

// test.php
// cpu
for($i=0; $i<10000000;$i++) {
  pow(2,32);
}
// iowait
$timeout = stream_context_create(['http' => ['timeout'=> 1]]);
file_get_contents('http://www.google.com/', 0, $timeout);

在shell环境,我们可以使用time命令获得进程的cpu时间占用

time php test.php
real    0m1.751s
user    0m0.724s
sys     0m0.024s

iowait时长可以用总时间相减得到:iowait = real – user + sys
上例中iowait = 1.751-0.724-0.024 = 1.003s 和我们设置的timeout几乎一致
实际上,如果dns未缓存,这里的iowait时长还会额外增加dns查询的实际,一定大于1秒

如此一来,通过记录%d和%C就能帮助方便我们轻松区分cpu和iowait接口。

启用日志:

access.log = /var/log/$pool.access.log
access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %C%%"
curl http://localhost/test.php

tail /var/log/www.access.log
127.0.0.1 -  29/Oct/2017:23:39:34 +0800 "GET /test.php" 200 /index.php 2649.988 2048 40.00%

这条日志记录了test.php总时长为2649.988毫秒,其中仅有40%的时长是cpu消耗的,那么这个请求的%iowait就是60%

需要注意的是,%CPU有可能大于100%,这是为什么呢?
查看php源代码:
sapi/fpm/fpm/fpm_request.c
在每次请求开始(fpm_request_reading_headers)和结束(fpm_request_end),使用times获得当前进程cpu时间,tms结构体如下

struct tms {
   clock_t tms_utime;  /* user time */
   clock_t tms_stime;  /* system time */
   clock_t tms_cutime; /* user time of children */
   clock_t tms_cstime; /* system time of children */
};

最后写入日志sapi/fpm/fpm/fpm_log.c,cpu时长是包含子进程时长的
如果想获得主进程的iowait时间,应该是:总时长-用户空间时间-系统空间时间(不包括子进程时间)
可以给sapi/fpm/fpm/fpm_log.c打一个patch,获得原始的cpu时间

#ifdef HAVE_TIMES
case 'C': /* %CPU */
    if (format[0] == '\0' || !strcasecmp(format, "total")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_utime + proc.last_request_cpu.tms_stime + proc.last_request_cpu.tms_cutime + proc.last_request_cpu.tms_cstime;
        }
    } else if (!strcasecmp(format, "user")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_utime + proc.last_request_cpu.tms_cutime;
        }
    } else if (!strcasecmp(format, "system")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_stime + proc.last_request_cpu.tms_cstime;
        }
    } else if (!strcasecmp(format, "_utime")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_utime;
        }
    } else if (!strcasecmp(format, "_cutime")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_cutime;
        }
    } else if (!strcasecmp(format, "_stime")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_stime;
        }
    } else if (!strcasecmp(format, "_cstime")) {
        if (!test) {
            tms_total = proc.last_request_cpu.tms_cstime;
        }
    } else {
        zlog(ZLOG_WARNING, "only 'total', 'user' or 'system' are allowed as a modifier for %%%c ('%s')", *s, format);
        return -1;
    }

    if (!test) {
        if ( format[0] == '_' ) {
            len2 = snprintf(b, FPM_LOG_BUFFER - len, "%f", tms_total / fpm_scoreboard_get_tick() * 1000.);
        } else {
            len2 = snprintf(b, FPM_LOG_BUFFER - len, "%.2f", tms_total / fpm_scoreboard_get_tick() / (proc.cpu_duration.tv_sec + proc.cpu_duration.tv_usec / 1000000.) * 100.);
        }
    }
    format[0] = '\0';
    break;
#endif

发表评论