Introduction

Pewpew is an HTTP load test tool designed for ease of use and high performance. Pewpew requires a minimal amount of resources to generate the maximum amount of load.

To get started:

  1. Learn about Pewpew's design and unique concepts.
  2. Download Pewpew.
  3. Create a config file.
  4. Execute a test.
  5. View the results.

Pewpew design overview

The primary objective of an HTTP load test is to generate traffic against web services. A user of pewpew describes what HTTP endpoints to target, any data flows and the load patterns for the test within a YAML config file.

Data flow

Some endpoints only require a simple request, relying on no other source of data. In most cases, however, endpoints require particular data as part of the request--think ids or tokens. This is where providers come in. Providers act as a FIFO (first in, first out) queue for sending and receiving data. A provider can be fed with data from a file, a static list, a range of numbers and/or from HTTP responses.

The data within a provider is used to build HTTP requests. Here's a diagram of how data flows:

The data flow in pewpew

As an example, here's a visualization of a test which hits a single endpoint on a fictitious pepper service. (Note this diagram does not reflect the structure of the YAML config file, but merely demonstrates the logical flow of data out from and back into a provider):

Example provider visualization

On the left there is a single provider named "name" which has some predefined values. In the "endpoint definition" the top box shows a template used to build a request. Because the template references the "name" provider, a single value is popped off the "name" provider's queue for every request sent. After the response is received, the "after response action" puts the previously fetched value back onto the queue. In this example the very first request would look like:

GET /pepper/cayenne
host: 127.0.0.1
user-agent: pewpew

After the response is received, the value "cayenne" would be pushed back into the "name" provider.

Peak loads and load patterns

In most load tests it is desirable to not only generate considerable traffic but to do so in a defined pattern. Pewpew accomplishes this through the use of peak loads and load patterns. A peak load is defined for an endpoint and a load pattern can be defined test wide or on individual endpoints.

peak load - how much load an endpoint should generate at "100%". Peak loads are defined in terms of "hits per minute" (HPM) or "hits per second" (HPS).

load pattern - the pattern that generated traffic will follow over time. Load patterns are termed in percentages over some duration.

For example, suppose we were creating a load test against the pepper service. After some research we determine that the peak load should be 50HPM (50 hits per minute) and that the load pattern should increase from 0% (no traffic) to 50% (25HPM) over 20 minutes, remain steady for 40 minutes, then go up to 100% (50HPM) over 10 minutes and stay there for 50 minutes.

Here's what the load pattern would look like charted out over time:

Example load pattern
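
As a sketch, the pattern described above could be written in a config file roughly as follows. The endpoint URL is a hypothetical placeholder for the pepper service; the load_pattern and peak_load syntax follows the config file sections described later in this document.

```yaml
load_pattern:
  # ramp from 0% to 50% (25HPM) over 20 minutes
  - linear:
      to: 50%
      over: 20m
  # hold at 50% for 40 minutes
  - linear:
      to: 50%
      over: 40m
  # ramp from 50% to 100% (50HPM) over 10 minutes
  - linear:
      to: 100%
      over: 10m
  # hold at 100% for 50 minutes
  - linear:
      to: 100%
      over: 50m
endpoints:
  - method: GET
    url: http://localhost/pepper   # hypothetical URL for the pepper service
    peak_load: 50hpm               # traffic generated when the pattern is at 100%
```

A "steady" segment is expressed as a linear segment whose to value equals the level reached by the previous segment.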

Config file

A config file is a YAML file which defines everything needed for Pewpew to execute a load test. This includes which HTTP endpoints are part of the test, how load should fluctuate over the duration of a test, how data "flows" in a test and more.

Key concepts

Before creating a config file there are a few key concepts which are helpful to understand.

  1. Everything in an HTTP load test is centered around endpoints, rather than "transactions".
  2. Whenever some piece of data is needed to build an HTTP request, that data flows through a provider. Similarly, when an HTTP response provides data needed for another request that data goes through a provider.
  3. The amount of load generated is determined on a per-endpoint basis, expressed in "hits per minute" or "hits per second" rather than a number of "users".
  4. Because a config file is used rather than an API with a scripting language, Pewpew includes a minimal, built-in "language" which allows the execution of very simple expressions.

Framing a load test with these concepts enables Pewpew to accomplish one of its goals of allowing a tester to create and maintain load tests with ease.

Sections of a config file

A config file has six main sections, though not all are required:

  • config - Allows customization of various test options.
  • load_pattern - Specifies how load fluctuates during a test.
  • vars - Declares static variables which can be used in expressions.
  • providers - Declares providers which are used to manage the flow of data needed for a test.
  • loggers - Declares loggers which, as their name suggests, provide a means of logging data.
  • endpoints - Specifies the HTTP endpoints which are part of a test and various parameters to build each request.

Example

Here's a simple example config file:

load_pattern:
  - linear:
      to: 100%
      over: 5m
  - linear:
      to: 100%
      over: 2m
endpoints:
  - method: GET
    url: http://localhost/foo
    peak_load: 42hpm
    headers:
      Accept: text/plain
  - method: GET
    url: http://localhost/bar
    headers:
      Accept-Language: en-us
      Accept: application/json
    peak_load: 15hps

Har to Yaml Converter

If you are attempting to load test a specific web page or the resources on a web page, you can use the Har to Yaml Converter. First create a HAR file from the page load, then use the converter to generate a YAML config file.

config section

config:
  client:
    [request_timeout: duration]
    [headers: headers]
    [keepalive: duration]
  general:
    [auto_buffer_start_size: unsigned integer]
    [bucket_size: duration]
    [log_provider_stats: boolean]
    [watch_transition_time: duration]

The config section provides a means of customizing different parameters for the test. Parameters are divided into two subsections: client, which pertains to customizations for the HTTP client, and general, which holds other miscellaneous settings for the test.

client

  • request_timeout Optional - A duration signifying how long a request will wait for a response before it times out. Defaults to 60 seconds.
  • headers Optional - Headers which will be sent in every request. A header specified in an endpoint will override a header specified here with the same key.
  • keepalive Optional - The keepalive duration that will be used on TCP socket connections. This is different from the Keep-Alive HTTP header. Defaults to 90 seconds.

general

  • auto_buffer_start_size Optional - The starting size for provider buffers which are auto sized. Defaults to 5.
  • bucket_size Optional - A duration specifying how big each bucket should be for endpoints' aggregated stats. This also affects how often summary stats will be printed to the console. Defaults to 60 seconds.
  • log_provider_stats Optional - A boolean that enables or disables logging stats about the providers to the console. Stats include the number of items in the provider, the limit of the provider, how many tasks are waiting to send into the provider and how many endpoints are waiting to receive from the provider. Logs data at the bucket_size interval. Set to false to turn off provider stats logging. Defaults to true.
  • watch_transition_time Optional - A duration specifying how long the transition should be when going from an old load_pattern to a new load_pattern. This option only has an effect when pewpew is running a load test with the --watch command-line flag enabled. If this is not specified there will be no transition when load_patterns change.
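
To make the options above concrete, here is a sketch of a config section that sets every parameter. The values are arbitrary choices for illustration, and durations are written in the same minute form (for example 2m) used by the load_pattern examples in this document.

```yaml
config:
  client:
    request_timeout: 2m          # default is 60 seconds
    headers:
      Accept: application/json   # sent with every request unless an endpoint overrides it
    keepalive: 3m                # TCP keepalive, not the Keep-Alive HTTP header; default is 90 seconds
  general:
    auto_buffer_start_size: 10   # default is 5
    bucket_size: 1m              # also controls how often summary stats are printed
    log_provider_stats: false    # turn off provider stats logging
    watch_transition_time: 1m    # only has an effect with the --watch flag
```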

load_pattern section

load_pattern:
  - load_pattern_type:
      [parameters]

* If a root level load_pattern is not specified then each endpoint must specify its own load_pattern.

The load_pattern section defines the "shape" that the generated traffic will take over the course of the test. Individual endpoints can choose to specify their own load_pattern (see the endpoints section).

load_pattern is an array of load_pattern_types specifying how generated traffic for a segment of the test will scale up, down or remain steady. Currently the only load_pattern_type is linear.

Example:

load_pattern:
  - linear:
      to: 100%
      over: 5m
  - linear:
      to: 100%
      over: 2m

linear

The linear load_pattern_type allows generated traffic to increase or decrease linearly. There are three parameters which can be specified for each linear segment:

  • from Optional - A template indicating the starting point to scale from, specified as a percentage. Defaults to 0% if the current segment is the first entry in load_pattern, or the to value of the previous segment otherwise. Only variables defined in the vars section can be interpolated.

    A valid percentage is any unsigned number, integer or decimal, immediately followed by the percent symbol (%). Percentages can exceed 100% but cannot be negative. For example 15.25% or 150%.

  • to - A template indicating the end point to scale to, specified as a percentage. Only variables defined in the vars section can be interpolated.

  • over - The duration for how long the current segment should last.
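
To illustrate how the three parameters combine, here is a sketch of a pattern that starts partway up, ramps to peak, overshoots and then ramps down. The percentages and durations are arbitrary values chosen for illustration.

```yaml
load_pattern:
  # explicit "from": start at 10% instead of the default 0%
  - linear:
      from: 10%
      to: 100%
      over: 10m
  # "from" omitted: starts at the previous segment's "to" (100%)
  - linear:
      to: 150%
      over: 5m
  # ramp down from 150% to 0%
  - linear:
      to: 0%
      over: 5m
```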

vars section

vars:
  variable_name: definition

Variables are used where a single pre-defined value is needed in a test. The variable definition can be any valid YAML type, where any strings will be interpreted as a template. In variable templates only environment variables can be interpolated in expressions.

Examples:

vars:
  foo: bar

creates a single variable named foo where the value is the string "bar".

More complex values are automatically interpreted as JSON so the following:

vars:
  bar:
    a: 1
    b: 2
    c: 3

creates a variable named bar where the value is equivalent to the JSON {"a": 1, "b": 2, "c": 3}.

As noted above, environment variables can be interpolated in the string templates. So, the following:

vars:
  password: ${PASSWORD}

would create a variable named password where the value comes from the environment variable PASSWORD.
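
As a sketch of how a variable might then be consumed, the fragment below interpolates a variable into an endpoint's url. The variable name port, the environment variable PORT and the URL are hypothetical, and the ${...} reference syntax outside of vars is assumed to mirror the template syntax shown above.

```yaml
vars:
  port: ${PORT}                         # value comes from the PORT environment variable
endpoints:
  - method: GET
    url: http://localhost:${port}/foo   # hypothetical URL built from the variable
    peak_load: 42hpm
```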

providers section

providers:
  provider_name:
    provider_type:
      [parameters]

Providers are the means of providing data to an endpoint, including using data from the response of one endpoint in the request of another. The way providers handle data can be thought of as a FIFO queue--when an endpoint uses data from a provider it "pops" a value from the beginning of the queue and when an endpoint provides data to a provider it is "pushed" to the end of the queue. Every provider has an internal buffer which has a soft limit on how many items can be stored.

A provider_name is any string except for "request", "response", "stats" and "for_each", which are reserved.

Example:

providers:
  session:
    response:
      auto_return: force
  username:
    file:
      path: "usernames.csv"
      repeat: true

There are four provider_types: file, response, list and range.

file

The file provider_type reads data from a file. By default every line in the file is read as a value; other formats can be selected with the format parameter. A file provider has the following parameters:

  • path - A template value indicating the path to the file on the file system. Unlike templates used elsewhere, only variables defined in the vars section can be interpolated. When a relative path is specified it is interpreted as relative to the config file. Absolute paths are supported though discouraged as they prevent the config file from being platform agnostic.

  • repeat Optional - A boolean value which, when true, indicates that when the file provider gets to the end of the file it should start back at the beginning. Defaults to false.

  • unique Optional - A boolean value which, when true, makes the provider a "unique" provider--meaning each item within the provider will be a unique JSON value without duplicates. Defaults to false.

  • auto_return Optional - This parameter specifies that when this provider is used by a request, after a response is received the value is automatically returned to the provider. Valid options for this parameter are block, force, and if_not_full. See the send parameter under the endpoints.provides subsection for details on the effect of these options.

  • buffer Optional - Specifies the soft limit for a provider's buffer. This can be indicated with an integer greater than zero or the value auto. The value auto indicates that the soft limit can increase as needed. This happens after a provider is full then later becomes empty. Defaults to auto.

  • format Optional - Specifies the format for the file. The format can be one of line (the default), json, or csv.

    The line format will read the file one line at a time with each line ending in a newline (\n) or a carriage return and a newline (\r\n). Pewpew attempts to parse every line as JSON; a line which is not valid JSON is treated as a string. Note that a JSON object which spans multiple lines in the file, for example, will not parse into a single object.

    The json format will read the file as a stream of JSON values. Every JSON value must be self-delineating (an object, array or string), or must be separated by whitespace or a self-delineating value. For example, the following:

    {"a":1}{"foo":"bar"}47[1,2,3]"some text"true 56
    

    would parse into separate JSON values of {"a": 1}, {"foo": "bar"}, 47, [1, 2, 3], "some text", true, and 56.

    The csv format will read the file as a CSV file. Pewpew attempts to parse every non-header column as JSON; a column which is not valid JSON is treated as a string. The csv parameter allows customization of how the file is parsed.

  • csv Optional - When parsing a file using the csv format, this parameter provides extra customization on how the file should be parsed. This parameter is in the format of an object with key/value pairs. If the format is not csv this property will be ignored. The following sub-parameters are available:

    Sub-parameter Description

    comment Optional

    Specifies a single-byte character which will mark a CSV record as a comment (ex. #). When not specified, no character is treated as a comment.

    delimiter Optional

    Specifies a single-byte character used to separate columns in a record. Defaults to comma (,).

    double_quote Optional

    A boolean that when enabled makes it so two quote characters can be used to escape quotes within a column. Defaults to true.

    escape Optional

    Specifies a single-byte character which will be used to escape nested quote characters (ex. \). When not specified, escapes are disabled.

    headers Optional

    Can be either a boolean value or a string. When a boolean, it indicates whether the first row in the file should be interpreted as column headers. When a string, the specified string is interpreted as a CSV record which is used for the column headers.

    When headers are specified, each record served from the file will use the headers as keys for each column. When no headers are specified (the default), then each record will be returned as an array of values.

    For example, with the following CSV file:

    id,name
    0,Fred
    1,Wilma
    2,Pebbles
    

    If headers was true then the following values would be provided (shown in JSON syntax): {"id": 0, "name": "Fred"}, {"id": 1, "name": "Wilma"}, and {"id": 2, "name": "Pebbles"}.

    If headers was false then the first row is treated as a record and the following values would be provided: ["id", "name"], [0, "Fred"], [1, "Wilma"], and [2, "Pebbles"].

    If headers was foo,bar then the following values would be provided: {"foo": "id", "bar": "name"}, {"foo": 0, "bar": "Fred"}, {"foo": 1, "bar": "Wilma"}, and {"foo": 2, "bar": "Pebbles"}.

    terminator Optional

    Specifies a single-byte character used to terminate each record in the CSV file. Defaults to a special value where \r, \n, and \r\n are all accepted as terminators.

    When specified, Pewpew becomes self-aware, unfolding a series of events which will ultimately lead to the end of the human race.

    quote Optional

    Specifies a single-byte character that will be used to quote CSV columns. Defaults to the double-quote character (").

  • random Optional - A boolean indicating that each record in the file should be returned in random order. Defaults to false.

    When enabled there is no sense of "fairness" in the randomization. Any record in the file could be used more than once before other records are used.

Example:

providers:
  username:
    file:
      path: "usernames.csv"
      repeat: true
      random: true
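
A file provider reading a CSV file might be configured as in the sketch below. The provider name user and the file name users.csv are hypothetical; the parameters are the ones described above, with arbitrary values chosen for illustration.

```yaml
providers:
  user:
    file:
      path: "users.csv"   # hypothetical file, resolved relative to the config file
      format: csv
      csv:
        headers: true     # first row supplies the keys for each record
        delimiter: ";"    # columns are separated by semicolons instead of commas
        comment: "#"      # records marked with # are treated as comments
      repeat: true        # start over at the beginning when the end of the file is reached
      buffer: 10          # soft limit of 10 items instead of auto
```

With headers enabled, each record is served as an object keyed by column name rather than as an array of values.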

response

Unlike other provider_types, response does not automatically receive data from a source. Instead a response provider is available to be a "sink" for data originating from an HTTP response. The response provider has the following parameters:

  • auto_return Optional - This parameter specifies that when this provider is used and an individual endpoint call concludes, the value it got from this provider should be sent back to the provider. Valid options for this parameter are block, force, and if_not_full. See the send parameter under the endpoints.provides subsection for details on the effect of these options.
  • buffer Optional - Specifies the soft limit for a provider's buffer. This can be indicated with an integer greater than zero or the value auto. The value auto indicates that if the provider's buffer becomes empty it will automatically increase the buffer size to help prevent the provider from becoming empty again in the future. Defaults to auto.
  • unique Optional - A boolean value which, when true, makes the provider a "unique" provider--meaning each item within the provider will be a unique JSON value without duplicates. Defaults to false.

Example:

providers:
  session:
    response:
      buffer: 1000
      auto_return: if_not_full

list

The list provider_type creates a means of specifying an array of static values to be used as a provider. A list provider can be specified in two forms, either implicitly or explicitly. The explicit form has the following parameters:

  • random Optional - A boolean indicating that entries in the values array should be provided in random order. When combined with repeat there is no sense of "fairness" in the randomization. Defaults to false.
  • repeat Optional - A boolean indicating that the array should repeat infinitely. Defaults to true.
  • values - An array of JSON values.
  • unique Optional - A boolean value which, when true, makes the provider a "unique" provider--meaning each item within the provider will be a unique JSON value without duplicates. Defaults to false.

For example, the following:

providers:
  foo:
    list:
      - 123
      - 456
      - 789

is an example of an implicit list provider. It creates a list provider named foo where the first value provided will be 123, the second 456 and the third 789; subsequent values start over at the beginning.

Similarly, the following:

providers:
  foo:
    list:
      values:
        - 123
        - 456
        - 789
      random: true

is an example of an explicit list provider. It creates a list provider named foo where the value provided will be randomized between the values listed.

range

The range provider_type provides an incrementing sequence of numbers in a given range. A range provider takes the following optional parameters:

  • start Optional - A whole number in the range of [-9223372036854775808, 9223372036854775807]. This indicates what the starting number should be for the range. Defaults to 0.
  • end Optional - A whole number in the range of [-9223372036854775808, 9223372036854775807]. This indicates what the end number should be for the range. This number is included in the range. Defaults to 9223372036854775807.
  • step Optional - A whole number in the range of [1, 65535]. This indicates how big each "step" in the range will be. Defaults to 1.
  • repeat Optional - A boolean which causes the range to repeat infinitely. Defaults to false.
  • unique Optional - A boolean value which, when true, makes the provider a "unique" provider--meaning each item within the provider will be a unique JSON value without duplicates. Defaults to false.

Examples:

providers:
  foo:
    range: {}

This uses the default settings, so foo will provide the values 0, 1, 2, etc. until it yields the end number (9223372036854775807).

providers:
  foo:
    range:
      start: -50
      end: 100
      step: 2

In this case foo will provide the values -50, -48, -46, etc. until it yields 100.
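
As a further sketch, a small repeating range can be declared by combining end with repeat. The provider name page and the bounds are hypothetical values chosen for illustration.

```yaml
providers:
  page:
    range:
      start: 1
      end: 5         # the end number is included in the range
      repeat: true   # start over at 1 after 5 has been provided
```

Based on the parameter descriptions above, page would be expected to provide 1, 2, 3, 4, 5 and then begin again at 1.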

loggers section

loggers:
  logger_name:
    [select: select]
    [for_each: for_each]
    [where: expression]
    to: template | stderr | stdout
    [pretty: boolean]
    [limit: integer]
    [kill: boolean]

Loggers provide a means of logging data to a file, stderr or stdout. Any string can be used for logger_name.

There are two types of loggers: plain loggers which have data logged to them by explicitly referencing them within an endpoints.logs subsection, and global loggers which are evaluated for every HTTP response.

In addition to the special variables "request", "response" and "stats", a logger also has access to a variable "error" which represents an error which happens during the test. It can be helpful to log such errors along with the request or response (if they are available) when diagnosing problems.

Loggers support the following parameters:

  • select Optional - When specified, the logger becomes a global logger. See the endpoints.provides subsection for details on how to define a select.
  • for_each Optional - Used in conjunction with select on global loggers. See the endpoints.provides subsection for details on how to define a for_each.
  • where Optional - Used in conjunction with select on global loggers. See the endpoints.provides subsection for details on how to define a where expression.
  • to - A template specifying where this logger will send its data. Unlike templates which can be used elsewhere, only variables defined in the vars section can be interpolated. Values of "stderr" and "stdout" will log data to the respective process streams and any other string will log to a file with that name. When a file is specified, the file will be created if it does not exist or will be truncated if it already exists. When a relative path is specified it is interpreted as relative to the config file. Absolute paths are supported though discouraged as they prevent the config file from being platform agnostic.
  • pretty Optional - A boolean that indicates the value logged will have added whitespace for readability. Defaults to false.
  • limit Optional - An unsigned integer which indicates the logger will only log the first n values sent to it.
  • kill Optional - A boolean that indicates the test will end when the limit is reached, or, if there is no limit, on the first message logged.

Example:

loggers:
  httpErrors:
    select:
      request:
        - request["start-line"]
        - request.headers
        - request.body
      response:
        - response["start-line"]
        - response.headers
        - response.body
    where: response.status >= 400
    limit: 5
    to: http_err.log
    pretty: true

Creates a global logger named "httpErrors" which will log to the file "http_err.log" the request and response of the first five requests which have an HTTP status of 400 or greater.
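
The limit and kill parameters can be combined to stop a test early. The sketch below uses a hypothetical logger name and threshold, and relies only on the parameters described above.

```yaml
loggers:
  fatalErrors:
    select:
      request: request["start-line"]
      status: response.status
    where: response.status >= 500   # only server errors are logged
    to: stderr
    limit: 1                        # log only the first matching response
    kill: true                      # end the test once the limit is reached
```

Because select is specified, this is a global logger, so it is evaluated for every HTTP response in the test.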

endpoints section

endpoints:
  - [declare: declare_subsection]
    [headers: headers]
    [body: body]
    [load_pattern: load_pattern_subsection]
    [method: method]
    [peak_load: peak_load]
    [tags: tags]
    url: template
    [provides: provides_subsection]
    [on_demand: boolean]
    [logs: logs_subsection]
    [max_parallel_requests: unsigned integer]
    [no_auto_returns: boolean]
    [request_timeout: duration]

The endpoints section declares what HTTP endpoints will be called during a test.

  • declare Optional - See the declare subsection

  • headers Optional - See headers

  • body Optional - See the body subsection

  • load_pattern Optional - See the load_pattern section

  • method Optional - A string representation for a valid HTTP method verb. Defaults to GET

  • peak_load Optional* - A template representing what the "peak load" for this endpoint should be. The term "peak load" represents how much traffic is generated for this endpoint when the load_pattern reaches 100%. A load_pattern can go higher than 100%, so a load_pattern of 200%, for example, would mean it would go double the defined peak_load. Only variables defined in the vars section can be interpolated.

    * While peak_load is marked as optional that is only true if the current endpoint has a provides_subsection, and in that case this endpoint is called only as frequently as needed to keep the buffers of the providers it feeds full.

    A valid peak_load is a number--integer or decimal--followed by an optional space and the string "hpm" (meaning "hits per minute") or "hps" (meaning "hits per second").

    Examples:

    50hpm - 50 hits per minute

    300 hps - 300 hits per second

  • tags Optional - Key/value string/template pairs.

    Tags are a series of key/value pairs used to distinguish each endpoint. Tags can be used to include certain endpoints in a try run, and also make it possible for a single endpoint to have its results statistics aggregated in multiple groups. Because tag values are templates, only tags which can be resolved statically at the beginning of a test can be used with the include flag of a try run. A reference to a provider can cause a single endpoint to have multiple groups of tags. Each one of these groups will have its own statistics in the results. For example, if an endpoint had the following tags:

      tags:
        name: Subscribe
        status: ${response.status}
    

    A new group of aggregated stats will be created for every status code returned by the endpoint.

    All endpoints have the following implicitly defined tags:

    Name    Description
    method  The HTTP method for the endpoint.
    url     The endpoint's url with any dynamic pieces being replaced with an asterisk.
    _id     The index of this endpoint in the list of endpoints, starting with 0.

    Of the implicitly defined tags, only url can be overwritten, which is helpful in cases such as when an entire url is dynamically generated and it would otherwise show up as *.

  • url - A template specifying the fully qualified url to the endpoint which will be requested.

  • provides Optional - See the provides subsection

  • on_demand Optional - A boolean which indicates that this endpoint should only be called when another endpoint first needs data that this endpoint provides. If the endpoint has no provides this has no effect.

  • logs Optional - See the logs subsection

  • max_parallel_requests Optional - Limits how many requests can be "open" at any point for the endpoint. WARNING: this can cause coordinated omission, invalidating the test statistics.

  • no_auto_returns Optional - A boolean which indicates that any auto_return providers referenced within this endpoint will have auto_return disabled--meaning values pulled from those providers will not be automatically pushed back to the provider after a response is received. Defaults to false.

  • request_timeout Optional - A duration signifying how long a request will wait for a response before it times out. When not specified, the value from the client config will be used.
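
Putting several of these parameters together, an endpoint definition might look like the sketch below. The URL, body and header values are hypothetical; the tags are the ones used in the example above.

```yaml
endpoints:
  - method: POST
    url: http://localhost/subscribe    # hypothetical URL
    headers:
      Content-Type: application/json
    body: '{"plan":"basic"}'           # hypothetical request body
    peak_load: 300 hps                 # 300 hits per second when the load_pattern is at 100%
    tags:
      name: Subscribe
      status: ${response.status}       # one group of stats per status code
    request_timeout: 1m                # overrides the value from the client config
```

This endpoint declares no load_pattern of its own, so it relies on a root level load_pattern being defined.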

Using providers to build a request

Providers can be referenced anywhere templates can be used and also in the declare subsection.

body subsection

body: template
body:
  file: template
body:
  multipart: 
    field_name:
      [headers: headers]
      body: template
    field_name:
      [headers: headers]
      body:
        file: template

A request body can be in one of three formats: a template to send a string as the body, a file which will send the contents of a file as the body, or a multipart body.

To send the contents of a file the body parameter should be an object with a single key of file and the value being a template. Relative paths resolve relative to the config file used to execute pewpew.

To send a multipart body, the body parameter should be an object with a single key of multipart and the value being an object of key/value pairs, where each key/value pair represents a piece of the multipart body. The keys represent the field_names used in an HTML form and the values are objects with the following properties:

  • headers Optional - Headers that will be included with this piece of the multipart body. For example, it is not uncommon to include a content-type header with a piece of a multipart body which includes a file.
  • body - Either a template which will send a string value or an object with a single key of file and the value being a template--which will send the contents of a file.

When a multipart body is used for an endpoint each request will have the content-type header added with the value multipart/form-data and the necessary boundary. If there is already a content-type header set for the request it will be overwritten unless it starts with multipart/--then the necessary boundary will be appended. If a multipart/... content-type is manually set with the request, make sure to not include a boundary parameter.

For any request which has a content-type of multipart/form-data, a Content-Disposition header will be added to each piece in the multipart body with a value of form-data; name="field_name" (where field_name is substituted with the piece's field_name). If a Content-Disposition header is explicitly specified for a piece it will not be overwritten.

File example:

body:
  file: a_file.txt

Multipart example:

body:
  multipart:
    foo:
      headers:
        Content-Type: image/jpeg
      body:
        file: foo.jpg
    bar:
      body: some text

declare subsection

declare:
  name: expression

A declare_subsection provides the ability to select multiple values from a single provider. Without using a declare_subsection, multiple references to a provider will only select a single value. For example, in:

endpoints:
  - method: PUT
    url: https://localhost/ship/${shipId}/speed
    body: '{"shipId":"${shipId}","kesselRunTime":75}'

both references to the provider shipId will resolve to the same value, which in many cases is desired.

The declare_subsection is in the format of key/value pairs where the value is an expression. Every key can function as a provider and can be interpolated just as a provider would be.

Example 1

endpoints:
  - declare:
      shipIds: collect(shipId, 3, 5)
    method: DELETE
    url: https://localhost/ships
    body: '{"shipIds":${shipIds}}'

Calls the endpoint DELETE /ships where the body is interpolated with an array of ship ids. shipIds will have a length between three and five.

Example 2

endpoints:
  - declare:
      destroyedShipId: shipId
    method: PUT
    url: https://localhost/ship/${shipId}/destroys/${destroyedShipId}

Calls PUT on an endpoint where shipId and destroyedShipId are interpolated to different values.

provides subsection

provides:
  provider_name:
    select: select
    [for_each: for_each]
    [where: expression]
    [send: block | force | if_not_full]

The provides_subsection is how data can be sent to a provider from an HTTP response. provider_name is a reference to a provider which must be declared in the root providers section. For every HTTP response that is received, zero or more values can be sent to the provider based upon the conditions specified.

Sending data to a provider is done with a SQL-like syntax. The select, for_each and where sections use expressions to reference providers in addition to the special variables "request", "response" and "stats". "request" provides a means of accessing data that was sent with the request, "response" provides a means of accessing data returned with the response and "stats" gives access to measurements about the request (currently only rtt meaning round-trip time).

The request object has the properties start-line, method, url, headers, headers_all and body which provide access to the respective sections in the HTTP request. Similarly, the response object has the properties start-line, headers, headers_all and body in addition to status which indicates the HTTP response status code. See this MDN article on HTTP messages for more details on the structure of HTTP requests and responses.

start-line is a string and headers is represented as a JSON object with key/value string pairs. In the event where a request or response has multiple headers with the same name, the headers_all property can be used which is a JSON object where the header name is the key and the value an array of header values. Currently, body in the request is always a string and body in the response is parsed as a JSON value, when possible, otherwise it is a string. status is a number. method is a string and url is an object with the same properties as the web URL object (see this MDN article).

  • select - Determines the shape of the data sent to the provider. select is interpreted as a JSON object where any string value is evaluated as an expression.

  • for_each Optional - Evaluates select for each element in an array or arrays. This is specified as an array of expressions. Expressions can evaluate to any JSON data type, but those which evaluate to an array will have each of their elements iterated over and select is evaluated for each. When multiple expressions evaluate to an array then the cartesian product of the arrays is produced.

    The select and where parameters can access the elements provided by for_each through the value for_each just like accessing a value from a provider. Because a for_each can iterate over multiple arrays, each element can be accessed by indexing into the array. For example for_each[1] would access the element from the second array (indexes are referenced with zero based counting so 0 represents the element in the first array).

  • where Optional - Allows conditionally sending data to a provider based on a predicate. This is an expression which evaluates to a boolean value, indicating whether select should be evaluated for the current data set.

  • send Optional - Specify the behavior that should be used when sending data to a provider. Valid options for this parameter are block, force, and if_not_full. Defaults to if_not_full if the endpoint has a peak_load otherwise block.

    block indicates that if the provider's buffer is full, further endpoint calls will be blocked until there's room in the provider's buffer for the value. If an endpoint has multiple provides which are block, then the blocking will only wait for at least one of the providers' buffers to have room.

    force indicates that the value will be sent to the provider regardless of whether its buffer is "full". This can make a provider's buffer exceed its soft limit.

    if_not_full indicates that the value will be sent to the provider only if the provider is not full.
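How for_each, where and select fit together can be pictured with a short Python sketch. It is only a rough model of the behavior described above, not Pewpew's implementation: the function name provide and the callables passed for select and where are hypothetical stand-ins for expressions, and treating a non-array for_each value as a single-element array is an assumption.

```python
from itertools import product

def provide(for_each, select, where=lambda row: True):
    """Model one provides entry.

    for_each: list of already-evaluated expressions; arrays are iterated,
              any other value is treated as a single-element array (assumption).
    select / where: hypothetical callables standing in for expressions;
              each receives the current for_each row.
    """
    columns = [v if isinstance(v, list) else [v] for v in for_each]
    sent = []
    # product() yields the cartesian product of all the arrays.
    for row in product(*columns):
        if where(row):
            sent.append(select(row))
    return sent

characters = [
    {"id": "1000", "friends": ["1002", "1003"]},
    {"id": "2001", "friends": ["1000"]},
]
values = provide(
    for_each=[characters, ["a", "b"]],
    select=lambda fe: {"id": fe[0]["id"], "tag": fe[1]},
    where=lambda fe: len(fe[0]["friends"]) > 1,
)
print(values)
```

Here fe[0] and fe[1] play the role of for_each[0] and for_each[1]. Only the first character passes the where predicate, so select is evaluated once for each element of the second array.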

Example 1

With an HTTP response with the following body

{ "session": "abc123" }

and a provides section defined as:

provides:
  session:
    select: response.body.session
    where: response.status < 400

the session provider would be given the value "abc123" if the status code was less than 400 otherwise nothing would be sent to the session provider.

Example 2

With an HTTP response with the following body:

{
  "characters": [
    {
      "type": "Human",
      "id": "1000",
      "name": "Luke Skywalker",
      "friends": ["1002", "1003", "2000", "2001"],
      "appearsIn": [4, 5, 6],
      "homePlanet": "Tatooine"
    },
    {
      "type": "Human",
      "id": "1001",
      "name": "Darth Vader",
      "friends": ["1004"],
      "appearsIn": [4, 5, 6],
      "homePlanet": "Tatooine"
    },
    {
      "type": "Droid",
      "id": "2001",
      "name": "R2-D2",
      "friends": ["1000", "1002", "1003"],
      "appearsIn": [4, 5, 6],
      "primaryFunction": "Astromech"
    }
  ]
}

and our provides section is defined as:

provides:
  names:
    select:
      name: for_each[0].name
    for_each:
      - response.body.characters

The names provider would be sent the following values: { "name": "Luke Skywalker" }, { "name": "Darth Vader" }, { "name": "R2-D2" }.

Example 3

It is also possible to access the length of an array by accessing the length property.

Using the same response data from example 2, with a provides section defined as:

provides:
  friendsCount:
    select:
      id: for_each[0].id
      count: for_each[0].friends.length
    for_each:
      - response.body.characters

The friendsCount provider would be sent the following values: { "id": "1000", "count": 4 }, { "id": "1001", "count": 1 }, { "id": "2001", "count": 3 }.

logs subsection

logs:
  logger_name:
    select: select
    [for_each: for_each]
    [where: expression]

The logs_subsection provides a means of sending data to a logger based on the result of an HTTP response. logger_name is a reference to a logger which must be declared in the root loggers section. It is structured in the same way as the provides_subsection except there is no explicit send parameter. When data is sent to a logger it has the same behavior as send: block, which means logging data can potentially block further requests from happening if a logger were to get "backed up". This is unlikely to be a problem unless a large amount of data was consistently logged. It is also possible to log to the same logger multiple times in a single endpoint by repeating the logger_name with a new select.

  • select - Determines the shape of the data sent into the logger.
  • for_each Optional - Evaluates select for each element in an array or arrays.
  • where Optional - Allows conditionally sending data into a logger based on a predicate.

Common types

Duration

A duration is an integer followed by an optional space and a string value indicating the time unit. Days can be specified with "d", "day" or "days", hours with "h", "hr", "hrs", "hour" or "hours", minutes with "m", "min", "mins", "minute" or "minutes", and seconds with "s", "sec", "secs", "second" or "seconds". Durations are templates, but can only be interpolated with variables defined in the vars section.

Examples:

1h = 1 hour

30 minutes = 30 minutes

Multiple duration pieces can be chained together to form more complex durations.

Examples:

1h45m30s = 1 hour, 45 minutes and 30 seconds

4 hrs 15 mins = 4 hours and 15 minutes

As seen above an optional space can be used to delimit the individual duration pieces.
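As a rough illustration of the format, the following Python sketch converts a duration string into seconds. It is a simplified model, not Pewpew's parser: the name parse_duration is hypothetical, template interpolation is not handled, and accepting only lowercase units is an assumption.

```python
import re

# Seconds per unit, using the unit spellings listed above.
UNIT_SECONDS = {}
for seconds, names in [
    (86400, ["d", "day", "days"]),
    (3600, ["h", "hr", "hrs", "hour", "hours"]),
    (60, ["m", "min", "mins", "minute", "minutes"]),
    (1, ["s", "sec", "secs", "second", "seconds"]),
]:
    for name in names:
        UNIT_SECONDS[name] = seconds

# One duration piece: an integer, an optional space, then a unit.
PIECE = re.compile(r"\s*(\d+)\s?([a-z]+)")

def parse_duration(text):
    """Return the total number of seconds in a duration string."""
    total, pos = 0, 0
    while pos < len(text):
        m = PIECE.match(text, pos)
        if m is None or m.group(2) not in UNIT_SECONDS:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(m.group(1)) * UNIT_SECONDS[m.group(2)]
        pos = m.end()
    return total

print(parse_duration("1h45m30s"))       # 6330
print(parse_duration("4 hrs 15 mins"))  # 15300
```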

Headers

Headers are key/value pairs, where the key is a string and the value is a template, that specify the headers to send with a request. Note that the host and content-length headers are added automatically to requests and any headers with the same name will be overwritten.

In an endpoints headers sub-section, a YAML null can be specified as the value which will unset any global header with that name. Because HTTP specs allow a header to be specified multiple times in a request, to override a global header it is necessary to specify the header twice in the endpoints headers sub-section, once with a null value and once with the new value. Not including the null value will mean the request will have the header specified twice.

For example:

endpoints:
  - url: https://localhost/foo/bar
    headers:
      Authorization: Bearer ${sessionId}

specifies that an "Authorization" header will be sent with the request with a value of "Bearer " followed by a value coming from a provider named "sessionId".

Templates

Templates are special string values which can be interpolated with expressions. Interpolation is done by enclosing the expression in ${ }. For example: ${foo}-bar creates a string where a value from a provider named "foo" is interpolated before the string value -bar. ${join(baz, ".")} uses the join helper to create a string value derived from a value coming from the provider "baz".

Expressions

Expressions are like a mini-scripting language embedded within Pewpew. Expressions only deal with very limited data types--the JSON types--strings, numbers, booleans, null values, arrays and objects.

Expressions are most commonly used to access data from a provider (via templates) or to transform data from an HTTP response to be sent into a provider. Expressions support traversing object and array structures, evaluating boolean logic and applying basic mathematical operators. Helper functions extend the functionality of expressions further.

Operators

Operator  Description
==        Equal. Checks that two values are equal to each other and produces a boolean.
!=        Not equal. Checks that two values are not equal to each other and produces a boolean.
>         Greater than. Checks that the left value is greater than the right and produces a boolean.
<         Less than. Checks that the left value is less than the right and produces a boolean.
>=        Greater than or equal to. Checks that the left value is greater than or equal to the right and produces a boolean.
<=        Less than or equal to. Checks that the left value is less than or equal to the right and produces a boolean.
&&        And. Checks that two values are true and produces a boolean.
||        Or. Checks that one of two values is true and produces a boolean.
+         Add. Adds two numbers together producing a number.
-         Subtract. Subtracts two numbers producing a number.
*         Multiply. Multiplies two numbers producing a number.
/         Divide. Divides two numbers producing a number.
%         Remainder. Provides the remainder after dividing two numbers.

Helper functions

Function Description

collect(item, n)

or

collect(item, min, max)

When used in an endpoints.declare subsection collect provides the special ability to "collect" multiple values from a provider into an array. collect can be called with two or three arguments. The two argument form creates an array of size n. The three argument form creates an array with a randomly selected size between min (inclusive) and max (exclusive).

When used outside a declare subsection, collect will simply return the item.

See the endpoints.declare subsection for an example.
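The two forms of collect can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than Pewpew's implementation: a deque stands in for the provider's FIFO queue, and the names collect and ship_ids are used for illustration only. A real provider would wait for values to become available instead of failing when the queue is empty.

```python
import random
from collections import deque

def collect(provider, n_or_min, max_=None, rng=random):
    """Pop several values off a FIFO provider and return them as a list.

    provider: a deque standing in for a provider's queue (hypothetical).
    Two-argument form: exactly n_or_min values.
    Three-argument form: a random count in [n_or_min, max_), matching the
    inclusive/exclusive bounds described above.
    """
    count = n_or_min if max_ is None else rng.randrange(n_or_min, max_)
    return [provider.popleft() for _ in range(count)]

ship_ids = deque(range(100, 110))
fixed = collect(ship_ids, 2)        # always two values
varied = collect(ship_ids, 3, 5)    # three or four values
print(fixed, len(varied))
```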

encode(value, encoding)

Encode a string with the given encoding.

value - any expression. The result of the expression will be coerced to a string if needed and then encoded with the specified encoding.
encoding - The encoding to be used. Encoding must be one of the following string literals:

  • "base64" - Base64 encodes the value.
  • "percent-simple" - Percent encodes every ASCII character less than hexadecimal 20 and greater than 7E.
  • "percent-query" - Percent encodes every ASCII character less than hexadecimal 20 and greater than 7E in addition to , ", #, > and < (space, doublequote, hash, greater than, and less than).
  • "percent" - Percent encodes every ASCII character less than hexadecimal 20 and greater than 7E in addition to , ", #, >, <, `, ?, { and } (space, doublequote, hash, greater than, less than, backtick, question mark, open curly brace and close curly brace).
  • "percent-path" - Percent encodes every ASCII character less than hexadecimal 20 and greater than 7E in addition to , ", #, >, <, `, ?, {, }, % and / (space, doublequote, hash, greater than, less than, backtick, question mark, open curly brace, close curly brace, percent and forward slash).
  • "percent-userinfo" - Percent encodes every ASCII character less than hexadecimal 20 and greater than 7E in addition to , ", #, >, <, `, ?, {, }, /, :, ;, =, @, \, [, ], ^, and | (space, doublequote, hash, greater than, less than, backtick, question mark, open curly brace, close curly brace, forward slash, colon, semi-colon, equal sign, at sign, backslash, open square bracket, close square bracket, caret and pipe).

  • "non-alphanumeric" - Non-Alphanumeric encodes every ASCII character that is not an ASCII letter or digit.

Example: with the value foo=bar from a provider named baz, then the template https://localhost/abc?${encode(baz, "percent-userinfo")} would resolve to https://localhost/abc?foo%3Dbar.
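To make the character sets concrete, here is a small Python model of two of the percent encodings and of base64. It is a sketch built from the lists above, not Pewpew's code: the names percent_encode, encode_base64, PERCENT_QUERY and PERCENT_USERINFO are hypothetical, and encoding the value as UTF-8 bytes before escaping is an assumption.

```python
import base64

# Extra characters for two of the encodings, taken from the lists above.
PERCENT_QUERY = ' "#><'
PERCENT_USERINFO = ' "#><`?{}/:;=@\\[]^|'

def percent_encode(value, extra):
    """Percent-encode control bytes, bytes above 0x7E and any byte in `extra`."""
    out = []
    for byte in str(value).encode("utf-8"):
        if byte < 0x20 or byte > 0x7E or chr(byte) in extra:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)

def encode_base64(value):
    """Base64-encode the string form of a value."""
    return base64.b64encode(str(value).encode("utf-8")).decode("ascii")

print(percent_encode("foo=bar", PERCENT_USERINFO))  # foo%3Dbar
print(percent_encode("foo=bar", PERCENT_QUERY))     # foo=bar
print(encode_base64("foo=bar"))                     # Zm9vPWJhcg==
```

The same input is left untouched by the "percent-query" set because = is not one of its extra characters.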

end_pad(value, min_length, pad_string)

Pads a string or number to a minimum length. Any added padding will be added to the end of the string.

value - An expression whose value will be coerced to a string if needed.
min_length - the minimum length, as a positive integer, that the returned string should be. If the first parameter in string format is less than this amount then padding will be added to it.
pad_string - The padding string to use. If the amount of padding needed is less than the length of this string then it will be truncated from the right. If the needed padding is more than the length of this string, then this string is repeated until it is long enough.

Example: with the value "Jones" from a provider named lastName, then the string ${end_pad(lastName, 8, "-")} would resolve to Jones---.

entries(value)

Returns the "entries" which make up value. For an object this will yield the object's key/value pairs. For an array it yields the array's indices and elements. For a string it yields the indices and the characters. For boolean and null types it yields back those same values.

Examples

With the value {"a": {"foo": "bar", "baz": 123}, "b": ["abc", "def"], "c": "xyz", "d": null } coming from a provider named test:

entries(test.a)

would return [["foo", "bar"], ["baz", 123]].

entries(test.b)

would return [[0, "abc"], [1, "def"]].

entries(test.c)

would return [[0, "x"], [1, "y"], [2, "z"]].

entries(test.d)

would return null.
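The cases above can be summarized with a Python model that operates on JSON-like Python values. It is an illustrative sketch, not Pewpew's implementation; returning numbers unchanged is an assumption, since the description does not cover them.

```python
def entries(value):
    """Rough model of the entries helper over JSON-like Python values."""
    if isinstance(value, dict):
        # Objects yield their key/value pairs.
        return [[key, val] for key, val in value.items()]
    if isinstance(value, (list, str)):
        # Arrays and strings yield index/element pairs.
        return [[index, item] for index, item in enumerate(value)]
    # Booleans and null come back unchanged; returning numbers unchanged
    # too is an assumption.
    return value

test = {"a": {"foo": "bar", "baz": 123}, "b": ["abc", "def"], "c": "xyz", "d": None}
print(entries(test["a"]))  # [['foo', 'bar'], ['baz', 123]]
print(entries(test["c"]))  # [[0, 'x'], [1, 'y'], [2, 'z']]
```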

epoch(unit)

Returns time since the unix epoch.

unit - A string literal of "s" (seconds), "ms" (milliseconds), "mu" (microseconds), or "ns" (nanoseconds).
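For comparison, the four units relate to each other as in this Python sketch. It is only a model: whether Pewpew returns the value as a number or a string, and how it rounds, is not stated above, so the integer result here is an assumption.

```python
import time

# Divisors turning nanoseconds into each supported unit.
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "mu": 1_000, "ns": 1}

def epoch(unit):
    """Return the whole number of `unit`s elapsed since the Unix epoch."""
    return time.time_ns() // NANOS_PER_UNIT[unit]

print(epoch("s"), epoch("ms"))
```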

if(check, true_value, false_value)

Does a boolean check against the first argument. If it is true the second argument is returned, otherwise the third argument is returned.

check - An expression which will be coerced to a boolean if needed.
true_value - The value that is returned if check evaluates to true.
false_value - The value that is returned if check evaluates to false.

Example: if(true, 1, 2) would always resolve to 1.

join(value, separator)

or

join(value, separator, separator2)

Turns an array of values into a string or turns an object into a string.

value - any expression. When the expression resolves to an array, the elements of the array are coerced to a string if needed and are then joined together to a single string using the specified separator. When the value resolves to an object and the three argument variant is used then the object will be turned into a string with the specified separators. In any other case value is coerced to a string and returned.
separator - a string literal which will be used between each element in the array. In the case of the three argument variant when the first argument is an object separator is used to separate key/value pairs.
separator2 - a string literal which is used to separate keys and values in an object.

Examples

With the value ["foo", "bar", "baz"] from a provider named qux, then the template https://localhost/some/thing?a=${join(qux, "-")} would resolve to https://localhost/some/thing?a=foo-bar-baz.

With the value {"a": 1, "b": 2} from a provider named foo, then the expression join(foo, "\n", ": ") would resolve to the following string:

a: 1
b: 2

or for an alternative, json-ified view: "a: 1\nb: 2"
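The branching between the array, object and fallback cases can be sketched in Python as follows. This is a model of the description above rather than Pewpew's implementation; in particular, using JSON serialization to coerce non-string values to a string is an assumption.

```python
import json

def to_text(value):
    """Coerce a JSON-like value to a string; strings are left unquoted."""
    return value if isinstance(value, str) else json.dumps(value)

def join(value, separator, separator2=None):
    if isinstance(value, list):
        # Array: join the string form of each element.
        return separator.join(to_text(v) for v in value)
    if isinstance(value, dict) and separator2 is not None:
        # Object with the three argument variant: separator goes between
        # pairs, separator2 between each key and its value.
        return separator.join(
            f"{key}{separator2}{to_text(val)}" for key, val in value.items()
        )
    # Any other case: the value is coerced to a string and returned.
    return to_text(value)

print(join(["foo", "bar", "baz"], "-"))  # foo-bar-baz
```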

json_path(query)

Provides the ability to execute a json path query against an object and returns an array of values. The query must be a string literal.

Example: json_path("response.body.ships.*.ids")

match(string, regex)

Allows matching a string against a regex. Returns an object with the matches from the regex. Named matches are supported though any unnamed matches will be a number based on their position. Match 0 is always the portion of the string which the regex matched against. If the regex does not match null is returned.

If the first parameter is not a string it will be coerced into a string.

Regex look arounds are not supported.

Example:

If a response body were the following:

<html>
<body>
Hello, Jean! Today's date is 2038-01-19. So glad you made it!
</body>
</html>

Then the following expression:

match(response.body, "Hello, (?P<name>\w+).*(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")

Would return:

{
  "0": "Hello, Jean! Today's date is 2038-01-19",
  "name": "Jean",
  "y": "2038",
  "m": "01",
  "d": "19"
}
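The shape of the returned object can be reproduced with Python's re module, which accepts the same (?P<name>...) syntax for named groups. This is a sketch for illustration only: Pewpew uses its own regex engine, and unlike that engine Python's does support look arounds.

```python
import re

def match(string, regex):
    """Return a dict of matches keyed as described above, or None."""
    pattern = re.compile(regex)
    found = pattern.search(str(string))  # non-strings are coerced first
    if found is None:
        return None
    # Map group positions to their names where a name was given.
    names = {index: name for name, index in pattern.groupindex.items()}
    return {
        names.get(i, str(i)): found.group(i)
        for i in range(pattern.groups + 1)
    }

body = "<html>\n<body>\nHello, Jean! Today's date is 2038-01-19. So glad you made it!\n</body>\n</html>"
result = match(body, r"Hello, (?P<name>\w+).*(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")
print(result["name"], result["y"], result["m"], result["d"])  # Jean 2038 01 19
```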

max(...number)

Selects the largest number out of a sequence of numbers. Each argument should be an expression which resolves to a number, otherwise it will not be considered in determining the max. If no arguments are provided, or if none of the arguments resolve to a number, then null will be returned.

min(...number)

Selects the smallest number out of a sequence of numbers. Each argument should be an expression which resolves to a number, otherwise it will not be considered in determining the min. If no arguments are provided, or if none of the arguments resolve to a number, then null will be returned.
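The rule that non-numeric arguments are ignored applies to both max and min and can be modeled in Python as below. The names pewpew_max and pewpew_min are hypothetical, chosen only to avoid shadowing Python's built-ins; this is a sketch of the described behavior, not Pewpew's code.

```python
def _numbers(args):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return [
        a for a in args
        if isinstance(a, (int, float)) and not isinstance(a, bool)
    ]

def pewpew_max(*args):
    """Largest numeric argument, or None when there is none."""
    nums = _numbers(args)
    return max(nums) if nums else None

def pewpew_min(*args):
    """Smallest numeric argument, or None when there is none."""
    nums = _numbers(args)
    return min(nums) if nums else None

print(pewpew_max(1, "9", 4.5, None))  # 4.5
print(pewpew_min("a", None))          # None
```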

random(start, end)

Generates a random number between start (inclusive) and end (exclusive). Both start and end must be number literals. If both numbers are integers only integers will be generated within the specified range. If either number is a floating point number then a floating point number will be generated within the specified range.

range(start, end)

Creates an array of numeric values in the specified range.

start - any expression resolving to a whole number. Represents the starting number for the range (inclusive).

end - any expression resolving to a whole number. Represents the end number for the range (exclusive).

Examples:

range(1, 10)

range(50, 1)

repeat(n)

or

repeat(min, max)

Creates an array of null values. The single argument version creates an array with a length of n. The two argument form creates an array with a randomly selected size between min (inclusive) and max (exclusive). This is mainly useful when used within a for_each to have the select expression evaluated multiple times.

Example: repeat(10)

start_pad(value, min_length, pad_string)

Pads a string or number to a minimum length. Any added padding will be added to the start of the string.

value - an expression whose value will be coerced to a string if needed.
min_length - the minimum length, as a positive integer, that the returned string should be. If the first parameter in string format is less than this amount then padding will be added to it.
pad_string - The padding string to use. If the amount of padding needed is less than the length of this string then it will be truncated from the right. If the needed padding is more than the length of this string, then this string is repeated until it is long enough.

Example: with the value 83 from a provider named foo, then the string id=${start_pad(foo, 6, "0")} would resolve to id=000083.
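The padding rules shared by start_pad and end_pad (repeat the pad string when it is too short, truncate it from the right when it is too long) can be modeled in Python like this. It is a sketch of the description, not Pewpew's implementation; measuring length in characters rather than bytes is an assumption.

```python
def _padding(text, min_length, pad_string):
    needed = min_length - len(text)
    if needed <= 0 or not pad_string:
        return ""
    # Repeat the pad string until it is long enough, then cut off the
    # excess from the right.
    repeats = -(-needed // len(pad_string))  # ceiling division
    return (pad_string * repeats)[:needed]

def start_pad(value, min_length, pad_string):
    text = str(value)  # numbers are coerced to strings
    return _padding(text, min_length, pad_string) + text

def end_pad(value, min_length, pad_string):
    text = str(value)
    return text + _padding(text, min_length, pad_string)

print(start_pad(83, 6, "0"))       # 000083
print(end_pad("Jones", 8, "-"))    # Jones---
print(start_pad("7", 4, "ab"))     # aba7
```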

replace(needle, haystack, replacer)

Replaces any instance of a string (needle) within a JSON value (haystack) with another string (replacer). This function will recursively check the JSON for any string value of needle and replace it with replacer. This includes checking within a nested object's key and value pairs, within arrays and within strings.

needle - an expression whose value will be coerced to a string if needed.
haystack - the JSON value to search
replacer - an expression whose value will be coerced to a string if needed.

Example: with the value {"foo": "baz", "zed": ["abc", 123, "fooo"]} from a provider named a, then the expression replace("foo", a, "bar") would resolve to {"bar": "baz", "zed": ["abc", 123, "baro"]}.
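The recursive search can be written out as a short Python model over JSON-like values. It is an illustrative sketch of the behavior described above, not Pewpew's implementation.

```python
def replace(needle, haystack, replacer):
    """Recursively replace `needle` with `replacer` in a JSON-like value."""
    needle, replacer = str(needle), str(replacer)
    if isinstance(haystack, str):
        return haystack.replace(needle, replacer)
    if isinstance(haystack, list):
        return [replace(needle, item, replacer) for item in haystack]
    if isinstance(haystack, dict):
        # Both keys and values are searched.
        return {
            replace(needle, key, replacer): replace(needle, val, replacer)
            for key, val in haystack.items()
        }
    # Numbers, booleans and null contain no strings, so they pass through.
    return haystack

a = {"foo": "baz", "zed": ["abc", 123, "fooo"]}
print(replace("foo", a, "bar"))
```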

parseInt(value)

Converts a string or other value into an integer (i64). If the value cannot be converted to a number, then null will be returned.

value - any expression. The result of the expression will be coerced to a string if needed and then converted.

parseFloat(value)

Converts a string or other value into a floating point number (f64). If the value cannot be converted to a number, then null will be returned.

value - any expression. The result of the expression will be coerced to a string if needed and then converted.

Command-line options

There are two ways that Pewpew can execute: either a full load test or a try run. For reference here's the output of pewpew --help:

The HTTP load test tool https://familysearch.github.io/pewpew

Usage: pewpew <COMMAND>

Commands:
  run    Runs a full load test
  try    Runs the specified endpoint(s) a single time for testing purposes
  help   Print this message or the help of the given subcommand(s)

Options:
  -h, --help       Prints help information
  -V, --version    Prints version information

As the help output above shows, there are two subcommands: run and try.

Here's the output of pewpew run --help:

Usage: pewpew run [OPTIONS] <CONFIG>

Arguments:
  <CONFIG>  Load test config file to use

Options:
  -f, --output-format <FORMAT>         Formatting for stats printed to stdout [default: human]
                                       [possible values: human, json]
  -d, --results-directory <DIRECTORY>  Directory to store results and logs
  -t, --start-at <START_AT>            Specify the time the test should start at
  -o, --stats-file <STATS_FILE>        Specify the filename for the stats file
  -s, --stats-file-format <FORMAT>     Format for the stats file [default: json]  [possible values:
                                       json]
  -w, --watch                          Watch the config file for changes and update the test
                                       accordingly
  -h, --help                           Prints help information

The -f, --output-format parameter allows changing the formatting of the stats which are printed to stdout.

The -d, --results-directory parameter will store the results file and any output logs in the specified directory. If the directory does not exist it is created.

The -w, --watch parameter makes pewpew watch the config file for changes. The watch_transition_time general config option allows specifying a transition time for switching to the new load_patterns and peak_loads.

While any part of a test can be updated, special care should be taken when modifying or removing endpoints. This is because statistics are aggregated based on the numerical index of where each endpoint appears in the config file. If, for example, the first endpoint is no longer needed and it is simply removed from the test, what was the second endpoint is now the first and all of the statistics for that endpoint will begin aggregating in with the first endpoint's statistics. An alternative approach to removing the endpoint would be to set the peak_load on the first endpoint to 0hpm.

Here's the output of pewpew try --help:

Usage: pewpew try [OPTIONS] <CONFIG>

Arguments:
  <CONFIG>  Load test config file to use

Options:
  -o, --file <FILE>                    Send results to the specified file instead of stdout
  -f, --format <FORMAT>                Specify the format for the try run output [default: human]
                                       [possible values: human, json]
  -i, --include <INCLUDE>              Filter which endpoints are included in the try run. Filters
                                       work based on an endpoint's tags. Filters are specified in
                                       the format "key=value" where "*" is a wildcard. Any
                                       endpoint matching the filter is included in the test
  -l, --loggers                        Enable loggers defined in the config file
  -d, --results-directory <DIRECTORY>  Directory to store logs (if enabled with --loggers)
  -k, --skip-response-body             Skips reponse body from output (try command)
  -K, --skip-request-body              Skips request body from output (try command)
  -h, --help                           Prints help information

A try run will run one or more endpoints a single time and print out the raw HTTP requests and responses to stdout. By default all endpoints are included in the try run. This is useful for testing out a config file before running a full load test. When the --include parameter is used, pewpew will automatically include any other endpoints needed to provide data for the explicitly included endpoints.

The -i, --include parameter allows the filtering of which endpoints are included in the try run. Filtering works based on an endpoint's tags (see the tags parameter in the endpoints section). The INCLUDE pattern is specified in the format key=value or key!=value and an asterisk * can be used as a wildcard. This parameter can be used multiple times to specify multiple patterns. An endpoint which matches any of the patterns is included in the try run.

The -l, --loggers flag specifies that any loggers defined in the config file should be enabled. By default, during a try run, loggers are disabled.

The -d, --results-directory parameter will store any log files (if the --loggers flag is used) in the specified directory. If the directory does not exist it is created.

The -k, --skip-response-body parameter ensures that during a try run, the response bodies aren't displayed. This can be particularly useful for debugging responses when the body is very long and not crucial for the debugging process.

The -K, --skip-request-body parameter ensures that during a try run, the request bodies aren't displayed. This can be particularly useful for debugging requests when the body is very long and not crucial for the debugging process.

In both the run and try subcommands a config file is required.

Environment variables

While most environment variables are passed on to the vars section of the config file, there are a few that affect the pewpew executable.

  • RUST_BACKTRACE Optional - Enable display of the stack backtrace on errors. Providing any parameter (other than falsey/0) will enable this. Examples: RUST_BACKTRACE=1 or RUST_BACKTRACE=full.
  • RUST_LOG Optional - A LevelFilter specifying what level for pewpew to log at. Allowed values are Off, Error, Warn, Info, Debug, and Trace. Default is Error. See Enable Logging for more complex options for RUST_LOG.

Viewing Results

At the end of every test, Pewpew creates a stats-*.json file with aggregated statistics from the test. To view the stats:

  1. Go here to open the results viewer.
  2. Drag the stats-*.json file onto the page.

Tuning a machine for a load test

Tuning a Linux machine

To get maximum throughput on Linux consider the following tweaks. NOTE: These tweaks have been tested in Ubuntu 18.04 and may be different in other distributions.

Append the following to /etc/sysctl.conf:

fs.file-max = 999999
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.ip_local_port_range = 1024 65535

Append the following to /etc/security/limits.conf:

*               -       nofile         999999

Tuning a Windows machine

Using the registry editor, navigate to the following path:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\

Add (or edit if it exists) the entry MaxUserPort as a DWORD type and set the value as 65534 (decimal).

Alternatively, save the following as port.reg and run the file:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"MaxUserPort"=dword:0000fffe

Found a bug?

Before filing a new issue on GitHub, it is helpful to create a reproducible test case. To do that, please do the following:

  1. Remove all endpoints, providers, loggers, vars, load_patterns and config options not needed to reproduce the issue.
  2. When possible, replace file providers with a variable (vars) or a concise list provider. If the file provider is required to replicate the issue, make the file as small as possible.
  3. Replace all references to environment variables with actual values in vars.
  4. Change the remaining endpoints to run against the Pewpew test server (see below).
  5. Reduce peak_loads and load_patterns as much as possible while still reproducing the issue.

Using the Pewpew test server

The Pewpew test server provides a way to reproduce problems without generating load against others' servers. The test server is a simple, locally run HTTP server which is usually run from the same machine that Pewpew runs from.

To run the test server first download the latest test server binaries here, extract the archive and run the executable from the command-line. You should then see a message like:

Listening on port 2073

The port the test server uses can be configured by setting the PORT environment variable. Here's an example run in bash:

$ PORT=8080 ./test-server
Listening on port 8080

The test server provides a single HTTP endpoint:

  • / - this endpoint acts as an "echo server" and will return within the response body any data that was sent to it. This endpoint should only ever return a 200 or 204 status code. It accepts all HTTP methods though only GET, POST and PUT can echo data back in the response. For the GET method to echo data back, specify the echo data in the echo query parameter. For POST and PUT simply put the data to be echoed back in the request body. The response will use the same Content-Type header from the request when specified, otherwise it will use text/plain.

    There is also an optional wait query parameter which defines a delay (specified in milliseconds) for how long the server should wait before responding.
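As a quick way to check that the test server is reachable, the echo and wait query parameters can be exercised from a small Python script. This is a minimal sketch: the helper names echo_url and fetch_echo are hypothetical, and it assumes the server is running on localhost on the port shown in the sample message above (2073).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def echo_url(echo, wait_ms=None, port=2073, host="localhost"):
    """Build a GET URL for the test server's echo endpoint."""
    params = {"echo": echo}
    if wait_ms is not None:
        params["wait"] = wait_ms  # delay in milliseconds
    return f"http://{host}:{port}/?{urlencode(params)}"

def fetch_echo(echo, wait_ms=None, port=2073):
    """Send the request; requires a test server already listening on `port`."""
    with urlopen(echo_url(echo, wait_ms, port)) as response:
        return response.status, response.read().decode("utf-8")

print(echo_url("hello", wait_ms=250))
# http://localhost:2073/?echo=hello&wait=250
```

With the server running, fetch_echo("hello") should return the status code together with the echoed body.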