This article continues the BPMN: Beyond the Basics series – a look at the practical, less-discussed aspects of working with BPMN for developers. Today, we’ll talk about process variables: what they’re for, how they differ from programming variables, and how scope works. At first glance, it might seem like there’s nothing special about them, but if you dig below the surface, there’s more nuance than expected. In fact, we couldn’t fit it all into one article, so we’re splitting this topic into two parts.
## Data in a Process
Process modeling traditionally begins with actions: approve a document, submit a request, sign a contract, plant a tree, deliver a pizza. Data is often left in the background, as if the performer intuitively knows where to get the right document or which pizza to deliver.
That works in human-centered, descriptive processes. People are expected to understand context, navigate loosely defined inputs, and follow general instructions. That’s why process models often resemble enhanced job descriptions more than software logic.
But when we move into automation, especially full automation, the game changes. A process without people must be **explicit about how it handles data**. It becomes not just a chain of steps, but a **data processing mechanism**. And if you want that mechanism to run fast and reliably, you need to understand how data is passed in, stored, transformed, and retrieved.
In short, a business process without data is like a horse without a cart — it might go somewhere, but it’s not carrying any value.
## Data-Centric Processes and BPM Engines
Even though classic processes like document flow and multi-step approvals are still important for many companies, the shift toward full automation is well underway. Gartner predicted that by 2025, 70% of organizations will have adopted structured automation—up from just 20% in 2021. And the whole workflow automation market? According to a Stratis Research report, it is expected to top $45 billion by 2032, which shows just how much everyone wants to automate everything from start to finish.
Why the rush? It’s mostly about cutting down on mistakes, saving money, and speeding things up. Some studies (Gitnux) say automation can reduce errors by as much as 70% and lets people focus on more interesting, higher-value work. So, fully automated processes—where data processing and orchestration are front and center—are quickly becoming the new normal in digital transformation, not just a nice-to-have.
Let’s see how ready our BPM engines are for this. Spoiler: not very.
“BPMN does not provide a comprehensive data model. Data Objects and Data Stores are used to show that data is required or produced, but their structure is not defined.”
— “BPMN 2.0 Handbook: Methods, Concepts, and Techniques for Business Process Modeling and Execution” (2011)
“BPMN is not intended to model data structures or detailed data flows.”
— Bruce Silver, “BPMN Method and Style” (3rd Edition, 2016)
BPMN notation was originally created as a language for describing processes and does not include data models. Everything related to data in it is limited to so-called `Data Objects`, which only hint that some kind of data or documents are used in the process (judging by their icon). There is also the `Data Store`, which pretends to be a database or some information system, again based on its appearance.

In essence, these are just graphical symbols. Their role is limited to informing the diagram reader that some data exists in the system and there is interaction with storage. There is no engine-level implementation behind them.
As a result, we have a situation where there are clear rules for modeling the process itself according to the BPMN 2.0 standard, which are implemented in a very similar way (if not identically) across engines. But there is no unified mechanism for working with data — each developer decides how to handle it on their own.
On the one hand, this is good — freedom! You can choose the optimal solution for your tasks. On the other hand, the lack of clear rules often leads to less-than-ideal data handling solutions in projects.
## Why Do We Need Process Variables?
So, at the BPMN model level, we don’t have data as such — the diagram describes the structure and logic of the process but does not operate directly with variables. However, when the process is executed, data becomes necessary: somewhere you need to save user input, somewhere to make a decision, and somewhere to send a request to an external system. All this data exists in the form of process variables, which are “attached” to each process instance and accompany it throughout its execution.
Broadly speaking, process variables fulfill four roles:
- Data transfer between steps
- Process flow control
- Interaction with the user
- Integration with external systems

Process variables help carry information through the entire process, from start to finish. For example, a user fills out a loan application form, and the data they enter (amount, term, loan purpose) is saved into variables. Then these values are used in service tasks, passed to gateways, and determine which branches should be activated. In one branch, a variable may be updated or supplemented and then passed further, for example, to a verification or notification task.
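For example, handing the loan application data to the engine at start might look like this (a minimal sketch using the standard `RuntimeService` API found in Flowable and Camunda; the process key and variable names are illustrative):

```java
Map<String, Object> variables = new HashMap<>();
variables.put("amount", 250_000L);
variables.put("termMonths", 36);
variables.put("loanPurpose", "mortgage");

// Each map entry becomes a process variable attached to the new instance
ProcessInstance instance =
        runtimeService.startProcessInstanceByKey("loanApplication", variables);
```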
They also allow us to control the behavior of the process during execution. Suppose we have a variable called `priority`, calculated based on the application parameters. If, at the process start, it equals “high,” the task is automatically routed to a specific specialist; otherwise, it goes into a general queue.
When the process interacts with a user, variables become the link between the form and the process logic. They are used both to display data and to receive user input. If a variable was already set on a previous step — for example, `userEmail` — its value can be shown in the form. The user, in turn, fills in other fields, and their values are saved back into variables to be used later in the process. Thus, the form works as a “window” into the current execution context: everything the process knows at this point is available to the user, and everything the user enters remains in the process.
Finally, process variables are a communication channel with the outside world. When a service task calls a REST API, it can use variables as input parameters and save the response to a variable. This response can then be analyzed, logged, passed to another service, or displayed to the user.
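A sketch of what this might look like in a Java delegate (the Flowable `JavaDelegate` interface; `creditCheckClient` is a hypothetical REST client, and the variable names are made up for illustration):

```java
import org.flowable.engine.delegate.DelegateExecution;
import org.flowable.engine.delegate.JavaDelegate;

public class CreditCheckDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        // Input parameters come from process variables
        Long amount = (Long) execution.getVariable("amount");
        String applicantId = (String) execution.getVariable("applicantId");

        // Call the external service (creditCheckClient is a hypothetical client)
        String result = creditCheckClient.check(applicantId, amount);

        // Save the response into a variable for later steps
        execution.setVariable("creditCheckResult", result);
    }
}
```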
## The Lifecycle of a Variable
Now let’s talk about how process variables are born, live, and end their lifecycle. To do that, we first need to look at how they are structured.
In BPM systems such as Camunda, Flowable, or Jmix BPM, process variables are objects that store not only their value but also metadata about themselves. In other words, they’re not simple variables like in Java or another programming language — they’re containers that hold data.
Why make it so complicated? Because a process can run for a long time — hours, days, or even months. That’s why the values of variables need to be stored in a database, so they can be retrieved later when needed. And if we write something to the database, metadata appears as well—it’s only logical.
### Creation and Persistence of Variables
Note: The examples in this section are based on the Flowable engine.
So, data has entered the process — for example, as a payload in the start message, in the form of a `Map<String, Object>`. For each entry in the map, the engine creates a process variable using the `setVariable` method.
Additionally, a developer can create a variable in code whenever needed, also using the `setVariable` method — including from Groovy scripts:
```
runtimeService.setVariable(executionId, "docStatus", "APPROVED");
```
As long as the process hasn’t reached a transaction boundary, no database writing occurs. The variable remains in an in-memory structure — a list of `VariableInstanceEntity` objects that serves as the variable cache.
However, once the process hits a wait state — such as a message or signal event, timer, user task, event-based gateway, or asynchronous task — the transaction is committed, and all entities created or modified during the transaction are flushed to the database. This includes process variables, which are written to the `ACT_RU_VARIABLE` table.

If the variable type is a primitive, it is stored as-is in the corresponding field of the table. Non-primitive variables, on the other hand, are serialized before being saved — and stored either as strings or byte arrays.
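For example, a custom Java object can be stored as a variable as long as the engine can serialize it (a minimal sketch; the class and variable names are illustrative):

```java
import java.io.Serializable;

public class LoanDetails implements Serializable {
    private static final long serialVersionUID = 1L;
    private String purpose;
    private int termMonths;
    // getters/setters omitted for brevity
}
```

```java
// Stored serialized: the bytes go to act_ge_bytearray,
// and ACT_RU_VARIABLE keeps a reference to them
runtimeService.setVariable(executionId, "loanDetails", new LoanDetails());
```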
### Table Fields
| Field | Type | Purpose |
|---------------|------------------|---------|
| `id_` | `varchar(64)` | Unique variable ID. Primary key of the table. |
| `rev_` | `integer` | Record version (used for optimistic locking when updating the variable). |
| `type_` | `varchar(255)` | Variable type (e.g., string, long, boolean, serializable, json, object, bytearray, etc.). Determines which value fields are used. |
| `name_` | `varchar(255)` | Variable name. |
| `execution_id_`| `varchar(64)` | ID of the specific execution the variable is linked to. |
| `proc_inst_id_`| `varchar(64)` | ID of the process instance the variable belongs to. Used for retrieving variables by process. |
| `task_id_` | `varchar(64)` | Task ID if the variable is scoped at the task level. |
| `scope_id_` | `varchar(255)` | Used only in CMMN. |
| `sub_scope_id_`| `varchar(255)` | Used only in CMMN. |
| `scope_type_` | `varchar(255)` | Scope type (bpmn, cmmn, dmn). Not actually used in practice. |
| `bytearray_id_`| `varchar(64)` | ID of the entry in the `act_ge_bytearray` table where the variable value is stored (if it’s a byte array or a serialized object). |
| `double_` | `double precision`| Variable value if the type is double. |
| `long_` | `bigint` | Variable value if the type is long, integer, short, or boolean (as 0/1). |
| `text_` | `varchar(4000)` | Variable value if the type is string, json, date, or uuid. May also hold serialized values as text. |
| `text2_` | `varchar(4000)` | Additional text field, e.g., for serialization format or extra parameters. May be used in JSON/XML serialization. |
| `meta_info_` | `varchar(4000)` | Metadata about the variable, such as object class (if it’s a serialized object), or other engine-relevant info. Not used in Jmix BPM. |
Once a variable is written to the `ACT_RU_VARIABLE` table, it becomes part of the process’s persistent state. This means that even if the server is restarted, the process can be restored and resumed — along with all its variables. At this point, the cache is also cleared, and the `VariableInstanceEntity` objects are removed from memory.
In some cases, however, you may want a variable **not** to be stored in the database, for example, intermediate calculation results that aren’t needed in later steps of the process, or sensitive data like passwords, authorization tokens, and similar. In such cases, variables can be declared **transient** and will be kept in memory only. But this is up to the developer — by default, all variables are persistent.
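In Flowable, for instance, a transient variable can be set through the corresponding API methods (a sketch; the variable names are illustrative):

```java
// Kept in memory only for the current transaction;
// never written to ACT_RU_VARIABLE or to history
execution.setTransientVariable("authToken", token);

// Transient variables can also be passed when starting a process
runtimeService.createProcessInstanceBuilder()
        .processDefinitionKey("loanApplication")
        .transientVariable("rawPayload", payload)
        .start();
```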
### Reading and Updating Variables
Now let’s take a look at how variable reading works.

After one transaction is successfully completed, the next one begins, and a new variable cache is created. Initially, this cache is empty — variables are not loaded by default. Only when a variable is actually needed does the engine execute a command like:
```
String status = (String) runtimeService.getVariable(executionId, "docStatus");
```
First, the engine locates the corresponding `ExecutionEntity` using the given `executionId`. This is the **execution context** that holds variables associated with a specific active step of the process. If the variable is not yet in memory, the engine issues an SQL query to the `ACT_RU_VARIABLE` table. The retrieved object is then deserialized (if necessary), added to the `ExecutionEntity` cache, and returned to the calling code.
If you need not just the value but the full variable information including metadata, you can request a `VariableInstance` object:
```
VariableInstance statusVar = runtimeService.getVariableInstance(executionId, "docStatus");
```
Keep in mind, though, that this is a **read-only** object. If you want to update the variable, you must call `setVariable` again. The new value will be written to the database during the next commit. Technically speaking, this is not an update, but rather the creation of a new variable with the same name.
And here’s a subtle point: the engine does not enforce type consistency. So, if the variable originally held a string, and you later assign a number to it, the engine will accept it without complaint. However, this may lead to issues later — for example, when accessing historical data or using the variable in other steps of the process.
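For example, the following sequence is perfectly legal from the engine’s point of view (illustrative snippet):

```java
runtimeService.setVariable(executionId, "docStatus", "APPROVED"); // stored as type string
runtimeService.setVariable(executionId, "docStatus", 42L);        // silently becomes type long
```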
### Deleting Variables
When a process (or execution) ends, its variables are deleted. That is, all entries in the `ACT_RU_VARIABLE` table associated with the specific `executionId` are removed.
A developer can also proactively delete a variable before the process finishes:
```
runtimeService.removeVariable(executionId, "largePayload");
```
Normally, this isn’t necessary just for “clean-up” purposes — the engine handles that on its own.
However, there are situations where proactively removing variables can be useful. For example, when a variable contains a large amount of data — say, a JSON object several megabytes in size or a high-resolution image. Keep in mind, this data is stored in the database, not in the Java process memory — so we’re not talking about garbage collection here, but about reducing database load.
If a variable contains personal or sensitive data (like temporary passwords or one-time codes) and is no longer needed after use, it should be deleted.
Some variables are used only within a single step (for example, intermediate results). These can be removed after that step finishes to avoid confusion or accidental reuse.
To avoid dealing with deletion altogether, it’s often better to declare such variables as **transient** right from the start.
Transient means that the variable is temporary and not saved permanently (for example, not stored in a database or persisted between sessions). It exists only during the runtime or the current process and disappears afterward.
### Writing to History
A key principle is that historical records for variables are **not** created when the process finishes, but rather **at the moment the variable is set** (`setVariable`), if history tracking is enabled. This is controlled by the `historyLevel` parameter.
| historyLevel | Description | What Is Stored |
|--------------|------------------------------|--------------------------------|
| `none` | History is completely disabled | Nothing |
| `activity` | Minimal history: tracks process activity | Process start/end, tasks, events |
| `audit` | More detailed history | Everything from `activity` + variables (latest value only) |
| `full` | Complete history, including changes | Everything from `audit` + all variable changes (`ACT_HI_DETAIL`) |
By default, the engine uses the `audit` level — a compromise between useful historical data and performance.
So, in most cases, history is enabled. When a variable is created, its value is also written to the `ACT_HI_VARINST` table. More precisely, a `HistoricVariableInstanceEntity` is created and inserted into that table when the transaction is committed.
If `historyLevel = full`, **every** change to the variable is also recorded in the `ACT_HI_DETAIL` table.
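Historic variable values can later be retrieved through the history service (a sketch using the Flowable API):

```java
List<HistoricVariableInstance> vars = historyService
        .createHistoricVariableInstanceQuery()
        .processInstanceId(processInstanceId)
        .list();

for (HistoricVariableInstance v : vars) {
    // Prints the latest recorded value of each variable (audit level)
    System.out.println(v.getVariableName() + " = " + v.getValue());
}
```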
**Important Note**
BPM engines do not provide a unified mechanism for working with variables. Variables can appear anywhere — as a field in a user form, as a payload in a message or signal, or declared in script code, Java delegates, or Spring beans.
All of this is entirely the developer’s responsibility. Your IDE won’t be able to help. That’s why extreme attentiveness is required. One day, you may want to fix a typo in a variable name — but there will be no way to automatically track down all the places where it’s used.
### Scope
Like regular variables, process variables have scope. But this concept works quite differently in BPM than in programming languages. Just accept it as a fact and don’t rely on intuition — it can easily mislead you.
In programming, a variable’s scope is defined lexically — depending on the class, method, or block where it is declared. It doesn’t matter whether the language uses static typing, like Java, or dynamic typing, like Python.
Process variables are something completely different, as Monty Python might say. Essentially, they are not just variables but runtime objects. Therefore, the lexical approach doesn’t apply here. And although you can declare variables in a BPMN model, it’s not a true declaration like in programming. It’s more like a description of intent — the engine doesn’t require these variables to exist until they are actually set.
For example, in Jmix BPM, you can define process variables in the start event. Such declarations are useful for documentation purposes, so anyone reading the model understands which variables are used. And if the process is started programmatically, explicitly listing the variables helps the developer know what parameters are needed to start it.

But they will not appear in the process by themselves. They must either be passed as parameters or created at some subsequent step using the `setVariable` method. Otherwise, if you try to reference them — for example, in an expression — the system will throw an error stating that such an object does not exist.
As we discussed earlier, process variables are created as a result of calling the `setVariable` method. Their scope is determined by their “birthplace,” almost like citizenship — that is, the execution context in which they were created.
When a process instance starts, the engine creates a root execution context (the process instance). Then, as the process progresses, these contexts form a hierarchical structure. For example, when the flow encounters an embedded subprocess, a parallel gateway, an asynchronous task, and so on, a new execution context is created. Subsequently, child contexts arise relative to previous ones. Thus, execution contexts form a tree.
Accordingly, the scope of variables is defined by the position of their execution context within this tree. Variables created in a higher-level context are visible in all the lower levels beneath them.
Let’s take a process as an example and mark all its execution contexts:

Then, represented as a tree, it will look like this:

If we define the variable `orderId` at the top level of the process, it will be accessible everywhere. But a variable like `discount`, if it is set locally in the first parallel branch, will only be visible within its own execution context and cannot be accessed later outside of it. So, it’s important to plan variable declarations with their scope in mind.
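In code, the difference comes down to which method you call (a sketch inside a delegate running in one of the parallel branches; names are illustrative):

```java
// Walks up the execution tree; if the variable doesn't exist anywhere yet,
// it is created at the process instance level and is visible everywhere
execution.setVariable("orderId", "ORD-42");

// Stays attached to the current branch's execution
// and is invisible outside of it
execution.setVariableLocal("discount", 0.05);
```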
A nested subprocess not only helps to structure execution logic but also creates a new scope — and this can actually be its more important feature.
A separate story applies to external subprocesses (call activities). Each such call is wrapped in its own execution context. That’s why in the second parallel branch we see another nested execution. But the external subprocess itself runs as a completely separate instance, and by default does not see variables of the parent process. You must explicitly map variables to pass them into the child process — and similarly map them back if needed.
If you have event subprocesses, each one lives in its own execution and waits to be activated. There are no special tricks here — it sees all process-level variables plus its own.
When the flow reaches a multi-instance task, a common execution context is created first (in our example, execution 2), which tracks all instances. Then each instance gets its own separate execution context. A common mistake here is trying to write a variable at the top level from a specific instance — for example, in parallel document approvals. As a result, all approvers overwrite the same variable, and you only see the decision of the last approver. The key here is not to get confused by all these variables, which are often named the same.
This situation is resolved by local variables, which we will discuss below.
## Local Variables
If you set a variable in the usual way, it will be visible to all child executions. But if you use the method `setVariableLocal`, it will be “locked” within the current execution and won’t be visible outside of it, including in lower-level contexts.
Okay. But why would you need to guarantee that a variable is not passed down the hierarchy?
Actually, this isn’t the main purpose. Local variables help keep things organized in your process: when you declare a variable as local, you make sure it won’t accidentally overwrite a variable with the same name in a broader (global or parent) scope.
Returning to our approval example: if, in each instance of the multi-instance subprocess, we explicitly make the comment field and the document decision local variables, the chance of confusion is greatly reduced.
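For instance, each approver’s decision can be written locally to that instance’s own execution (a sketch using the standard `RuntimeService` API; variable names are illustrative):

```java
// executionId identifies this approver's own instance of the multi-instance
// activity; the values stay local to it and don't leak to sibling instances
runtimeService.setVariableLocal(executionId, "approvalDecision", "APPROVED");
runtimeService.setVariableLocal(executionId, "approverComment", "Looks good to me");
```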
In general, local variables are a mechanism for isolation and error prevention rather than a functional necessity. They do not solve problems that couldn’t be solved otherwise, but they do it more safely and cleanly.
## Variable Shadowing
What if variables have the same names?
This can happen — you create a variable in a child execution context with the same name as one already present in the parent. In this case, variable shadowing occurs — the parent variable becomes inaccessible in the current context, even though it still exists in the execution tree.
**How it works**
Each execution context contains its own set of variables. When accessing a variable via the method `getVariable(String name)`, the engine first looks for it in the current execution. If the variable is found, it is used, even if a variable with the same name exists at a higher level. Thus, the higher-level variable is “shadowed.”
```
// Somewhere at the start of the process
execution.setVariable("status", "CREATED");

// Inside a task or subprocess:
execution.setVariableLocal("status", "PROCESSING");

// What will a script or service see in this execution?
String currentStatus = (String) execution.getVariable("status"); // "PROCESSING"
```
Although the parent variable still exists, the child variable overrides it within the current execution. Once you exit the scope of the child context (for example, when the subprocess ends), the higher-level variable becomes visible again.
Variable shadowing can be useful when used correctly, but it also represents a potential source of errors. In some scenarios, it provides an advantage — for example, allowing you to temporarily override a variable without changing its original value. This is especially convenient in multi-instance constructs where each instance works with its own copy of data.
However, shadowing can lead to unexpected results if you are not sure which context you are in. Debugging becomes more difficult: in the history, you may see a variable with the same name, but it’s not always clear at what level it was created or why its value differs.
To avoid such issues, it’s recommended to follow several guidelines. It’s better to use different variable names if they have different meanings or belong to different execution levels.
Also, consciously manage the context in which variables are created and avoid using `setVariableLocal` unless there is a clear need. When analyzing process state, it is helpful to check variables at the current and parent levels separately using `getVariableLocal()` and `getParent().getVariable()` to get the full picture.
## Types of Process Variables
As for variable types, their variety depends on the engine developer — as mentioned earlier, the specification does not define this, so each implementation does its own thing. Of course, there is a common set that includes primitive types — strings, numbers, boolean values, as well as dates. But even here there are differences — compare the two illustrations and see for yourself.
In Camunda, we have one set of data types, while in Jmix BPM (with the Flowable engine) it is somewhat different.

Regarding the basic types, this difference is insignificant and may only become apparent when migrating from one engine to another. But there are some interesting distinctions worth mentioning.
You’ve probably noticed that Camunda supports JSON and XML types. However, this is not a built-in feature of the engine itself — to work with them, you need a special library called **Camunda Spin**, designed for handling structured data within business processes. It provides a convenient API for parsing, navigating, and modifying data, as well as automatically serializing and deserializing data when passing it between tasks. This can be especially useful when processing responses from REST services, storing complex structures as variables, and generating XML/JSON documents.

In turn, Jmix BPM allows you to use elements of the Jmix platform’s data model as process variables — entities, lists of entities, and enumerated types (Enums). This is especially helpful when you need to manipulate complex business objects within a process that contains dozens of attributes. For example, applications, contracts, support tickets, and so on.
### Entity Data Task: Accessing the Data Model from the Process
Jmix BPM includes a special type of task called the Entity Data task. With it, you can create new entity instances, modify them, and load individual entities or collections of entities obtained via JPQL queries into process variables, right from the process. This is not an extension of the BPMN notation per se — technically, these are just regular service tasks with a specific set of parameters.
Thus, you can model a process in a low-code style — using User tasks for user actions and Entity Data tasks for data manipulation. If no complex integrations or logic are required, this approach is often sufficient.
Let’s consider a hypothetical example. Suppose some data arrives. The system checks whether it relates to an existing customer order or not. Depending on the result, it executes one task or another; either creating a new Order entity or loading/finding an existing one. Then, an employee performs some work, and an Entity Data task updates the modified attributes of the entity.

Of course, a real process would be more complex — this diagram merely illustrates the concept of how you can use an Entity Data task to work with data.
## Limitations
This section outlines the limitations of process variables in different BPM products.
### Camunda
**String Length Limitation**
In Camunda, values of type String are stored in the database in a column of type (n)varchar with a length limit of 4000 characters (2000 for Oracle). Depending on the database and configured character encoding, this limit may correspond to a different number of actual characters. Camunda does not validate the string length — values are passed to the database “as is.” If the allowed limit is exceeded, a database-level error will occur. Therefore, it’s the developer’s responsibility to control the length of string variables.
**Context Size Limitation**
Although process variables are stored in a separate database table, in practice, there is a limit to the total amount of data associated with a process instance. This limit is not so much about the physical storage but rather the internal mechanisms of serialization, memory loading, transactional handling, and other engine internals. A typical safe threshold is around **3–4 MB** per process context. This includes serialized variables, internal references, events, and other metadata. The exact value depends on the DBMS, serialization format, and engine configuration.
Storing too many variables or large documents in the process context can lead to unexpected `ProcessEngineException`s due to exceeding the allowable size of a serialized object. Therefore, when working with large variables, it is recommended to stay well below this limit and conduct performance testing if needed.
In **Camunda 8**, there is a strict limit on the size of data (payload) that can be passed into a process instance. The total maximum size of all process variables is **4 MB**, including engine internals. However, considering overhead, the safe threshold is around **3 MB**.
### Flowable / Jmix BPM
**String Length Limitation**
Flowable handles strings a bit differently: if the length of a `String`-type variable exceeds 4000 characters, it is automatically assigned an internal type of `longString`. Its value is then stored as a byte array in the `ACT_GE_BYTEARRAY` table, while the `ACT_RU_VARIABLE` table contains a reference to it. As a result, Flowable does not impose any explicit limit on string length.
In theory, the length of a string variable is only limited by the Java maximum — `Integer.MAX_VALUE` = 2,147,483,647, or roughly 2.1 billion characters. In practice, however, the effective limit is determined by available heap memory.
**Entity List Size Limitation**
For variables of type `Entity List`, there is a constraint related to the way they are stored. When saved to the database, such variables are serialized into a string of entity references like:

```
jal_User."60885987-1b61-4247-94c7-dff348347f93"
```

This string is saved in a text field with a maximum length of 4000 characters. Since each reference of this form takes roughly 50 characters, the list can typically contain around **80 elements** before exceeding the limit. However, this restriction only applies at the point of database persistence — in memory, the list can be of any size.
**Context Size Limitation**
Flowable (and thus Jmix BPM) does not enforce a hard limit on the size of the process context. Still, it’s recommended to keep the context as small as possible, since a large number of process variables can negatively impact performance. Also, you might eventually run into limits imposed by the underlying database system.
## Process Variables in Groovy Scripts
Scripts in BPMN are often underrated, yet they are a powerful tool—especially for lightweight, in-process logic. They’re commonly used to initialize variables, perform simple calculations, log messages, and so on. BPM engines typically support **Groovy** or **JavaScript** as scripting languages, with **Groovy** being more prevalent due to its concise syntax, native Java compatibility, and ease of use when working with process objects. The most important of these objects is the execution object, which represents the process context and allows you to work with process variables.
Your main workhorses are the familiar `setVariable` and `getVariable` methods, used to write and read process variables. However, there’s a feature that—while seemingly convenient—can lead to hard-to-diagnose bugs:
**process variables in Groovy scripts are accessible directly by name.**
That means you can reference them in expressions without explicitly reading them first:
```
amount = price * quantity
```
But here’s the catch:
**The assignment operator does not affect process variables.**
So, after that expression, the value of the `amount` process variable will **not** change. To actually update it, you must explicitly call `setVariable`:
```
execution.setVariable("amount", price * quantity)
```
This is because a process variable is part of the execution context, not just a regular Groovy variable—it must be handled explicitly.
To complicate things further, Groovy allows you to define **local script variables** on the fly. Since Groovy is a dynamic language, it will silently create a new script variable if one isn’t already defined. So, even if you haven’t explicitly created the process variable `amount`, the following lines will still work:
```
amount = price * quantity
execution.setVariable("amount", amount)
```
And there’s more to it! Even though Groovy doesn’t require it, **declaring variables explicitly** using `def` is considered best practice. It helps avoid subtle bugs:
```
def amount = price * quantity
// ...
execution.setVariable("amount", amount)
```
Now everything is clean and correct. Right? — Well, almost.
When your script task is marked as **asynchronous**, this convenient implicit access to process variables by name might break.
Consider the following line in an asynchronous script:
```
execution.setVariable("counter", counter + 1L)
```
You might get a confusing error like:
```
groovy.lang.MissingPropertyException: No such property: counter for class: Script2
```
This means the variable `counter` wasn’t available in the script context at execution time.
Why? Because engines like Flowable inject process variables into the script environment **automatically**, but for **asynchronous executions**, this may happen **too late**—after the script has already started running.
To avoid this issue, always **explicitly read** the variable at the start of the script:
```
def counter = execution.getVariable("counter")
execution.setVariable("counter", counter + 1L)
```
And just like that — no more exceptions!
## Best Practices
Working with process variables is a key part of building reliable business processes. Variables allow the process to “remember” data, pass it between tasks, and make decisions. However, without discipline, they can easily become a source of bugs—from subtle logic errors to critical failures. This section outlines proven practices to help you avoid common pitfalls and build a clean, predictable data model in your process.
1. **Use clear and unique variable names**
Choose meaningful and descriptive names. Avoid duplicates.
Good: `orderApprovalStatus`
Bad: `status`
2. **The fewer variables, the better**
Keep in mind that the process context isn’t unlimited. Avoid creating unnecessary variables. Even without hard size limits, bloated contexts hurt performance.
Avoid storing large documents or massive JSON structures directly—store them as files and keep only a reference in the process.
3. **Use transient variables for temporary data**
If data is only needed within a single action or expression, define it as `transient`. It won’t be saved to the database or show up in history.
4. **Be cautious with entity variables**
Many BPM platforms support storing complex objects (e.g. entities) in process variables. This is convenient for business logic involving entity attributes.
However, if you load an entity into a variable at the start of the process and display it in a form days later, can you be sure it hasn’t changed? — Definitely not.
Instead, store the entity’s **ID** as a variable and re-fetch it when needed (see the sketch after this list).
5. **Serialization may surprise you**
When saving complex objects, the engine serializes them before writing to the DB. Not all Java objects can be stored—your object must either implement `Serializable`, or you must provide and register a custom serializer.
Also, serialization may behave differently in regular vs. asynchronous tasks, since async tasks run in the `JobExecutor`, which has a different context.
6. **Link entities to processes**
Often, a process is started in relation to an entity—an order, document, customer request, etc. You may want to navigate from the entity to its process.
Here’s a simple trick: add a `processInstanceId` field to your entity. No more searching with complex API queries—just follow the ID.
7. **Don’t blindly pass all variables into a Call Activity**
It’s tempting to tick the box that passes all variables from the parent to the subprocess. But it’s better to explicitly map only the variables you need.
Otherwise, you risk data leaks or serialization issues.
8. **Configure history settings**
The more variables you have, the faster your history tables will grow. All BPM engines support history cleanup mechanisms—take time to configure them.
If you need to keep historical data, create a mechanism to export it from the process DB to your own storage.
Also, be mindful of sensitive data—like credentials or API keys. They might be safe during the process but later get dumped into history with all other variables. And history access might be less secure than the live process.
So, to be safe—**avoid storing sensitive variables in history**.
9. **Use local variables where appropriate**
Technically, you can manage without local variables. But using them helps keep things organized. A clear local variable reduces the chance of errors.
However, don’t overuse them. It’s not necessary to make every child context variable local—sometimes you need them to propagate downward.
10. **Avoid unnecessary variable shadowing**
Variable shadowing (redefining variables in nested contexts) is mostly useful in multi-instance activities.
Outside of that, it’s better to give variables unique names to prevent confusion.
11. **Document your variables**
BPM engines don’t track where variables are used, and IDEs typically don’t help with this either.
Maintain your own variable registry—describe what each variable is for and where it’s used.
This will make your processes easier to maintain in the long run.
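To illustrate item 4, here is a minimal sketch of the “store the ID, re-fetch the entity” pattern (the `Order` entity and `orderRepository` are hypothetical):

```java
// At process start: store only the reference, not the entity itself
variables.put("orderId", order.getId());

// In a later delegate or form: re-fetch the current state when it's needed
UUID orderId = (UUID) execution.getVariable("orderId");
Order order = orderRepository.findById(orderId); // hypothetical data-access call
```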
## Conclusion
Managing process variables with care is essential for building robust, maintainable business processes. By following these best practices, you can avoid common pitfalls and ensure your processes remain reliable and easy to support.
Thoughtful variable management not only prevents bugs and performance issues but also makes your process models more transparent for everyone involved. In the end, a little discipline in how you handle variables goes a long way toward creating clean, predictable, and future-proof BPM solutions.