### The AQM algorithm

The default work of the AQM algorithm is represented in the RED algorithm. Congestion RED Inception depends on the average queue size (avg), two thresholds ((min_{th}) and (max_{th})) and the maximum fall probability ((max_p)). Each packet arrives in the router’s queue; the algorithm calculates the average using the exponentially weighted moving average (EWMA) as a low-pass filter, as shown in the equation. (1) as follows:

$$begin{aligned} avg = left( 1-w_qright) avg+w_qq end{aligned}$$

(1)

where (w_q) is the queue weight and q is the current queue size. If the average value is (min_{th}, the algorithm starts marking incoming packets in the router queue using the Explicit Congestion Notification (ECN) available in TCP/IP. Therefore, to reduce the sending rate, the drop probability P can be calculated based on the equation. (2) as follows:

$$begin{aligned} P= max_p(avg-min_{th})/(max_{th}+min_{th}) end{aligned}$$

(2)

RED algorithm starts dropping all incoming packets when avg exceeds (max_th) to manage queue congestion. The disadvantages of RED are slow response to congestion and difficulty in setting the parameter. Thus, these drawbacks cause the algorithm to not work well when different applications and services use different data rates. The GRED has three threshold values ((min_th),(max_th) and double (max_th)) to reduce the fall probability slope curve. The proposed algorithm used three thresholds as in GRED and the dynamic value of (w_q) selected using the Markov process.

### Markov process

The MDP depends on the combination of ((St), (at), (r_t)t)^{17} with transition probability to determine what action should be taken for a given state, which can be seen in Eq. (3) as follows:

$$begin{aligned} P(s_{t+1}vert s_t,a)=P(s_{t+1}=jvert s_t=i,a=k) end{aligned}$$

(3)

where (St) and (s_{t+1}) respectively indicate the current state and the next state, while a indicates that an action must be taken. *I* and *I* can be 1,2,3, … which represent states, where *I* is the next state and *I* is the current state; and k can be 1,2,3, … to indicate which action is taken.

(r_t) is the reward (return) from the environment to the agent after an action is applied to the current state, as shown in the equation. (4), and this reward can be the maximum or the minimum. In this work, the reward represents a minimum number of discarded packets as follows:

$$begin{aligned} R_t=sum _{i=t}^{infty } r_{i}=r_t+r_{t+1}+r_{t+2}+… end{aligned }$$

(4)

MDP includes discount factor (gamma), which has a value between 0 and 1 to give weight to future rewards. In this study, the value of (gamma)=0 includes only the immediate reward without implying future rewards, which is indicated in the equation. (5) as follows:

$$begin{aligned} R_t=r_t+sum _{i=t+1}^{infty } gamma ^i r_i end{aligned}$$

(5)

To map MDP to the RED algorithm, it is necessary to consider the average queue length, avg, as a state and the type of dropped packets as an action. We assume four sets of states S=(s_1, s_2, s_3, s_4). Then we have the (4times 4) transition probability matrix shown in equation. (6). (s_1) means the average TCP state for a slow start below a minimum threshold ((0, (s_2) indicates the average state between (min_{th}) thresholds and halfway between the two thresholds ((min_{th} indicates the state when the avg is between half of the two thresholds and below the (max_{th} ((min_{th} + max_{th})/2and (s_4) indicates the state when the average is greater than (max_{th}) and less than double (max_{th}) ((max_th. We assume that there are three sets of actions A=(a_1,a_2,a_3)where (a_1) represents the no-drop packet, while (a_2) and (a_3) indicate the non-forced fall and the forced fall, respectively, as follows:

$$begin{aligned} P_{ij} = begin{bmatrix} P_{00} &{} P_{01} &{} P_{02} &{} P_{03} P_{10} &{ } P_{11} &{} P_{12} &{} P_{13} P_{20} &{} P_{21} &{} P_{22} &{} P_{23} P_{ 30} &{} P_{31} &{} P_{32} &{} P_{33} end{bmatrix} end{aligned}$$

(6)

### Design and methodology

Point-To-Point network topology Dumbbell constructed by NS3 with ON-OFF application was used in this study as shown in Fig. 1. In the current study, the network has five nodes on each left and right side at the start then increasing by 5 nodes in each simulation round up to 200 nodes, the source and the destination, respectively, and the two nodes of the middle that reflect the router in the core network create a bottleneck. Figure 1 shows that the sending nodes did not send any packets, while in Figure 2 the bottleneck link was congested and the packet drop procedure started.

### TCP slow start

The TCP protocol has default congestion control with four phases: slow start, congestion avoidance, fast retransmit, and fast recovery^{18}. The slow start phase allows the TCP to inform the capacity of the links in the transition path; this happens by duplicating the window size per RTT of each acknowledgment received. If no acknowledgment is received, the TCP indicates that congestion is occurring, and the congestion avoidance phase begins; then the window size increases by one for all successful acknowledgments. Therefore, the slow TCP begins to increase exponentially in each RTT, which means that congestion can occur in a short time.

### Tuning Algorithm Parameters

The default values in the RED algorithm settings are (w_q)=0.002, (min_{th})=5, (max_{th})=15, (max_p)=1/50^{2}. In the present study, the probability transition matrix (P_{ij}) in eq. (6) is calculated based on the weight of the queue (w_q). Equation (7) shows the relationship between the loading rate and (w_q)as well as their effect on the mean value^{2} as following:

$$begin{aligned} avg=L+1+frac{(1-w_q)^{(L+1)}-1}{w_q} end{aligned}$$

(seven)

where L represents the load rate of the sending packets. (w_q) works as a time constant in a low-pass filter on the average queue value, which reflects the response time and the queue weight of incoming packets, as shown in Fig. 3. Therefore, if (w_q) is too large, then the algorithm does not filter the congestion that appeared in a short time, especially during slow start of TCP protocol, as mentioned in the previous subsection. Whether (w_q) is too small, then the algorithm’s response to congestion is too slow to reflect the change in queue size, and the small value of the queue weight is suitable for the slow start phase in the TCP. In this study, the initial value of (w_q) set to zero and incremented by 0.002 when the state of avg transitions to another state, these values selected to handle the exponential increase in TCP slow start flow. Table 1 shows the range of (w_q) used in the suggested algorithm.

Using the values from the (w_q), the MDP probability transition matrix is obtained, shown in Equation. (6) as follows:

$$begin{aligned} P_{ij} = begin{bmatrix} 0 &{} 0.002 &{} 0.002 &{} 0.002 0.004 &{} 0 &{} 0.004 &{} 0.004 0.006 &{ } 0.006 &{} 0 &{} 0.006 0.008 &{} 0.008 &{} 0.008 &{} 0 end{bmatrix} end{aligned}$$

(8)

The diagonal matrix assigns a null value, which means there is no change in the weight value of the queue if the state is the same. The network parameters implemented are shown in Table 2: