Style #3

This style has one process, and uses synthesiser-determined state encodings. The two outputs are generated directly from the clocked process, which guarantees that the module outputs are registered.

There are a number of interesting points to note about this style:

  • The specific Verilog parameter encodings are not important, since the design intent is that the synthesiser should replace the codings anyway. The codings are specified arbitrarily here as 0, 1, 2, and 3; this is sufficient to allow the RTL code to simulate, and to allow the synthesiser to analyse the code without errors. However, as it turns out, the codings are not quite as arbitrary as they may appear to be; see the synthesis results below.
  • The Verilog outputs are now declared as output reg, rather than simply output, since they are driven by an always block. The reg keyword is optional in the module declaration in the DUT section of the testbench code (fsm.tv), so the same testbench may be used.
  • The SCK and BUSY outputs are driven from the clocked process. This is a little harder to code than Style #1 and Style #2, since we have to decode the conditions that will produce the required output after the next clock edge. For example, if the FSM is currently in st3, we set BUSY active because we know that the next state will be st1, and BUSY is required in st1. The next clock edge will change the state from st3 to st1, and will change BUSY from 0 to 1.
  • The Verilog code includes a default clause in the case statement. Strictly speaking this is not essential, but XST issues a warning if it is not present, and without it the RTL code latches metavalues in the state register (an X state matches none of the case items, so the register simply keeps it). The VHDL code does not need a default (others), since both the simulator and the synthesiser agree that STATEREG has only 4 possible values.
  • The major complication in Style #3 is that we have to be sure that all the outputs (in this case, there are only two) are always defined in every branch of the case statement. There are 3 ways to do this:
    1. Every branch can simply explicitly list every output that has to be assigned, setting it to 0 or 1. This is tedious and impractical if there are a large number of outputs.
    2. By keeping track of what the current value of an output is, you can reduce verbosity by only assigning to the output when it changes. This is what the two examples below do. While this is generally concise, it is possibly more error-prone.
    3. The outputs can all be set to a default value just before the case statement; the case statement branches then only assign to an output when it needs a non-default value. This works in both VHDL and Verilog (there is a common misconception that it doesn't work in Verilog). This option is used in Style #4.
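Option 3 can be combined with the registered outputs of this style. The sketch below is a hypothetical variant of fsm3.v (the module name FSM3_DEFAULTS is illustrative, not part of the original design): before the case statement, SCK and BUSY are given the values they most often take after the next clock edge (0 and 1 respectively), and each branch only overrides them where the next state needs something else.

```verilog
// Hypothetical variant of fsm3.v (FSM3_DEFAULTS is an illustrative name).
// Outputs are still registered; every path starts from default next-cycle
// values, so no branch can leave an output with a stale value.
module FSM3_DEFAULTS (
  input      CLK, SRST, LOAD, TC,
  output reg SCK, BUSY);

  parameter [1:0] st0 = 0, st1 = 1, st2 = 2, st3 = 3;
  reg [1:0] STATEREG;

  always @(posedge CLK)
    if (SRST) begin
      STATEREG <= st3;
      BUSY     <= 0;
      SCK      <= 1;
    end else begin
      // Defaults: the most common output values after the next edge
      SCK  <= 0;
      BUSY <= 1;
      case (STATEREG)
        st3: STATEREG <= st1;              // next st1: SCK=0, BUSY=1
        st0:
          if (LOAD) STATEREG <= st1;       // next st1: SCK=0, BUSY=1
          else      BUSY     <= 0;         // stay in st0: SCK=0, BUSY=0
        st1: begin
          STATEREG <= st2;                 // next st2: SCK=1, BUSY=1
          SCK      <= 1;
        end
        default:                           // st2
          if (TC) begin
            STATEREG <= st0;               // next st0: SCK=0, BUSY=0
            BUSY     <= 0;
          end else
            STATEREG <= st1;               // next st1: SCK=0, BUSY=1
      endcase
    end
endmodule
```

This relies on the Verilog rule that, when several nonblocking assignments to the same register execute in one always block, the last one executed takes effect.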
Verilog

fsm3.v

module FSM (
  input  CLK, SRST, LOAD, TC,
  output reg SCK, BUSY);

  parameter [1:0] 
          st0 = 0,   // important: this shows an
          st1 = 1,   // initial coding of 0123
          st2 = 2, 
          st3 = 3;
  reg [1:0] STATEREG;

  always @(posedge CLK)
    if(SRST) begin
      STATEREG <= st3;
      BUSY     <= 0;
      SCK      <= 1;
    end else
      case(STATEREG)
      st3: begin
        STATEREG <= st1;
        BUSY     <= 1;
        SCK      <= 0;
      end
      st0:
        if(LOAD) begin
          STATEREG <= st1;
          BUSY     <= 1;
        end
      st1: begin
        STATEREG <= st2;
        SCK      <= 1;
      end
      default: begin
        SCK <= 0;
        if(TC) begin
          STATEREG <= st0;
          BUSY     <= 0;
        end else 
          STATEREG <= st1;
      end
      endcase
endmodule
VHDL

fsm3.vhd

library IEEE;
use IEEE.std_logic_1164.all;

entity FSM is
  port (
    CLK, SRST, LOAD, TC : in  std_logic;
    SCK, BUSY           : out std_logic);
end entity FSM;

architecture RTL of FSM is
  type   FSMTYPE is (st0, st1, st2, st3);
  signal STATEREG : FSMTYPE;
begin

  FSM : process (CLK) is
  begin
    if rising_edge(CLK) then
      if SRST = '1' then
        STATEREG <= st3;
        BUSY     <= '0';
        SCK      <= '1';
      else
        case STATEREG is
          when st3 =>
            STATEREG <= st1;
            BUSY     <= '1';
            SCK      <= '0';
          when st0 =>
            if LOAD = '1' then
              STATEREG <= st1;
              BUSY     <= '1';
            end if;
          when st1 =>
            STATEREG <= st2;
            SCK      <= '1';
          when st2 =>
            SCK <= '0';
            if TC = '1' then
              STATEREG <= st0;
              BUSY     <= '0';
            else
              STATEREG <= st1;
            end if;
        end case;
      end if;
    end if;
  end process FSM;
end architecture RTL;
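Both listings above work out by hand, branch by branch, what each output must become after the next clock edge. One way to make that decode mechanical is sketched below, under stated assumptions: compute the next state combinationally, then register the state together with a Moore decode of that next state. This is a hypothetical alternative (the module name FSM3_LOOKAHEAD and the NEXT signal are illustrative), and because it adds a combinational always block it is no longer a one-process style; it only makes the reasoning in the SCK/BUSY bullet concrete.

```verilog
// Sketch only: FSM3_LOOKAHEAD is a hypothetical name, not one of the
// article's styles. Same ports and transitions as fsm3.v.
module FSM3_LOOKAHEAD (
  input      CLK, SRST, LOAD, TC,
  output reg SCK, BUSY);

  parameter [1:0] st0 = 0, st1 = 1, st2 = 2, st3 = 3;
  reg [1:0] STATEREG;
  reg [1:0] NEXT;                      // combinational next state

  // Next-state logic: the same transitions as the case statement in fsm3.v
  always @* begin
    case (STATEREG)
      st0:     NEXT = LOAD ? st1 : st0;
      st1:     NEXT = st2;
      st2:     NEXT = TC ? st0 : st1;
      default: NEXT = st1;             // st3 (and any metavalue)
    endcase
  end

  // Clocked process: registers the state and the outputs. The outputs are
  // the Moore decode of NEXT, i.e. of the state that follows this edge.
  always @(posedge CLK)
    if (SRST) begin
      STATEREG <= st3;
      BUSY     <= 0;
      SCK      <= 1;
    end else begin
      STATEREG <= NEXT;
      SCK      <= (NEXT == st2) || (NEXT == st3);   // SCK high in st2, st3
      BUSY     <= (NEXT == st1) || (NEXT == st2);   // BUSY high in st1, st2
    end
endmodule
```

Because the outputs are decoded from NEXT rather than assigned per branch, no branch can forget an output. Whether XST produces a better implementation from this form is untested.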
Synthesis

The XST synthesis results, with default automatic FSM encoding, turned up some surprises. Both the Verilog and the VHDL code produced a one-hot FSM, with two additional registers for the SCK and BUSY outputs, as might be expected. However, the specific implementations differed:

Style 3 synthesis results
Mode SCK/BUSY Period LUT2 LUT3 LUT4 LUT5 LUT6 FDR FDS FDRS Total
fsm3.v Auto Reg/Reg 1.103 1 2 1 2 2 2 10
fsm3.vhd Auto Reg/Reg 1.298 1 1 2 1 3 2 1 11

On the face of it, this makes little sense; the only difference between the VHDL and the Verilog code is that the designer specified explicit state codings (0123) in the Verilog code, which were replaced by XST with 4-bit one-hot encodings. The synthesiser might be expected to ignore the initial codings. However, some experimentation showed that the initial codings were actually used by XST; different initial codings resulted in different one-hot codings, implementations, and period estimates.

There are 24 ways to specify the initial Verilog coding for this simple FSM. I tried 4 of these. The first three (the initial 0123 coding above, and two new codings of 1302 and 3210) all produced a period estimate of 1.103ns, using one-hot coding. However, the fourth (2013) produced an estimate of 1.042ns. Coding 2013 turned up another surprise: XST's auto mode did not use a one-hot encoding, but instead took 2013 as a user-defined coding, and produced a binary FSM with two additional registers for the SCK and BUSY outputs. The technology diagram showed 4 flip-flops and 4 LUTs.
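Reading each digit as the value given to st0, st1, st2 and st3 in turn (as in the 0123 listing above), the 2013 coding corresponds to replacing the parameter block of fsm3.v with the fragment below. This is a sketch of the notation only; the rest of the module is unchanged.

```verilog
  // Initial coding 2013: st0 = 2, st1 = 0, st2 = 1, st3 = 3.
  // With this coding, XST's auto mode kept the coding as a user coding
  // instead of re-encoding the FSM as one-hot.
  parameter [1:0]
          st0 = 2,
          st1 = 0,
          st2 = 1,
          st3 = 3;
```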

Given this, I repeated synthesis of both sources for all 7 explicit extraction styles provided by XST, together with 'none' (6). The Verilog source was restored to an initial coding of 0123 (as in the source code above). The table below gives XST's estimated minimum period for all extraction styles, together with XST's assigned coding: in the 'Binary' column when it produced a 2-bit FSM, and in the 'One-hot' column when it produced a 4-bit one-hot FSM:

            Verilog                         VHDL
Coding      Period, ns  Binary  One-hot     Period, ns  Binary  One-hot
Auto        1.103       -       4281        1.298       -       1482
Compact     1.042       2130    -           1.042       0231    -
Sequential  1.042       2130    -           1.042       0231    -
Gray        1.110       3120    -           1.110       0321    -
Johnson     1.110       3120    -           1.110       0321    -
User        1.042       0123    -           1.042       0123    -
One-hot     1.103       -       4281        1.298       -       1482
Speed1      1.103       -       2418        1.298       -       8214
None        1.110       -       -           1.373       -       -

All the extraction styles which have an entry in the 'Binary' column produced a 2-bit state register, together with two additional bits for the SCK and BUSY outputs, giving a total of 4 flip-flops. The styles which produced one-hot FSMs (Auto, One-hot and Speed1), and the 'None' style, produced 6 flip-flops.

One clear conclusion is that auto extraction may not produce the best results. Additionally, for at least some of the possible settings of the Verilog state parameters, the one-hot Verilog FSMs are faster than the corresponding VHDL one-hot FSMs. The one-hot FSMs were only compared for 3 of the possible 24 settings of the Verilog parameters; it is possible (or even likely) that one of the other settings would produce a result which is equivalent to the VHDL result.
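If auto extraction is not trusted, the encoding can be pinned in the source for one state register rather than left to the global setting. A minimal sketch, assuming XST's FSM_ENCODING attribute in its Verilog form, (* fsm_encoding = "..." *), placed immediately before the signal declaration; the module name FSM3_COMPACT is hypothetical, and "compact" simply follows the table above, where that style gave 1.042ns for both sources. Simulators ignore the attribute, so RTL behaviour is unchanged.

```verilog
// Hypothetical copy of fsm3.v (renamed FSM3_COMPACT) with the state
// encoding pinned for XST; simulation behaviour is identical to fsm3.v.
module FSM3_COMPACT (
  input      CLK, SRST, LOAD, TC,
  output reg SCK, BUSY);

  parameter [1:0] st0 = 0, st1 = 1, st2 = 2, st3 = 3;

  // XST FSM_ENCODING attribute on this register only (assumed values
  // include "compact", "one-hot", "gray", "user"). Ignored by simulators.
  (* fsm_encoding = "compact" *)
  reg [1:0] STATEREG;

  // Clocked process unchanged from fsm3.v
  always @(posedge CLK)
    if (SRST) begin
      STATEREG <= st3;
      BUSY     <= 0;
      SCK      <= 1;
    end else
      case (STATEREG)
        st3: begin
          STATEREG <= st1;
          BUSY     <= 1;
          SCK      <= 0;
        end
        st0:
          if (LOAD) begin
            STATEREG <= st1;
            BUSY     <= 1;
          end
        st1: begin
          STATEREG <= st2;
          SCK      <= 1;
        end
        default: begin                 // st2
          SCK <= 0;
          if (TC) begin
            STATEREG <= st0;
            BUSY     <= 0;
          end else
            STATEREG <= st1;
        end
      endcase
endmodule
```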

Finally, it is also clear that none of the synthesis results were as efficient as the Style #1 and Style #2 results (2 flip-flops and 2 LUTs, at 0.926ns). Since the required functionality is identical, it does seem likely that some modification of the Verilog and VHDL code could produce better results. However, it's not obvious what these modifications would be. Simply breaking the clocked process down into two clocked processes (Style #4) has no effect on the synthesis results.