One more thing.
while ((uint32_t)(TCNT0 - pulse_start) < STEP_PULSE_CYCLES - CYCLES_EATEN_BY_CODE) { /* nada */ }
But the timer counter is never reset, so it can overflow during the ISR, and the comparison then gives the wrong result for some time.
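A minimal sketch of the wraparound problem, assuming TCNT0 is an 8-bit free-running counter and pulse_start is stored as a uint8_t (the original snippet does not show its type). The helper names elapsed_naive, elapsed_wrapped and check_wraparound are hypothetical and only illustrate the arithmetic; they are not taken from the code under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Naive form (assumes 8-bit operands): both values are promoted to int
   before the subtraction, so a wrapped counter gives a negative difference,
   which the cast turns into a huge unsigned value. */
static uint32_t elapsed_naive(uint8_t now, uint8_t start)
{
    return (uint32_t)(now - start);
}

/* Wrap-safe form: truncating the difference back to the counter width
   makes the arithmetic modulo 256. */
static uint8_t elapsed_wrapped(uint8_t now, uint8_t start)
{
    return (uint8_t)(now - start);
}

/* Self-check: the counter wrapped from 250 to 4, so 10 ticks have elapsed. */
static void check_wraparound(void)
{
    assert(elapsed_wrapped(4, 250) == 10);
    assert(elapsed_naive(4, 250) > 255u);   /* no longer a tick count */
}
```

If those assumptions hold, casting the difference to uint8_t instead of uint32_t in the wait loop would probably be enough, as long as the pulse is shorter than 256 timer ticks and the timer counts over its full range rather than to a smaller TOP value.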