It's really hard to follow, because there's no consistent scene composition. In the beginning, it's clear that he's in the nurse's office, and we can tell where everything is in relation to everything else. But once he leaves that office, everything starts becoming scrambled because of the lack of animation, lack of contextual background assets, and lack of direction.
It's not animated, and it's not even a flip-book. In this state, it's a slideshow of shots.
Here's an example: at 1:06 he's entering the men's room. It's a close-up of the door that says MEN'S, and we're looking at it through a POV of Malcolm entering from the bottom right.
-Next frame-
Malcolm is standing against a grey background, presumably the interior of the bathroom, in a medium shot, looking to stage left. In this sequence, the flow of motion is established from stage bottom right to stage left. Things are moving to the left. What could help clarify that we're in the interior of a bathroom is if the background had sinks & mirrors, towel dispensers, the other side of the same door, a trash bucket overflowing with crumpled paper towels, stalls, or anything other than just a flat color. Hell, tilework. Graffiti. It's a blank canvas for you to decorate.
-Next frame-
In what looks like an abstract art piece, we have a background that's 2/3rds grey and 1/3rd blue, separated by a black line. This reads, contextually, as a guy talking from behind a closed stall, because he asks "Does anyone have toilet paper?!" His speech bubble is to the left of the dividing line, but it isn't clear where this stall is in relation to Malcolm. The audience has to guess, and it's a 50/50 chance: either we think things are still moving to the left, or we notice that "Oh, this video game level's background is grey, so the stall must be blue, therefore to the right of him."
-Next frame-
Our hero calls out "NO" over his shoulder to screen right. If people guessed correctly that the stall was to the right, we're still in the scene. But if they guessed it was to the left of Malcolm, this breaks the 180 rule, and it disorients them. What could've made it clearer is if the stall walls were articulated, like if you had drawn the tops of them at the very least, so that we can say "Oh, it's a stall, and it's over there."
Eventually there are shenanigans involving an air vent, and the direction gets decidedly muddled in that sequence.
If the audience is led to believe that things are moving in a consistent direction, we have an idea of where everything is in relation to our character. When that's suddenly proven wrong, or we're blindsided by a weird compositional choice, and there's no motion to help convey the flow of direction, we're going to get lost. Without recognizable landmarks, we're also going to get lost. If that keeps happening over and over again, your audience loses interest and they're just like "whatever, I guess this thing wasn't made well or something," and this gets forgotten.
Look into the 180 rule, add motion to characters and camera alike, and add detail to your environments to help the audience understand where they are in relation to your character and their journey.