Interesting that you mention harsh audio, because that is not the thing I would have latched on to upon first listen. Can you elaborate on why you consider the audio harsh?
As ranjit said above, extracting the audio from a video should not make it harsh. That is, unless the audio recorded along with the video was harsh to begin with. Looking more closely at the audio of the file, it appears there is about 5.6 dB of headroom, which leaves a comfortable safety margin against clipping.
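For anyone who wants to check a headroom figure like that themselves, here is a rough sketch in Python of one common way to do it: take the loudest sample and express its distance from digital full scale in dB. The function name and the sample values are made up for illustration and are not taken from the actual file; it also assumes the samples are already normalized so that full scale is 1.0.

```python
import math

def peak_headroom_db(samples, full_scale=1.0):
    """Return the gap, in dB, between the loudest sample and full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return math.inf  # digital silence: nothing to clip
    return -20.0 * math.log10(peak / full_scale)

# Hypothetical normalized samples (assumption, not from the recording).
example = [0.10, -0.35, 0.525, -0.20]
headroom = peak_headroom_db(example)
print(f"{headroom:.1f} dB of headroom")
```

Note that this only looks at sample peaks. It says nothing about whether the tone itself is harsh, only that the signal is not running into the ceiling.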
What I liked about your interpretation includes the overall pacing and the balance of perspective between the contrasting sections. I also found your choice to highlight certain melodic lines well done. It was immediately clear which melodic lines you wanted to draw the listener's attention to.
IMO, work more on deciding where the breath and break points of the phrases occur. There were a couple of places where I thought a phrase came to a conclusion, yet you did not include a breath, or not enough of one.
Bars 88-94: try to shape your releases more. The releases sound unusually dry here, and don't really fit with what you are doing before or after this section. Even if you intended this to be a contrast, I think you should find a way to make it flow more with the surrounding material.
Bar 222: I get that you are trying to do terraced dynamics here, but I think the pull back was too much and broke the flow to my ears. Find a way to maintain forward momentum and intensity when pulling back.