Below we have a flat, 2-dimensional test card to shoot with a stereo camera rig.

If we converge (toe-in) the cameras, we get keystone distortion.
As the following example shows, this introduces vertical disparity, which is likely to cause eyestrain.
Additionally, the horizontal screen disparity is distorted: it no longer has a 1:1 relationship with the distance of the subject(s) from the camera.
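
The effect can be checked numerically with a minimal pinhole-camera sketch. The figures used here (65 mm interaxial, a card 2 m away, unit focal length) and the helper names `project_toed_in` and `toed_in_disparity` are illustrative assumptions, not values taken from the example images.

```python
import math

def project_toed_in(point, cam_x, yaw, focal=1.0):
    """Project a world point through a pinhole camera at (cam_x, 0, 0).

    The optical axis is rotated by `yaw` radians about the vertical axis;
    positive yaw turns the axis toward +x.
    """
    px, py, pz = point[0] - cam_x, point[1], point[2]
    x_cam = px * math.cos(yaw) - pz * math.sin(yaw)  # along the camera's right vector
    z_cam = px * math.sin(yaw) + pz * math.cos(yaw)  # along the optical axis
    return focal * x_cam / z_cam, focal * py / z_cam

INTERAXIAL = 0.065    # metres (assumed)
CARD_DISTANCE = 2.0   # metres (assumed)
# Toe-in angle that aims both optical axes at the centre of the card.
toe_in = math.atan((INTERAXIAL / 2) / CARD_DISTANCE)

def toed_in_disparity(point):
    """Return (horizontal, vertical) disparity, left image minus right image."""
    u_left, v_left = project_toed_in(point, -INTERAXIAL / 2, +toe_in)
    u_right, v_right = project_toed_in(point, +INTERAXIAL / 2, -toe_in)
    return u_left - u_right, v_left - v_right

# Centre and corners of a flat 1 m x 1 m card facing the rig.
samples = [(0.0, 0.0, CARD_DISTANCE)]
samples += [(x, y, CARD_DISTANCE) for x in (-0.5, 0.5) for y in (-0.5, 0.5)]
for sample in samples:
    h, v = toed_in_disparity(sample)
    print(f"{sample}: horizontal {h:+.5f}, vertical {v:+.5f}")
```

Under these assumptions the centre of the card has zero disparity in both directions, while the corners pick up a non-zero vertical disparity and a horizontal disparity that differs from the centre, even though every point on the card lies at the same depth.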

If we instead shoot with parallel cameras, we get no vertical disparity and consistent horizontal disparity, as you would expect from a flat object that is equidistant from the cameras.
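
As a rough check of the parallel case, the sketch below projects points on a flat card through two parallel pinhole cameras. The interaxial distance, card distance, focal length, and the function name `parallel_disparity` are assumptions chosen for illustration.

```python
def parallel_disparity(point, interaxial=0.065, focal=1.0):
    """Return (horizontal, vertical) disparity, left image minus right image.

    Both cameras sit at x = -/+ interaxial / 2 and look straight down +z,
    so they see every point at the same depth z.
    """
    x, y, z = point
    u_left = focal * (x + interaxial / 2) / z
    u_right = focal * (x - interaxial / 2) / z
    v_left = focal * y / z
    v_right = focal * y / z
    return u_left - u_right, v_left - v_right

# A 3 x 3 grid of points on a flat card 2 m from the rig (assumed distance).
card = [(x, y, 2.0) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)]
for p in card:
    h, v = parallel_disparity(p)
    print(f"{p}: horizontal {h:+.5f}, vertical {v:+.5f}")
```

In this model the horizontal disparity reduces to `focal * interaxial / z`, so it depends only on depth and is the same everywhere on the card, and the vertical disparity is zero.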

We can account for the excess image on the left and right by cropping in post, or by using shift lenses when we shoot. In doing so, we set the z-position (depth) at which the image will be perceived.
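
A minimal sketch of how the crop or shift sets perceived depth, assuming the crop is equivalent to a pure horizontal shift of each image and using an idealised pinhole model. The function name `shifted_disparity` and the default values are hypothetical.

```python
def shifted_disparity(depth, window_distance, interaxial=0.065, focal=1.0):
    """Left-minus-right disparity of a parallel rig after the images are
    shifted so that points at `window_distance` land at zero disparity."""
    raw = focal * interaxial / depth               # disparity before the shift
    offset = focal * interaxial / window_distance  # total shift between the two images
    return raw - offset

WINDOW = 2.0  # metres; the depth chosen for the screen plane (assumed)
for depth in (1.0, 2.0, 4.0):
    print(f"depth {depth} m: disparity {shifted_disparity(depth, WINDOW):+.5f}")
```

With the left-minus-right convention used here, points at the window distance have zero disparity and sit on the screen plane, nearer points have positive (crossed) disparity and appear in front of it, and farther points have negative disparity and appear behind it.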
Here is another image to illustrate the point. Here you can see how, when using toe-in, the cameras' frustums never converge to form a valid stereo window.
The following setup simulates the use of shift lenses. You can see how the frustums converge at the desired position of the stereo window.
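
The shift-lens setup corresponds to an asymmetric (off-axis) frustum for each camera. The sketch below computes the left and right frustum bounds on the near plane and checks where each frustum lands at the window distance. The window size, distances, and the helper names `off_axis_frustum` and `footprint_at` are assumptions for illustration.

```python
def off_axis_frustum(cam_x, window_width, window_distance, near):
    """Left/right frustum bounds on the near plane for a camera at x = cam_x
    looking straight down +z, chosen so the frustum passes through the edges
    of a window centred on x = 0 at `window_distance`."""
    scale = near / window_distance
    left = (-window_width / 2 - cam_x) * scale
    right = (window_width / 2 - cam_x) * scale
    return left, right

def footprint_at(cam_x, bounds, near, distance):
    """World-space x range covered by the frustum at the given distance."""
    left, right = bounds
    return cam_x + left * distance / near, cam_x + right * distance / near

INTERAXIAL = 0.065     # metres (assumed)
WINDOW_WIDTH = 1.0     # metres (assumed)
WINDOW_DISTANCE = 2.0  # metres (assumed)
NEAR = 0.1             # metres (assumed)

for name, cam_x in (("left", -INTERAXIAL / 2), ("right", +INTERAXIAL / 2)):
    bounds = off_axis_frustum(cam_x, WINDOW_WIDTH, WINDOW_DISTANCE, NEAR)
    print(name, bounds, footprint_at(cam_x, bounds, NEAR, WINDOW_DISTANCE))
```

Both cameras end up covering the same range at the window distance, which is what allows the two frustums to share a single stereo window. The bounds are in the same form that an asymmetric projection such as OpenGL's `glFrustum` takes for its left and right parameters.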
