I'm probably missing something here, but a couple of comments. Though the physiological response at the input end (the retina and the primary visual cortex (PVC)) will be the same whether we see a duck or a rabbit, the response in the higher processing regions will be different. There are indeed 'mental images' that correspond (at the retina and PVC) to the physical images in a one-to-one fashion. At a higher level, the retinal image is broken apart for processing in different regions that detect edges, color, movement, etc. By some mysterious synthesis, the right combination of color, edges, etc. is recognized as a rabbit or a duck. That recognition too is a physiological response (except to an unreconstructed Cartesian dualist). Recognition is, in a way, an act of imagination, but there is no other way of imposing meaning on what we see.