I think it's simply difficult to provide realtime audio without ASIO.
Standard Windows audio (DirectSound, or whatever it is called) is not designed for that and often only works with a buffer size of 1024 samples. Going below that usually gets tough.
The usual rule of thumb is that total latency should stay below 10 ms to not be too disturbing.
1024 samples at a 44.1 kHz sampling rate works out to over 23 ms!
128 samples at a 44.1 kHz sampling rate results in about 3 ms.
Usually there is a bit of additional system and MIDI latency too, which is difficult to estimate or measure and depends on what one is using. I don't think a buffer size of more than 256 samples at 44.1 kHz is acceptable for realtime playing.
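
For anyone who wants to check the numbers for other buffer sizes, here is a minimal sketch of the arithmetic (buffer size divided by sample rate). The function name buffer_latency_ms is just something I made up for illustration, and it only covers the time to fill one buffer, not the extra system or MIDI latency mentioned above.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: float = 44100.0) -> float:
    """Time needed to fill one audio buffer, in milliseconds.

    Hypothetical helper for illustration; it ignores driver,
    converter and MIDI latency, which come on top of this.
    """
    return buffer_samples / sample_rate_hz * 1000.0


# Buffer sizes discussed in this post, all at 44.1 kHz.
for size in (128, 256, 1024):
    print(f"{size:>4} samples @ 44.1 kHz: {buffer_latency_ms(size):.1f} ms")
```

With these inputs, 256 samples comes out a little under 6 ms, which still leaves some room below the 10 ms mark for the additional latencies.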
And I frequently read that drummers can't even live with 3 ms of latency and want less...
 
It all depends on whether one just wants to play back audio or also wants to play it keyboard style.