Vision Language Models: Connecting Image Encoders to LLMs