Supported File Formats
We support a wide range of file formats so that information can be extracted efficiently from many document types. The list is continually expanding and currently includes:
- Word Documents: .doc, .docx, .docm, .dot, .dotm
- Open Document Format: .odt, .ott
- eBooks: .epub
- Presentations: .ppt, .pptx
- Spreadsheets: .xls, .xlsx
- Miscellaneous: .rtf, .xps, .pcl, .md, .flatopc, .pdf, .txt
- Email Formats: .pst, .msg, .eml, .emlx
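As an illustration, a client-side check run before uploading a file could be sketched as follows. This is a minimal sketch, not part of the Varex platform: the names `SUPPORTED_FORMATS` and `format_category` are hypothetical, and the extension sets are simply copied from the list above.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table; extensions are copied from the list above.
SUPPORTED_FORMATS = {
    "Word Documents": {".doc", ".docx", ".docm", ".dot", ".dotm"},
    "Open Document Format": {".odt", ".ott"},
    "eBooks": {".epub"},
    "Presentations": {".ppt", ".pptx"},
    "Spreadsheets": {".xls", ".xlsx"},
    "Miscellaneous": {".rtf", ".xps", ".pcl", ".md", ".flatopc", ".pdf", ".txt"},
    "Email Formats": {".pst", ".msg", ".eml", ".emlx"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the category of a supported file, or None if unsupported."""
    # Compare case-insensitively so "REPORT.DOCX" matches ".docx".
    extension = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["report.DOCX", "inbox.pst", "photo.png"]:
        print(name, "->", format_category(name))
```

Only the final suffix is inspected, so a name such as `archive.tar.gz` is treated as `.gz` and reported as unsupported.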
Our team is continually improving our parsers, with the aim of delivering parsing quality that surpasses that of open-source solutions, particularly for documents with complex structures and tables. We are currently focused on improving parsing of .docx (Microsoft Word) and .pdf (Adobe) files, given their widespread use.
For optimal results, we recommend using the .docx format.
Special Focus on Tables
Tables are central to much documentation, especially for our enterprise customers, yet most existing chatbot solutions struggle to parse them accurately. The Varex platform places special emphasis on table parsing so that queries about data within tables receive accurate responses. Our commitment is to provide reliable and precise information retrieval from structured data.