Supported Document Formats
This appendix contains a list of the document formats supported by the Inso filtering technology. The following topics are covered in this appendix:
About Document Filtering Technology
Supported Document Formats
Unsupported Formats
About Document Filtering Technology
Oracle Text uses document filtering technology licensed from Stellent Chicago, Inc. This filtering technology enables you to index most document formats. It also enables you to convert documents to HTML for presentation with the CTX_DOC package.
See Also:
For a list of supported formats, see “Supported Document Formats” in this appendix.
To use Inso filtering for indexing and DML processing, you must specify the INSO_FILTER object in your filter preference.
To use Inso filtering technology for converting documents to HTML with the CTX_DOC package, you need not use the INSO_FILTER indexing preference, but you must still set up your environment to use this filtering technology as described in this appendix.
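As a sketch of the INSO_FILTER requirement above, the following Python function builds the two statements typically used to create an Oracle Text index that filters through Inso: a CTX_DDL.CREATE_PREFERENCE call that names the INSO_FILTER object, and a CREATE INDEX statement that references that preference in its PARAMETERS clause. The preference, index, table, and column names (my_inso_filter, doc_idx, docs, text) are hypothetical, and the function only produces SQL text; executing it against a database (for example, from SQL*Plus or a client driver) is assumed and not shown.

```python
from typing import List


def inso_index_statements(pref: str, index: str, table: str, column: str) -> List[str]:
    """Build SQL text for an Oracle Text index that filters with INSO_FILTER.

    All names are hypothetical placeholders supplied by the caller; this
    function does not connect to a database.
    """
    for name in (pref, index, table, column):
        # Rough guard: accept only letters, digits, and underscores so a
        # name cannot break out of the generated SQL.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")

    # PL/SQL block creating a filter preference backed by the INSO_FILTER object.
    create_pref = (
        "BEGIN\n"
        f"  CTX_DDL.CREATE_PREFERENCE('{pref}', 'INSO_FILTER');\n"
        "END;"
    )
    # CONTEXT index that uses the preference as its FILTER.
    create_index = (
        f"CREATE INDEX {index} ON {table}({column}) "
        "INDEXTYPE IS CTXSYS.CONTEXT "
        f"PARAMETERS('FILTER {pref}')"
    )
    return [create_pref, create_index]


if __name__ == "__main__":
    for stmt in inso_index_statements("my_inso_filter", "doc_idx", "docs", "text"):
        print(stmt)
```

Converting documents to HTML with CTX_DOC does not need this preference, but it still depends on the environment setup described in the rest of this appendix.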
To convert documents to HTML format, Inso filtering technology relies on shared libraries and data files licensed from Stellent Chicago, Inc.
The following sections discuss the supported platforms and how to enable Inso filtering on the different platforms.
Supported Platforms
Inso filter technology is supported on the following platforms:
Sun Solaris on SPARC 32-bit and 64-bit (2.5.1 – 2.6, 7, and 8)
IBM AIX 32-bit and 64-bit (4.2 – 4.3)
HP-UX 32-bit and 64-bit (10.2 – 11.0)
DEC UNIX for Alpha/Tru64 UNIX (4.0)
SGI IRIX 32-bit and 64-bit (6.3)
Microsoft Windows
Intel x86 WinNT (4.0 and above)
Intel x86 Win95, Win98 SE, Win2000, and Windows ME
Red Hat Linux for Intel x86 (5.2 – 7.0)
Environment Variables
All environment variables related to Inso filtering must be made visible to Oracle Text.
Requirements for UNIX Platforms
The following requirements apply to Solaris, IBM AIX, HP-UX, Digital UNIX, SGI, and Linux platforms:
Ensure the *.flt files have execute permission granted to the operating system user running the Oracle database and ctxsrv server.
Set the $PATH variable to include the directory that contains the *.flt files (in particular, the file isunx2.flt) and $ORACLE_HOME/ctx/lib, which contains the shared libraries for Inso filtering.
Set the $HOME environment variable so that Inso technology can write files to the .oit subdirectory of the $HOME directory.
Access to a running X-Windows server is required to perform vector graphics image conversion.
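The UNIX requirements above can be checked before indexing with a small script. The following Python sketch assumes it runs as the same operating system user that runs the Oracle database and ctxsrv server, and takes the directory holding the *.flt files as a parameter. The default of $ORACLE_HOME/ctx/lib in the `__main__` block is an assumption; pass the actual location if it differs. The script only reports problems; it does not change permissions or variables.

```python
import glob
import os
import sys
from typing import List, Mapping


def check_inso_unix_env(env: Mapping[str, str], flt_dir: str) -> List[str]:
    """Return the problems found; an empty list means every check passed.

    flt_dir is a hypothetical parameter naming the directory that holds
    the *.flt files, including isunx2.flt.
    """
    problems: List[str] = []
    path_dirs = {os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p}

    # 1. Every *.flt file must be executable by the current user.
    flt_files = glob.glob(os.path.join(flt_dir, "*.flt"))
    if not flt_files:
        problems.append(f"no *.flt files found in {flt_dir}")
    for f in flt_files:
        if not os.access(f, os.X_OK):
            problems.append(f"not executable: {f}")
    if not os.path.exists(os.path.join(flt_dir, "isunx2.flt")):
        problems.append(f"isunx2.flt not found in {flt_dir}")

    # 2. PATH must include the *.flt directory and $ORACLE_HOME/ctx/lib.
    if os.path.normpath(flt_dir) not in path_dirs:
        problems.append(f"PATH does not include {flt_dir}")
    oracle_home = env.get("ORACLE_HOME")
    if not oracle_home:
        problems.append("ORACLE_HOME is not set")
    else:
        ctx_lib = os.path.normpath(os.path.join(oracle_home, "ctx", "lib"))
        if ctx_lib not in path_dirs:
            problems.append(f"PATH does not include {ctx_lib}")

    # 3. HOME must be set and writable so Inso can create $HOME/.oit.
    home = env.get("HOME")
    if not home or not os.access(home, os.W_OK):
        problems.append("HOME is unset or not writable (needed for $HOME/.oit)")
    return problems


if __name__ == "__main__":
    # Assumed default location for the *.flt files; override with argv[1].
    default_dir = os.path.join(os.environ.get("ORACLE_HOME", ""), "ctx", "lib")
    target = sys.argv[1] if len(sys.argv) > 1 else default_dir
    issues = check_inso_unix_env(os.environ, target)
    print("\n".join(issues) or "all checks passed")
```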
Filtering Vector Graphic Formats
Follow these steps to filter vector graphic formats on UNIX platforms:
Start an X server to filter vector graphic formats. If no X server exists (that is, the system detects no X libraries, such as Xm, Xt, and X11), vector graphic filtering is not performed. Vector graphic formats include CAD drawings and presentation formats such as PowerPoint 97. Bitmap formats include GIF, JPEG, and TIFF.
Because the system depends on X libraries to perform vector graphic conversion, ensure that the system-specific library path environment variable for the X libraries is set correctly (for example, LD_LIBRARY_PATH on Solaris and Linux, LIBPATH on AIX, or SHLIB_PATH on HP-UX).
Set the $DISPLAY environment variable. For example, setting DISPLAY=:0.0 tells the system to use the X server on the console.
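A quick pre-flight check for the vector graphic steps might test for DISPLAY and the X libraries. The Python sketch below treats Xm, Xt, and X11 (the libraries named above) as all required, which may be stricter than the filter actually needs. It uses `ctypes.util.find_library`, which searches standard locations and only approximates how the filter process resolves libraries through its library path variable. It does not verify that the X server named by DISPLAY is actually reachable.

```python
import ctypes.util
from typing import Callable, List, Mapping, Optional

# X libraries named in the text; vector graphic filtering is skipped without them.
X_LIBRARIES = ("X11", "Xt", "Xm")


def vector_graphics_problems(
    env: Mapping[str, str],
    find_library: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return the problems that would likely prevent vector graphic filtering."""
    problems: List[str] = []
    # find_library returns None when it cannot locate a library.
    missing = [lib for lib in X_LIBRARIES if find_library(lib) is None]
    if missing:
        problems.append("X libraries not found: " + ", ".join(missing))
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set (for example, DISPLAY=:0.0)")
    return problems


if __name__ == "__main__":
    import os
    print(vector_graphics_problems(os.environ) or "X prerequisites look present")
```

The `find_library` parameter is injectable so the check can be exercised without a real X installation.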
OLE2 Object Support
There are platform-dependent limits on what Inso filter technology can do with OLE2 objects. On all platforms, when a metafile snapshot is available, Inso technology uses it to convert the object.
When a metafile snapshot is not available on UNIX platforms, Inso technology cannot convert the OLE2 object.
However, when a metafile snapshot is not available on the NT platform, the original application is used (if available) to convert the OLE2 object.
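The OLE2 rules above reduce to a small decision table. The following Python function restates them for illustration only; the platform label "nt", the function name, and the enum are hypothetical and are not part of any Inso API.

```python
from enum import Enum


class Ole2Result(Enum):
    METAFILE_SNAPSHOT = "converted from the metafile snapshot"
    ORIGINAL_APPLICATION = "converted by the original application"
    NOT_CONVERTED = "not converted"


def ole2_conversion(platform: str, has_snapshot: bool, app_available: bool) -> Ole2Result:
    """Restate the OLE2 rules; platform is "nt" for Windows NT, anything else for UNIX."""
    if has_snapshot:
        # On every platform, an available metafile snapshot is used.
        return Ole2Result.METAFILE_SNAPSHOT
    if platform == "nt" and app_available:
        # Only on NT is the original application used as a fallback.
        return Ole2Result.ORIGINAL_APPLICATION
    # UNIX without a snapshot, or NT without the original application.
    return Ole2Result.NOT_CONVERTED
```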
Supported Document Formats
The following table lists all of the document formats that Oracle Text supports for filtering. Document filtering is used for indexing, for DML processing, and for converting documents to HTML with the CTX_DOC package. This filtering technology is based on Outside In HTML Export and Outside In Content Access technology licensed from Stellent Chicago, Inc.
Note:
This list does not represent the complete list of formats that Oracle is able to process. The external filter framework enables Oracle to process any document format, provided an external filter exists that can convert that format to plain text.
Word Processing – Generic
Format Version
ASCII Text (7 & 8 bit versions)
All versions
ANSI Text (7 & 8 bit)
All versions
Unicode Text
All versions
HTML
Versions through 3.0 (some limitations)
IBM Revisable Form Text
All versions
IBM FFT
All versions
Microsoft Rich Text Format (RTF)
All versions
Word Processing – DOS
Format Version
DEC WPS Plus (WPL)
Versions through 4.1
DEC WPS Plus (DX)
Versions through 4.0
DisplayWrite 2 & 3 (TXT)
All versions
DisplayWrite 4 & 5
Versions through Release 2.0
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
Framework
Version 3.0
IBM Writing Assistant
Version 1.01
Lotus Manuscript
Versions through 2.0
MASS11
Versions through 8.0
Microsoft Word
Versions through 6.0
Microsoft Works
Versions through 2.0
MultiMate
Versions through 4.0
Navy DIF
All versions
Nota Bene
Version 3.0
Office Writer
Versions 4.0 to 6.0
PC-File Letter
Versions through 5.0
PC-File+ Letter
Versions through 3.0
PFS:Write
Versions A, B, and C
Professional Write
Versions through 2.1
Q&A
Version 2.0
Samna Word
Versions through Samna Word IV+
SmartWare II
Version 1.02
Sprint
Versions through 1.0
Total Word
Version 1.2
Volkswriter 3 & 4
Versions through 1.0
Wang PC (IWP)
Versions through 2.6
WordMARC
Versions through Composer Plus
WordPerfect
Versions through 6.1
WordStar
Versions through 7.0
WordStar 2000
Versions through 3.0
XyWrite
Versions through III Plus
Word Processing – International
Format Version
JustSystems Ichitaro
Versions 5.0, 6.0, 8.0, 9.0, and 10.0
Word Processing – Windows
Format Version
AMI/AMI Professional
Versions through 3.1
Corel WordPerfect for Windows
Versions through 2002
JustWrite
Versions through 3.0
Legacy
Versions through 1.1
Lotus WordPro (NT on Intel only)
SmartSuite 96, 97, Millennium and Millennium 9.6
Lotus WordPro (all supported platforms except NT on Intel; Text only)
SmartSuite 97, Millennium, and Millennium 9.6
Microsoft Windows Works
Versions through 4.0
Microsoft Windows Write
Versions through 3.0
Microsoft Word 97
Word 97
Microsoft Word 2000
Word 2000
Microsoft Word 2002 (Office XP)
Word 2002
Microsoft Word for Windows
Versions through 7.0
Microsoft WordPad
All versions
Novell Perfect Works
Version 2.0
Novell WordPerfect for Windows
Versions through 7.0
Professional Write Plus
Version 1.0
Q&A Write for Windows
Version 3.0
Star Office Writer for Windows (Text only)
Version 5.2
WordStar for Windows
Version 1.0
Word Processing – Macintosh
Format Version
Microsoft Word
Versions 4.0 through 6.0
Microsoft Word 98
Word 98
WordPerfect
Versions 1.02 through 3.0
Microsoft Works
Versions through 2.0
MacWrite II
Version 1.1
Word Processing – Unix
Format Version
Star Office Writer for Windows
Version 5.2
Desktop Publishing
Format Version
Adobe FrameMaker
Version 6.0
Spreadsheet Formats
Format Version
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
Framework
Version 3.0
Lotus 1-2-3 (DOS & Windows)
Versions through 5.0
Lotus 1-2-3 for SmartSuite
SmartSuite 97, Millennium, and Millennium 9.6
Lotus 1-2-3 Charts (DOS & Windows)
Versions through Millennium 9.6
Lotus 1-2-3 (OS/2)
Versions through 2.0
Lotus 1-2-3 Charts (OS/2)
Versions through 2.0
Lotus Symphony
Versions 1.0, 1.1, and 2.0
Microsoft Excel 97
Excel 97
Microsoft Excel 2000
Excel 2000
Microsoft Excel 2002 (Office XP)
Excel 2002
Microsoft Excel Windows
Versions 2.2 through 7.0
Microsoft Excel Macintosh
Versions 3.0 – 4.0 and 98
Microsoft Excel Charts
Versions 2.x – 7.0
Microsoft Multiplan
Version 4.0
Microsoft Windows Works
Versions through 4.0
Microsoft Works (DOS)
Versions through 2.0
Microsoft Works (Mac)
Versions through 2.0
Mosaic Twin
Version 2.5
Novell Perfect Works
Version 2.0
QuattroPro for DOS
Versions through 5.0
QuattroPro for Windows
Versions through 2002
PFS:Professional Plan
Version 1.0
SuperCalc 5
Version 4.0
SmartWare II
Version 1.02
VP Planner 3D
Version 1.0
Database Formats
Format Version
Access
Versions through 2.0
dBASE
Versions through 5.0
DataEase
Version 4.x
dBXL
Version 1.3
Enable
Versions 3.0, 4.0 and 4.5
First Choice
Versions through 3.0
FoxBase
Version 2.1
Framework
Version 3.0
Microsoft Windows Works
Versions through 4.0
Microsoft Works (DOS)
Versions through 2.0
Microsoft Works (Mac)
Versions through 2.0
Paradox (DOS)
Versions through 4.0
Paradox (Windows)
Versions through 1.0
Personal R:BASE
Version 1.0
R:BASE 5000
Versions through 3.1
R:BASE System V
Version 1.0
Reflex
Version 2.0
Q & A
Versions through 2.0
SmartWare II
Version 1.02
Display Formats
Format Version
PDF – Portable Document Format
Acrobat versions 2.1, 3.0, 4.0, and 5.0, including Japanese PDF
Presentation Formats
Format Version
Corel Presentations
Versions 8.0, 9.0 and 2002
Novell Presentations
Versions 3.0 and 7.0
Harvard Graphics for DOS
Versions 2.x & 3.x
Harvard Graphics
Windows versions
Freelance 96
Freelance 96
Freelance for Windows
SmartSuite 97, Millennium, and Millennium 9.6
Freelance for Windows
Versions 1.0 and 2.0
Freelance for OS/2
Versions through 2.0
Microsoft PowerPoint for Windows
Versions through 7.0
Microsoft PowerPoint 97
PowerPoint 97
Microsoft PowerPoint 2000
PowerPoint 2000
Microsoft PowerPoint 2002 (Office XP)
PowerPoint 2002
Microsoft PowerPoint for Macintosh
Versions 4.0 and 98
Standard Graphic Formats
The following table lists the graphic formats that the INSO filter recognizes. Indexing a text column that contains documents in any of these formats produces no error, so it is safe for the column to contain them.
Note:
The INSO filter cannot extract textual information from graphics.
Format Version
Binary Group 3 Fax
All versions
BMP (including RLE, ICO, CUR & OS/2 DIB)
Windows
CALS Raster
Types I and II
CDR (if TIFF image is embedded in it)
Corel Draw version 2.0 – 9.0
CGM – Computer Graphics Metafile
ANSI, CALS, NIST, Version 3.0
DCX (multi-page PCX)
Microsoft Fax
DRW – Micrografx Designer
Version 3.1
DRW – Micrografx Draw
Version 4.0
DXF (Binary and ASCII) AutoCAD Drawing Interchange Format
Versions through 14
EMF
Windows Enhanced Metafile
EPS – Encapsulated PostScript
If TIFF image is embedded in it
FPX – Kodak Flash Pix
No specific version
GIF – Graphics Interchange Format
Compuserve
GP4 – Group 4 CALS format
Types I and II
HPGL – Hewlett Packard Graphics Language
Version 2.0
IMG – GEM Paint
No specific version
JFIF (JPEG not in TIFF)
All versions
JPEG
All versions
Novell Perfect Works (Draw)
Novell version 2.0
PBM – Portable Bitmap
No specific version
PCD – Kodak Photo CD
Version 1.0
PCX Bitmap
PC Paintbrush
PGM – Portable Graymap
No specific version
PIC
Lotus 1-2-3 Picture File Format – No specific version
PICT1 & PICT2 (Raster)
Macintosh Standard
PNG – Portable Network Graphics Internet Format
Version 1.0
PNTG
MacPaint
PPM – Portable Pixmap
No specific version
Progressive JPEG
No specific version
PSP – Paintshop Pro (NT on Intel only)
Versions 5.0 and 5.0.1
SDW
Ami Draw
Snapshot (Lotus)
All versions
SRS – Sun Raster File Format
No specific version
Targa
Truevision
TIFF
Versions through 6
TIFF CCITT Group 3 & 4
Fax Systems
VISIO
Visio 4 (Page Preview only), 5, 2000, 2002
WBMP
No specific version
WMF
Windows Metafile
WordPerfect Graphics [WPG and WPG2]
Versions through 2.0
XBM – X-Windows Bitmap
X10 compatible
XPM – X-Windows Pixmap
X10 compatible
XWD – X-Windows Dump
X10 compatible
Other
Format Version
Executable (EXE, DLL)
No specific version
Executable for Windows NT
No specific version
Microsoft Project (Text only)
Project 98
MSG (Text only)
Microsoft Outlook mail format
vCard Electronic Business Card
Versit version 2.1
WML
Compatible with version 5.2
Unsupported Formats
Password protected documents and documents with password protected content are not supported by the Inso filter.