How to read OpenOffice OpenDocument spreadsheets in Perl

There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)

The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC =Methods=

ODF::lpOD module

 * Details: ODF::lpOD at CPAN


 * Opening a spreadsheet:
 * $doc = odf_document->get($filename);


 * Main sheet selection function:
 * $sheet = $doc->get_body->get_table($tablename);


 * Main function to get read/write access to a cell:
 * $cell = $table->get_cell(8, 2);


 * $cell = $table->get_cell("C9");


 * Main function for grabing a cell value (two addressing syntaxes):
 * $cell = $table->get_cell_value(8, 2);
 * $cell = $table->get_cell_value("C9");

use ODF::lpOD; $file = 'chipList-v2.14.ods'; my $doc = odf_document->get($file); my $content = $doc->get_body; my $sheet = $content->get_table("My Chips"); my $res = $sheet->get_cell_value("A1"); print $res;
 * Example for reading a cell:

OpenOffice::OODoc module

 * Details:OpenOffice::OODoc at CPAN

use OpenOffice::OODoc; my $doc = odfDocument(file => 'chipList-v2.14.ods'); print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1
 * Opening a spreadsheet:
 * $doc = odfDocument(file => $filename);
 * Main function for grabing a cell:
 * $doc->getCellValue($sheet, $row, $col)
 * note variables start at 0. i.e. the first sheet, first row, first column is (0, 0, 0)
 * Example for reading a cell:

OpenOffice::OOSheets module
note this example has a problem on my computer
 * Details:OpenOffice::OOSheets at CPAN

use OpenOffice::OOSheets; use Archive::Zip; $file = 'chipList-v2.14.ods'; my $zip = Archive::Zip->new($file); my $content=$zip->contents('content.xml'); my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=>                [                   {                     'cells' => [                                'A1'                              ],                     'table' => 'My Chips'                   },                 ]); #Reads sheet named 'My Chips', cell A1 print $res;
 * Example for reading a cell:

Spreadsheet::Read module

 * Details: Spreadsheet::Read at CPAN


 * This module uses OpenOffice/Excel cell naming format.
 * Example: row 1, column 1 is A1
 * There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
 * $cell=cr2cell ( 1, 1) # $cell is = 'A1'

use Spreadsheet::Read; my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods"); print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1 print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)
 * Example for reading a cell:

Custom Subroutine, parse_ods
This function is much faster than other modules


 * Usage:
 * use Archive::Zip;
 * @spreadsheet_data = &parse_ods($spreadsheet_file_name);
 * $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
 * sheet, row, column start at 0

sub parse_ods { my $file = $_[0]; my @data; my $zip = Archive::Zip->new($file); my $content=$zip->contents('content.xml'); $content =~ s/<\?xml .*\?>\n//; $content =~ s/^.*//; $content =~ s/<\/office:spreadsheet>.*$//; $content =~ s/.*<\/table:named-expressions>//; $content =~ s/.*<\/table:database-ranges>//; #keep greedy? $content =~ s/<\/table:table-cell>//; #maybe not neeeded $content =~ s/<\/table:table-row>//; #maybe not neeeded my @sheets = split(//,$content); shift(@sheets); #skips the first _blank_ element my $sheet_num=0; foreach my $sheet (@sheets) { my $row_num=-1; foreach my $row (split(//,$sheet)) { if ($row_num != -1) { #skips first element my $col_num=-1; foreach my $cell ( split(// ) { $cell_content =~ s/.*//; $cell_content =~ s/<\/text:p>.*//; } else { # no content in cell $cell_content = ''; }						if ($cell =~ /number-columns-repeated/) { # repeat cells contents n-times my $col_repeated = $cell; # Original: $col_repeated =~ s/table:number-columns-repeated="//;							$col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better $col_repeated =~ s/".*//;							foreach (1..$col_repeated) {								$data[$sheet_num][$row_num][$col_num]=$cell_content;								$col_num++;							}						} else {							# no repeated cols							$data[$sheet_num][$row_num][$col_num]=$cell_content;							$col_num++;						}					}				}			}			$row_num++		}		$sheet_num++;	}	return @data; }

=Comparison= A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells

As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods, is much much faster, but can only be used to read a file, not write or modify.