There are many ways to open OpenOffice.org spreadsheets in Perl. The default spreadsheet file format for OpenOffice is OpenDocument spreadsheet (*.ods)
The following should work with any OpenDocument spreadsheets and should be compatible with OpenOffice 2.x-3.x It has been tested with OpenOffice 3.0.0. They are not likely to work with OpenOffice 1.x spreadsheets (*.sxc). For version 1.x OpenOffice files see Spreadsheet::ReadSXC
Methods[]
ODF::lpOD module[]
- Details: ODF::lpOD at CPAN
- Opening a spreadsheet:
- $doc = odf_document->get($filename);
- Main sheet selection function:
- $sheet = $doc->get_body->get_table($tablename);
- Main function to get read/write access to a cell:
- $cell = $table->get_cell(8, 2);
- $cell = $table->get_cell("C9");
- Main function for grabing a cell value (two addressing syntaxes):
- $cell = $table->get_cell_value(8, 2);
- $cell = $table->get_cell_value("C9");
- Example for reading a cell:
use ODF::lpOD; $file = 'chipList-v2.14.ods'; my $doc = odf_document->get($file); my $content = $doc->get_body; my $sheet = $content->get_table("My Chips"); my $res = $sheet->get_cell_value("A1"); print $res;
OpenOffice::OODoc module[]
- Details:OpenOffice::OODoc at CPAN
- Opening a spreadsheet:
- $doc = odfDocument(file => $filename);
- Main function for grabing a cell:
- $doc->getCellValue($sheet, $row, $col)
- note variables start at 0. i.e. the first sheet, first row, first column is (0, 0, 0)
- $doc->getCellValue($sheet, $row, $col)
- Example for reading a cell:
use OpenOffice::OODoc; my $doc = odfDocument(file => 'chipList-v2.14.ods'); print $doc->getCellValue(0, "A1"); #Reads first sheet, cell A1
OpenOffice::OOSheets module[]
- Details:OpenOffice::OOSheets at CPAN
note this example has a problem on my computer
- Example for reading a cell:
use OpenOffice::OOSheets; use Archive::Zip; $file = 'chipList-v2.14.ods'; my $zip = Archive::Zip->new($file); my $content=$zip->contents('content.xml'); my $res=OpenOffice::OOSheets::GetData (text=>$content,ref=> [ { 'cells' => [ 'A1' ], 'table' => 'My Chips' }, ]); #Reads sheet named 'My Chips', cell A1 print $res;
Spreadsheet::Read module[]
- Details: Spreadsheet::Read at CPAN
- This module uses OpenOffice/Excel cell naming format.
- Example: row 1, column 1 is A1
- There is a function included in the module so u can input a row and column number instead of the alphanumeric designation. note it starts with row 1 not row 0.
- $cell=cr2cell ( 1, 1) # $cell is = 'A1'
- Example for reading a cell:
use Spreadsheet::Read; my $chipListSpreadsheet = ReadData ("chipList-v2.14.ods"); print $chipListSpreadsheet->[1]{A1}; #Prints sheet 1, cell A1 print $chipListSpreadsheet->[1]{cr2cell ( 1, 1)}; #also prints A1, but in form (column=1, row=1)
Custom Subroutine, parse_ods()[]
This function is much faster than other modules
- Usage:
- use Archive::Zip;
- @spreadsheet_data = &parse_ods($spreadsheet_file_name);
- $content = $spreadsheet_data[$sheet_number][$row_number][$column_number];
- sheet, row, column start at 0
sub parse_ods { my $file = $_[0]; my @data; my $zip = Archive::Zip->new($file); my $content=$zip->contents('content.xml'); $content =~ s/<\?xml .*\?>\n//; $content =~ s/^.*<office:spreadsheet>//; $content =~ s/<\/office:spreadsheet>.*$//; $content =~ s/<table:named-expressions>.*<\/table:named-expressions>//; $content =~ s/<table:database-ranges>.*<\/table:database-ranges>//; #keep greedy? $content =~ s/<\/table:table-cell>//; #maybe not neeeded $content =~ s/<\/table:table-row>//; #maybe not neeeded my @sheets = split(/<table:table .*?>/,$content); shift(@sheets); #skips the first _blank_ element my $sheet_num=0; foreach my $sheet (@sheets) { my $row_num=-1; foreach my $row (split(/<table:table-row .*?>/,$sheet)) { if ($row_num != -1) { #skips first element my $col_num=-1; foreach my $cell ( split(/<table:table-cell/,$row )) { if ( $col_num == -1 ) { #skips first element $col_num++; } else { my $cell_content = $cell; if ( $cell_content =~ /<text:p>/ ) { $cell_content =~ s/.*<text:p>//; $cell_content =~ s/<\/text:p>.*//; } else { # no content in cell $cell_content = ''; } if ($cell =~ /number-columns-repeated/) { # repeat cells contents n-times my $col_repeated = $cell; # Original: $col_repeated =~ s/table:number-columns-repeated="//; $col_repeated =~ s/.*table:number-columns-repeated="//; # this worked better $col_repeated =~ s/".*//; foreach (1..$col_repeated) { $data[$sheet_num][$row_num][$col_num]=$cell_content; $col_num++; } } else { # no repeated cols $data[$sheet_num][$row_num][$col_num]=$cell_content; $col_num++; } } } } $row_num++ } $sheet_num++; } return @data; }
Comparison[]
A comparison of time to grab a single cell from a spreadsheet with 357 rows and 46 columns, with a file size of 42k. Tests were done reading one cell and reading 600 cells
Method | Time to read 1 cell | Time to read 600 cells |
---|---|---|
Unzip and read xml no processing |
0.01s | NA |
OpenOffice::OODoc | 1.8s | 24.9s |
OpenOffice::OOSheets | ? | ? |
Spreadsheet::Read | 0.88s | 0.88s |
parse_ods() | 0.08 | 0.08 |
As you can see Spreadsheet::Read is faster than OpenOffice::OODoc at grabbing a cell, and much faster at grabbing many cells. The custom function, parse_ods(), is much much faster, but can only be used to read a file, not write or modify.