OpenXML curiosity

aaronrumple · February 5, 2025, 3:01pm

Just a curious inquiry…
Wrote some code for importing a bunch of data from Excel and a bunch of fields were in the form of 10.2.3 or 10.2 as text. Most of the data came in just fine. However, a few - just a few - cam in 10.199999999999. I tracked it down finally to someone in Excel had typed in 10.2< space >. OpenXML somehow took the space as a number and subtracted it from the 10.2. Just curious as to how the math is working on that. Space is int -16. Looks like some bit or binary operation going on.

sanzoghenzo · February 5, 2025, 11:08pm

It’s not a subtraction, it’s how floating point numbers are stored in memory, plus the funky way Excel handles numbers, even if you tell it to handle them as text.

If you want to get the same output as in Excel, yoy have to get the number formatting from the cell, and apply it to the value (you’ll have to create a mapping between how a number fomatting is declared in excel and how it is done with the str.format() function).

If you need the actual value and not a string representation, you have to take into account that floats are not as precise as you would want and add lrounding and tolerances in your calculations.

May I ask you why you’re using openxml? My choices for reading Excel are xlrd on pyRevit/IronPython, pandas with openpyxl on CPython, ClosedXML for pure .NET (way less complicated than openXML).

aaronrumple · February 6, 2025, 1:42am

Thanks for the insight. I thought I had done the str.format() correctly.
Thing is the cell is formatted in Excel as text. But doesn’t use the '10.2 text qualifier. Will look at the issue closer.
Yes - I’m using what is on our systems without trying to get IT involved. I’ve used both interop and OpenXML. Pandas, numpy, etc all require IT to look it over and distribute and stay on top of.
Note that they have spent a solid day trying to install pyRevit 5. And it still doesn’t run. (I did it here at home in a minute.) Hence, I keep everything low level as possible with what is on the default install on all the computers. Yes, I’d much rather use Pandas.
Simple solution was to send the Excel author back into Excel and remove spaces - along with 50+ typos. Just can’t fix that with 1’s and 0’s.

aaronrumple · February 6, 2025, 4:30pm

It is just Excel and OpenXML. Just bit conversion.
Conversion of decimal seems quite random.

    <Cell ss:StyleID="s63"><Data ss:Type="String" x:Ticked="1">8.2</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.23</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.1999999999999993</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.5</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.6</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.7100000000000009</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.7200000000000006</Data></Cell>
    <Cell ss:StyleID="s62"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s63"><Data ss:Type="Number">8.73</Data></Cell>

sanzoghenzo · February 6, 2025, 4:51pm

It seems, but it is not… check this thorough explanation

aaronrumple · February 6, 2025, 5:01pm

Yes - thanks - understand the issues or 32, 64 bit and all. Just thought that the Excel to OpenXML standard would have taken some of that into account when it did its translation to decimal. Knowing it doesn’t, I can take that into account (Or get users to actually format cells that are really text as text.)

sanzoghenzo · February 6, 2025, 5:42pm

Numbers in the excel cell are already floating points (unless you forced them to text), so they already have that inaccuracy before being serialized to XML.
Rounding them might be fine in your specific case, but excel is also used for complex formulas with many significative decimal places… what should excel do in that case?