PakMediNet Discussion Forum : Biostatistics : What is wrong with this
I am having a problem with calculating a mean. To illustrate it better, here is a simpler version of my problem.
Let's say that I have the following temperatures recorded in Fahrenheit:
temp.F = 41, 50, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158
with a mean of 99.5 F; the natural logarithm of this mean is 4.60.
Now let's say that I take the natural logarithm of each individual measurement, which gives
Log (temp.F) = 3.713572, 3.912023, 4.077537, 4.219508, 4.343805, 4.454347, 4.553877, 4.644391, 4.727388, 4.804021, 4.875197, 4.941642, 5.003946, 5.062595
with a mean of 4.524.
Note that this mean differs from the value above, which was obtained by taking the logarithm of the calculated mean. The difference is easier to see if we take the antilog of 4.524, which is 92.2 F and quite different from 99.5 F.
I would think that both methods should give the same answer. Why is there a discrepancy?
Can someone explain to me why the mean calculated directly from the logarithms of the original values differs from the logarithm of the mean calculated from the original values?
Posted by: rqayyum :: Posts: 199 :: 30-05-2006
It is not a big deal. The arithmetic mean (i.e., the sum of the observations divided by the number of observations) of the logarithms is the logarithm of the geometric mean of the corresponding raw numbers (your temperatures recorded in Fahrenheit).
So essentially you have calculated two kinds of mean. In the original units it is the arithmetic mean. When you take logs of the original observations, add them up, divide by the number of observations, and finally take the antilog, you get the geometric mean. Please note that for positive observations the arithmetic mean is always greater than the geometric mean (unless all the observations are equal). You may find the following site helpful for calculation and comments: http://www.buzzardsbay.org/geomean.htm
Take care
Anwer Khurshid
Posted by: anwer_khur :: Posts: 30 :: 03-06-2006
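The numbers quoted in the two posts above can be checked with a short Python sketch. It assumes natural logarithms (these match the log values listed in the question), and the variable names are illustrative only.

```python
import math
from statistics import mean

# Fahrenheit readings from the original question
temp_f = [41, 50, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158]

# Route 1: average on the original scale, then take the log
arithmetic_mean = mean(temp_f)
log_of_mean = math.log(arithmetic_mean)

# Route 2: take logs first, average them, then take the antilog
mean_of_logs = mean(math.log(t) for t in temp_f)
back_transformed = math.exp(mean_of_logs)

# Geometric mean straight from its definition: n-th root of the product
geometric_mean = math.prod(temp_f) ** (1 / len(temp_f))

print(f"arithmetic mean        = {arithmetic_mean:.1f}")
print(f"log of arithmetic mean = {log_of_mean:.3f}")
print(f"mean of logs           = {mean_of_logs:.3f}")
print(f"antilog of mean of logs = {back_transformed:.1f}")
print(f"geometric mean         = {geometric_mean:.1f}")
```

The last two printed values agree, which is the point made in the reply: averaging on the log scale and taking the antilog is just another way of computing the geometric mean.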
Thank you very much. It was really helpful.
I am trying to understand whether transforming data affects the results. For example, it is not uncommon to transform data to the log scale before applying statistical tests such as linear regression. After the regression, the results are generally transformed back to the original scale. Basically, what I am finding is that we can't depend on transformation. Is that right?
Here are two more examples from the same (above) data.
Let's take the square of the above data, which becomes:
temp.sq = 1681, 2500, 3481, 4624, 5929, 7396, 9025, 10816, 12769, 14884, 17161, 19600, 22201, 24964.
its mean = 11217
taking square root of the mean to convert it back to original scale = 105.9
Again this is different from the mean of original data which is 99.5.
Let's now take the inverse of the data, that is, 1/x:
temp.inv = 0.024390, 0.020000, 0.016949, 0.014706, 0.012987, 0.011628, 0.010526, 0.009615, 0.008850, 0.008197, 0.007634, 0.007143, 0.006711, 0.006329.
its mean = 0.01183
taking inverse of this mean to convert it back to original scale = 84.5
Once again this is different from the mean of the original data which is 99.5.
I am sure that there is something that I am missing. I simply don't know what it is.
Posted by: rqayyum :: Posts: 199 :: 03-06-2006
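The square and reciprocal examples in the post above follow the same pattern: transform, average, transform back. A minimal Python sketch of that pattern is shown here; the helper name `back_transformed_mean` is hypothetical and only serves this illustration.

```python
from statistics import mean

# Fahrenheit readings from the original question
temp_f = [41, 50, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158]


def back_transformed_mean(data, forward, inverse):
    """Average the data on the transformed scale, then map the result back."""
    return inverse(mean(forward(x) for x in data))


# Square every value, average, then take the square root
square_route = back_transformed_mean(temp_f, lambda x: x ** 2, lambda m: m ** 0.5)

# Take reciprocals, average, then take the reciprocal of the result
reciprocal_route = back_transformed_mean(temp_f, lambda x: 1 / x, lambda m: 1 / m)

print(f"plain mean               = {mean(temp_f):.1f}")
print(f"via squares (sqrt back)  = {square_route:.1f}")
print(f"via reciprocals (1/mean) = {reciprocal_route:.1f}")
```

Neither route returns the plain mean of 99.5: the square route lands above it and the reciprocal route below it, as reported in the post.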
Sometimes we come across data for which a linear regression model is not appropriate: a residual analysis suggests that one or more of the assumptions of the linear regression model are violated. Recall that the linear regression model assumes the following:
Independence: The response variables are independent.
Normality: The response variables are normally distributed.
Homoscedasticity: The response variables all have the same variance.
Linearity: The true relationship between the mean of the response variable and the explanatory variables is a straight line.
The need to transform data usually arises when the normality, equal-variance, or linearity assumption is violated (most often normality). Data transformation seems like a lot of manipulation at first glance, but it just involves placing the data on another scale.
As you have written, you are transforming the data. So the questions are: what is a transformation, and why use one? By transformation we mean "a change in the scale for the values of a variable obtained by using some mathematical operations". Sometimes transformations are performed to simplify calculations. Frequently, transformations are made so that the transformed data satisfy the assumptions underlying a given statistical procedure.
The following is a brief summary of three commonly used transformations (the ones you have mentioned).
1. Logarithmic transformation: It is used when (a) the variances are not equal (heterogeneity of variances), (b) the standard deviations are proportional to the means (the coefficients of variation are equal), or (c) the data are positively skewed.
Procedure:
Step 1: Convert the raw data into their logarithms by y = log(x) or y = log(x + 1), depending on the data.
Step 2: Perform analysis on log data.
Step 3: Convert back into units of the raw data by taking the antilog of the results.
{If you take logarithms of the sample values (i.e., transform the sample), find the arithmetic mean of the logs, and then transform back to the original scale (by taking antilogs), the result is the sample geometric mean.}
2. Squared Transformation: It is used when (a) the standard deviation decreases as the mean increases, or (b) the data are negatively skewed.
Procedure:
Convert the raw data to the squared scale by y = x^2.
3. Reciprocal Transformation: It is used when the standard deviation is proportional to the square of the mean.
Procedure:
Convert the raw data to reciprocals by y = 1/x or y = 1/(x + 1) (the latter to avoid dividing by zero when the original data contain zeros).
{If you take reciprocals of the sample values (i.e., transform the sample), find the arithmetic mean of the reciprocals, and then transform back to the original scale, the result is the sample harmonic mean, which is sometimes used to average rates.}
Posted by: anwer_khur :: Posts: 30 :: 05-06-2006
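The three-step log procedure described above (transform, analyse, back-transform) might look like the following Python sketch when the "analysis" is a 95% confidence interval for the mean. This is an illustration under assumptions rather than a recommended analysis: it uses natural logs, treats the temperatures from the question as if they were a random sample, and hard-codes the tabulated two-sided Student-t critical value for 13 degrees of freedom (2.160).

```python
import math
from statistics import mean, stdev

# Fahrenheit readings from the original question
temp_f = [41, 50, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158]

# Step 1: convert the raw data into their (natural) logarithms
logs = [math.log(t) for t in temp_f]

# Step 2: perform the analysis on the log scale
n = len(logs)
log_mean = mean(logs)
log_se = stdev(logs) / math.sqrt(n)  # standard error of the mean of the logs
t_crit = 2.160                       # table value: 95% two-sided, n - 1 = 13 df (assumption)
log_low = log_mean - t_crit * log_se
log_high = log_mean + t_crit * log_se

# Step 3: convert back to the original units by taking antilogs
geo_mean = math.exp(log_mean)
ci_low = math.exp(log_low)
ci_high = math.exp(log_high)

print(f"geometric mean = {geo_mean:.1f} F")
print(f"95% interval   = {ci_low:.1f} F to {ci_high:.1f} F")
```

Note that the back-transformed interval describes the geometric mean, not the arithmetic mean, and it is no longer symmetric around its centre: the upper limit lies further from the geometric mean than the lower limit does.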
Thank you Prof. Khurshid. This detail is very helpful and makes quite a few things clearer.
What surprises me is that transforming data (any data, normal or non-normal) results in a different mean from the untransformed data, i.e. the mean is not stable. I am wondering whether I am misunderstanding something, or whether the mean really is not the same once data go through a transformation. The reason I am spending so much time on this is that I have noticed that many parametric tests use the mean. If the mean is not stable after transformation, should we really be transforming data?
Posted by: rqayyum :: Posts: 199 :: 05-06-2006
Thanks for continuing the discussion. The question is whether back-transformation is legitimate. Like most questions in Statistics, the answer is "it depends...". Strictly speaking, the back-transformation is valid and useful for interpretation because it returns the results to the original measurement scale. However, once data have been transformed, interpreting what the transformed (or back-transformed) mean, regression coefficients, confidence intervals, and differences among means represent requires special care and is not necessarily intuitive. In short, the old caution applies: if you transform data to meet the assumptions of a statistical test, the (medical, biological) interpretation of the output should be made with care.
A word of warning: With log and other non-linear transformations, the back-transformed mean of the transformed variable will in general not equal the mean of the original raw variable (for the log, square, and reciprocal transformations of positive data, the two coincide only when all observations are equal). The log transformation yields the so-called geometric mean of the variable, which isn't easily interpreted.
Posted by: anwer_khur :: Posts: 30 :: 06-06-2006
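One way to see why the warning above holds is Jensen's inequality. The block below is a sketch for positive observations; the symbols x_1, ..., x_n and \bar{x} are notation introduced here, not taken from the posts.

```latex
% Notation (introduced here): x_1, ..., x_n > 0 are the observations and
% \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i is their arithmetic mean.

% Jensen's inequality for the concave function \log:
\[
  \frac{1}{n}\sum_{i=1}^{n} \log x_i \;\le\; \log \bar{x}
\]

% Taking antilogs (\exp is increasing) gives geometric mean <= arithmetic mean:
\[
  \exp\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} \log x_i\Bigr)
  = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n} \;\le\; \bar{x}
\]

% Applying the same inequality to the reciprocals 1/x_i bounds the harmonic mean
% by the geometric mean, and Jensen with the convex function x^2 bounds the
% arithmetic mean by the root mean square:
\[
  \Bigl(\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}\Bigr)^{-1}
  \;\le\; \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
  \;\le\; \bar{x}
  \;\le\; \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i^{2}\Bigr)^{1/2}
\]
% Equality holds throughout only when all x_i are equal.
```

For the temperatures in this thread the ordering reads 84.5 ≤ 92.2 ≤ 99.5 ≤ 105.9, matching the reciprocal, log, untransformed, and squared results reported earlier.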
Thank you
Posted by: rqayyum :: Posts: 199 :: 06-06-2006
What we have discussed so far deals with non-linear transformations (log, reciprocal, square). With a linear transformation, however (for example, converting Centigrade to Fahrenheit, F = 9/5 C + 32), the mean of the transformed data is exactly the transformed mean of the original data, so converting back gives the same mean as the untransformed data.
Posted by: anwer_khur :: Posts: 30 :: 07-06-2006
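The linear case mentioned above can be checked the same way as the earlier examples. This Python sketch converts the Fahrenheit readings from the question to Centigrade, averages them there, and converts the mean back; the function names `f_to_c` and `c_to_f` are illustrative.

```python
import math
from statistics import mean

# Fahrenheit readings from the original question
temp_f = [41, 50, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158]


def f_to_c(f):
    """Fahrenheit to Centigrade (a linear transformation)."""
    return (f - 32) * 5 / 9


def c_to_f(c):
    """Centigrade to Fahrenheit: F = 9/5 C + 32."""
    return 9 / 5 * c + 32


temp_c = [f_to_c(f) for f in temp_f]

mean_f = mean(temp_f)           # mean on the Fahrenheit scale
mean_c = mean(temp_c)           # mean on the Centigrade scale
mean_c_back = c_to_f(mean_c)    # Centigrade mean converted back to Fahrenheit

print(f"mean in F                 = {mean_f:.1f}")
print(f"mean in C                 = {mean_c:.1f}")
print(f"mean in C, converted to F = {mean_c_back:.1f}")
print("same mean either way:", math.isclose(mean_f, mean_c_back))
```

Because the conversion is linear, the order of "average" and "convert" does not matter, unlike the log, square, and reciprocal cases.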