Modification of the dataset analysed in Penrose et al. (1985). It lists estimates of the percentage of body fat, determined by underwater weighing, together with various body measurements for 246 men.

bodyfat

Format

A data frame with 246 rows and 14 columns:

bodyfat

Percent body fat (from Siri's 1956 equation)

age

Age (years)

weight

Weight (kg)

height

Height (cm)

neck

Neck circumference (cm)

chest

Chest circumference (cm)

abdomen

Abdomen circumference (cm)

hip

Hip circumference (cm)

thigh

Thigh circumference (cm)

knee

Knee circumference (cm)

ankle

Ankle circumference (cm)

biceps

Biceps (extended) circumference (cm)

forearm

Forearm circumference (cm)

wrist

Wrist circumference (cm)
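As a quick sanity check on the layout described above, here is a minimal base R sketch. It assumes the column names match the labels listed here (in any order) and that every column is numeric. `bodyfat_cols` and `matches_bodyfat_format()` are hypothetical names used for illustration, not part of the package.

```r
# Column names as documented in the Format section (order not enforced).
bodyfat_cols <- c("bodyfat", "age", "weight", "height", "neck", "chest",
                  "abdomen", "hip", "thigh", "knee", "ankle", "biceps",
                  "forearm", "wrist")

# Hypothetical helper: TRUE if `df` has 246 rows, exactly the 14 documented
# columns, and only numeric (double or integer) columns.
matches_bodyfat_format <- function(df) {
  is.data.frame(df) &&
    nrow(df) == 246L &&
    ncol(df) == 14L &&
    setequal(names(df), bodyfat_cols) &&
    all(vapply(df, is.numeric, logical(1)))
}
```

Once the package providing bodyfat is loaded, `matches_bodyfat_format(bodyfat)` can be called to check the data against this description.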

Source

StatLib Datasets Archive: https://lib.stat.cmu.edu/datasets/bodyfat.

Details

This dataset can be used to illustrate multiple regression techniques (e.g., Johnson 1996). Body fat percentage is normally estimated from body density, which is difficult to measure because it requires underwater weighing, so it is desirable to have a simpler method that allows body fat to be estimated from body measurements instead.
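To illustrate the multiple regression use case, the sketch below compares the single-predictor model from the Examples with a larger model fitted to the same data. The helper `compare_fits()` and the choice of extra predictors (weight and wrist) are illustrative assumptions, not the model recommended by Johnson (1996). It uses only base R (`lm()`, `summary()` and `anova()` from the stats package).

```r
# Hypothetical helper: fit a one-predictor and a three-predictor model to
# `data`, which is assumed to contain the columns documented above.
# The extra predictors (weight, wrist) are an illustrative choice only.
compare_fits <- function(data) {
  simple <- lm(bodyfat ~ abdomen, data = data)
  multiple <- lm(bodyfat ~ abdomen + weight + wrist, data = data)
  list(
    simple = simple,
    multiple = multiple,
    # Adjusted R-squared penalises the extra parameters of the larger model.
    adj_r2 = c(simple = summary(simple)$adj.r.squared,
               multiple = summary(multiple)$adj.r.squared),
    # Nested-model F test: do weight and wrist jointly improve the fit?
    anova = anova(simple, multiple)
  )
}
```

With the data loaded, `res <- compare_fits(bodyfat)` followed by `res$adj_r2` and `res$anova` shows whether the additional measurements improve on abdomen circumference alone.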

bodyfat.raw contains the original data. According to Johnson (1996), the original data contain data entry errors (cases 42, 48, 76, 96 and 182), and he suggested some rules to correct them. These cases were removed from the bodyfat dataset, as was an influential observation (case 39, which has a large effect on the regression estimates). Additionally, the variable density was dropped for convenience, and height and weight were converted to metric units (centimetres and kilograms) for consistency.
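The preprocessing described above can be sketched as follows. Siri's (1956) equation gives percent body fat as 495/D - 450 for body density D in g/cm^3. The inch and pound factors below are the exact international definitions, but the rounding actually used to build bodyfat is not stated here. `drop_excluded()` is a hypothetical helper that assumes the rows of bodyfat.raw are stored in original case order; all function names are illustrative.

```r
# Siri (1956): percent body fat from body density (g/cm^3).
siri <- function(density) 495 / density - 450

# Exact conversion factors to metric units.
in_to_cm <- function(x) x * 2.54        # inches -> centimetres
lb_to_kg <- function(x) x * 0.45359237  # pounds -> kilograms

# Original case numbers excluded from bodyfat: five data entry errors
# plus the influential case 39.
excluded_cases <- c(39L, 42L, 48L, 76L, 96L, 182L)

# Hypothetical helper: assumes row i of `raw` is original case i.
drop_excluded <- function(raw) raw[-excluded_cases, , drop = FALSE]
```

Applied to the 252 original cases, removing these six rows leaves the 246 rows described in the Format section.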

See bodyfat.raw for more details.

References

Johnson, R. W. (1996). Fitting Percentage of Body Fat to Simple Body Measurements. Journal of Statistics Education, 4(1). doi:10.1080/10691898.1996.11910505.

Penrose, K., Nelson, A. and Fisher, A. (1985). Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques. Medicine and Science in Sports and Exercise, 17(2), 189. doi:10.1249/00005768-198504000-00037.

See also

bodyfat.raw

Examples

fit <- lm(bodyfat ~ abdomen, bodyfat)
summary(fit)
#> 
#> Call:
#> lm(formula = bodyfat ~ abdomen, data = bodyfat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10.9377  -3.5413   0.1526   3.1426  12.7569 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -42.56252    2.75889  -15.43   <2e-16 ***
#> abdomen       0.66779    0.02967   22.51   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 4.699 on 244 degrees of freedom
#> Multiple R-squared:  0.675,	Adjusted R-squared:  0.6736 
#> F-statistic: 506.7 on 1 and 244 DF,  p-value: < 2.2e-16
#> 
plot(bodyfat ~ abdomen, bodyfat)
abline(fit)