Popular dataset originally analysed in Penrose et al. (1985). Lists estimates of the percentage of body fat determined by underwater weighing and various body measurements for 252 men.

bodyfat.raw

## Format

A data frame with 252 rows and 15 columns:

density

Density (gm/cm^3; determined from underwater weighing)

bodyfat

Percent body fat (from Siri's 1956 equation)

age

Age (years)

weight

Weight (lbs)

height

Height (inches)

neck

Neck circumference (cm)

chest

Chest circumference (cm)

abdomen

Abdomen 2 circumference (cm)

hip

Hip circumference (cm)

thigh

Thigh circumference (cm)

knee

Knee circumference (cm)

ankle

Ankle circumference (cm)

biceps

Biceps (extended) circumference (cm)

forearm

Forearm circumference (cm)

wrist

Wrist circumference (cm)

## Source

StatLib Datasets Archive: https://lib.stat.cmu.edu/datasets/bodyfat.

## Details

This data set can be used to illustrate data cleaning and multiple regression techniques (e.g. Johnson 1996). Percentage of body fat for an individual can be estimated from body density, for instance by using Siri's (1956) equation: $$bodyfat = 495/density - 450.$$ Volume, and hence body density, can be accurately measured by underwater weighing (e.g. Katch and McArdle, 1977). However, this procedure for the accurate measurement of body fat is inconvenient and costly. It is desirable to have easy methods of estimating body fat from body measurements.
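
As a minimal sketch of the conversion above, Siri's equation and its inverse can be written as two small R functions. The names `siri_bodyfat` and `siri_density` are illustrative and not part of any package; the density value is the one listed for case 39 in the Examples section.

```r
# Siri's (1956) equation: percent body fat from body density (gm/cm^3)
siri_bodyfat <- function(density) 495 / density - 450

# Inverse of the same equation: body density from percent body fat
siri_density <- function(bodyfat) 495 / (bodyfat + 450)

# Density listed for case 39
d <- 1.0202
bf <- siri_bodyfat(d)
round(bf, 1)

# The round trip recovers the original density
all.equal(siri_density(bf), d)
```

The inverse form is what the density correction for cases 48, 76, and 96 in the Examples relies on.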

Measurement standards are apparently those listed in Behnke and Wilmore (1974), pp. 45-48, where, for instance, the abdomen 2 circumference is measured 'laterally, at the level of the iliac crests, and anteriorly, at the umbilicus'.

Johnson (1996) uses the original data in an activity to introduce students to data cleaning before performing multiple linear regression. An examination of the data reveals some unusual cases:

• Cases 48, 76, and 96 seem to have a one-digit error in the listed density values.

• Case 42 appears to have a one-digit error in the height value.

• Case 182 appears to have an error in the density value: it is greater than 1.1, the density of the "fat free mass", which gives a negative estimate of body fat percentage that was truncated to zero.

Johnson (1996) suggests some rules for correcting these values (see examples below).
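
One way to find such cases is to compare the recorded body fat with the value implied by Siri's equation. The following is only a sketch: the helper `flag_siri_mismatch` and the tolerance of 0.1 are assumptions chosen for illustration, not Johnson's (1996) procedure, and the small data frame reuses the values of cases 182 and 39 as printed in the Examples section.

```r
# Flag rows whose recorded body fat disagrees with Siri's equation.
# `tol` is a hypothetical tolerance allowing for rounding in the listed values.
flag_siri_mismatch <- function(data, tol = 0.1) {
  implied <- 495 / data$density - 450
  abs(data$bodyfat - implied) > tol
}

# Density and body fat values of cases 182 and 39
check <- data.frame(
  density = c(1.1089, 1.0202),
  bodyfat = c(0.0, 35.2),
  row.names = c("182", "39")
)

flagged <- flag_siri_mismatch(check)
rownames(check)[flagged]
```

Here only case 182 is flagged, because Siri's equation gives a negative value for its density while the recorded body fat is zero.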

## References

Johnson, R. W. (1996). Fitting Percentage of Body Fat to Simple Body Measurements. Journal of Statistics Education, 4(1). doi:10.1080/10691898.1996.11910505.

Penrose, K., Nelson, A. and Fisher, A. (1985). Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques. Medicine and Science in Sports and Exercise, 17(2), 189. doi:10.1249/00005768-198504000-00037.

Siri, W. E. (1956). Gross Composition of the Body, in Advances in Biological and Medical Physics (Vol. IV), eds. J. H. Lawrence and C. A. Tobias, Academic Press.

## See also

bodyfat, bfan

## Examples

bodyfat <- bodyfat.raw
# Johnson's (1996) corrections
cases <- c(48, 76, 96) # bodyfat != 495/density - 450
bodyfat$density[cases] <- 495 / (bodyfat$bodyfat[cases] + 450)
bodyfat$height[42] <- 69.5
# Other possible data entry errors
# See https://stat-ata-asu.github.io/PredictiveModelBuilding/BFdata.html
bodyfat$ankle[31] <- 23.9
bodyfat$ankle[86] <- 23.7
bodyfat$forearm[159] <- 24.9
# Outlier and influential observation
outliers <- c(182, 39)
bodyfat[outliers, ]
#>     density bodyfat age weight height neck chest abdomen   hip thigh knee ankle
#> 182  1.1089     0.0  40 118.50  68.00 33.8  79.3    69.4  85.0  47.2 33.5  20.2
#> 39   1.0202    35.2  46 363.15  72.25 51.2 136.2   148.1 147.7  87.3 49.1  29.6
#>     biceps forearm wrist
#> 182   27.7    24.6  16.5
#> 39    45.0    29.0  21.4
bodyfat <- bodyfat[-outliers, ]

# Body mass index (kg/m2)
# weight is in lbs and height in inches, so convert to kg and m
bodyfat$bmi <- with(bodyfat, (weight*0.45359237)/(height*0.0254)^2)
# Alternate body mass index
bodyfat$bmi2 <- with(bodyfat, (weight*0.45359237)^1.2/(height*0.0254)^3.3)
# See e.g. https://en.wikipedia.org/wiki/Body_fat_percentage#From_BMI
# (Adult) body fat percentage = (1.39 * BMI) + (0.16 * age) - (10.34 * gender) - 9
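
The BMI-based formula quoted in the comment above can be sketched as a small function. This is an illustration under stated assumptions: the names `bmi_from_imperial` and `bodyfat_from_bmi` are hypothetical, `gender` is assumed to be coded 1 for men and 0 for women as on the linked Wikipedia page, and the inputs are the weight, height, and age of case 39 as printed above.

```r
# Body mass index (kg/m^2) from weight in lbs and height in inches
bmi_from_imperial <- function(weight_lb, height_in) {
  (weight_lb * 0.45359237) / (height_in * 0.0254)^2
}

# BMI-based estimate of adult body fat percentage;
# gender is assumed to be 1 for men and 0 for women
bodyfat_from_bmi <- function(bmi, age, gender = 1) {
  1.39 * bmi + 0.16 * age - 10.34 * gender - 9
}

# Case 39: 363.15 lbs, 72.25 inches, 46 years
bmi39 <- bmi_from_imperial(363.15, 72.25)
bodyfat_from_bmi(bmi39, age = 46)
```

Comparing such an estimate with the `bodyfat` column gives a rough sense of how far a BMI-only rule can deviate from the underwater-weighing values.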