Traumatic Brain Injury Analysis

Report 1: Data Management and Exploratory Analysis

Chong Kim

1/20/2018

Objective

This project is based on a pilot study of 32 patients with acute Traumatic Brain Injury (TBI) at the University of Pennsylvania and University of Alabama Birmingham Hospital. The goal of the pilot study was to determine the change in plasma von Willebrand factor (VWF) antigen levels.

In this project, the goal is to determine whether variables other than VWF, such as ADAMTS13 or non-molecular clinical factors (e.g. hospital length of stay), are predictive of outcomes such as mortality, modified Rankin Scale, or neurosurgery.

Given the very small sample, it is highly likely that we won’t be able to find statistically significant differences between groups using parametric tests. Despite this, we will still examine whether the clinical and demographic variables differ among the outcome groups and whether any insights can be generated from the data.

Data Management

Loading Data

Because the data file has an .xlsx extension, we use the readxl package to load it, then check basic statistics by numerical and visual means.

### Load data
### Libraries are loaded as needed throughout this report
library(readxl)
df <- read_xlsx("~/Documents/Projects/Huy/tbi_analysis/data/kumar_tbi_final111717.xlsx", sheet = 3)

The data set contains 79 variables. Initial variable creation, merging, and manipulation were done previously; for further information please consult Dr. Monisha Kumar. The first few columns (i.e. vwfag_D[0-5]) are clinical parameters measured for each individual at 5 different times. Variables ending in _avg hold the average of those values (computed over the non-missing values).
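As a minimal sketch of how an _avg column of this kind could be recomputed, assuming it is simply the row-wise mean of the available daily measurements: the toy values and the column name vwfag_avg_check are hypothetical and are not taken from the study data.

```r
# Toy data frame standing in for the daily VWF antigen columns (values are made up).
toy <- data.frame(
  vwfag_D0 = c(2.1, NA, 3.0),
  vwfag_D1 = c(2.5, 1.8, NA),
  vwfag_D2 = c(NA, 2.2, NA)
)

# Select the daily columns by name pattern, then average over non-missing values.
day_cols <- grep("^vwfag_D[0-5]$", names(toy), value = TRUE)
toy$vwfag_avg_check <- rowMeans(toy[, day_cols], na.rm = TRUE)

print(toy)
```

Note that a subject with no measurements at all would get NaN rather than NA from rowMeans(), which is worth checking before modelling.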

Creating Variables and Changing Type

Here we dichotomize the dc_mrs variable so that values >= 3 are set to 1 and values < 3 are set to 0. Binary variables are also converted to factors for easier analysis.

library(dplyr) # for piping
df$dc_mrs_bin <- ifelse(df$dc_mrs >=3, 1, 0) # dichotomize using ifelse()
df <- df %>%
  mutate_at(c("sex_binary","race_binary","surgery","dc_mrs_bin","mortality_atdischarge"),as.factor)
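A quick way to confirm that the cut-point behaves as intended is to cross-tabulate the original score against the dichotomized one. The sketch below uses base R only and a hypothetical vector of scores; it is an illustration, not part of the original pipeline.

```r
# Hypothetical discharge mRS scores, including one missing value.
toy <- data.frame(dc_mrs = c(0, 2, 3, 6, NA, 1, 4))

# Same rule as in the report: scores of 3 or more map to 1, lower scores to 0.
toy$dc_mrs_bin <- factor(ifelse(toy$dc_mrs >= 3, 1, 0), levels = c(0, 1))

# Cross-tabulate original against dichotomized values; a missing score stays missing.
check <- table(original = toy$dc_mrs, binary = toy$dc_mrs_bin, useNA = "ifany")
print(check)
```

Every row of the cross-tabulation should have its count in exactly one column, and missing scores should land in the NA row and column rather than in group 0.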

Numeric Check

Below we (selectively) display a numeric summary of 5 variables in our sample.

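| Variable | nbr.val | nbr.null | nbr.na | min | max | range | sum | median | mean | SE.mean | CI.mean.0.95 | var | std.dev | coef.var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tbitbsubjectid | 32 | 0 | 0 | 28 | 78 | 50 | 1646 | 46.5 | 51.44 | 2.949 | 6.014 | 278.3 | 16.68 | 0.3243 |
| vwfag_D0 | 10 | 0 | 22 | 0.9598 | 6.633 | 5.673 | 28.29 | 2.452 | 2.829 | 0.5085 | 1.15 | 2.586 | 1.608 | 0.5685 |
| vwfag_D1 | 16 | 0 | 16 | 1.001 | 5.928 | 4.927 | 49.29 | 2.947 | 3.081 | 0.3557 | 0.7581 | 2.024 | 1.423 | 0.4618 |
| vwfag_D2 | 20 | 0 | 12 | 1.817 | 15.27 | 13.45 | 87.34 | 3.598 | 4.367 | 0.6544 | 1.37 | 8.565 | 2.927 | 0.6702 |
| vwfag_D3 | 21 | 1 | 11 | 0 | 17.45 | 17.45 | 105.3 | 4.233 | 5.012 | 0.9429 | 1.967 | 18.67 | 4.321 | 0.8621 |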
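The column names in this summary resemble the output of a descriptive-statistics helper such as pastecs::stat.desc, although the report does not say which function produced it. As a sketch under that assumption, several of the columns can be reproduced in base R; the function name describe and the toy vector are hypothetical.

```r
# Base-R versions of several summary columns shown above.
# Assumptions: nbr.null counts zeros, and the CI column is the t-based half-width.
describe <- function(x, p = 0.95) {
  n_na <- sum(is.na(x))
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  c(
    nbr.val  = n,
    nbr.null = sum(x == 0),
    nbr.na   = n_na,
    mean     = mean(x),
    SE.mean  = se,
    CI.mean  = qt(0.5 + p / 2, df = n - 1) * se,
    std.dev  = sd(x),
    coef.var = sd(x) / mean(x)
  )
}

# Hypothetical measurements with one zero and two missing values.
x <- c(0, 1.2, 2.4, 3.6, NA, 4.8, NA)
print(round(describe(x), 4))
```

The vwfag_D0 row is consistent with these definitions: 1.608 / sqrt(10) is approximately 0.5085, the reported SE.mean, and 1.608 / 2.829 is approximately the reported coef.var.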

Visual Check

Rather than checking every single variable, we will focus mostly on the average values, demographic variables, and clinical outcome indicators. We will describe them in more detail as we go.

Figure 1

We can see that a few variables are close to normally distributed, while a few others show high kurtosis and skewness (we quantify both below).

Skewness Values

| VWFAg avg | VWFAc avg | Ratio avg | A13 average | A13VWFratio average | HNP average | CCI score | dc_mrs | hosp_los | icu_los | pt | inr | ptt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.562 | 0.9384 | -0.0002568 | 0.1052 | 2 | 1.401 | 1.582 | 0.2244 | 1.572 | 0.4971 | 0.244 | 0.255 | 0.6453 |

Kurtosis Values

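| VWFAg avg | VWFAc avg | Ratio avg | A13 average | A13VWFratio average | HNP average | CCI score | dc_mrs | hosp_los | icu_los | pt | inr | ptt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.488 | 1.136 | -1.217 | -0.7049 | 4.195 | 1.897 | 2.212 | -1.417 | 2.22 | -0.8989 | -0.8114 | -0.9175 | 0.9709 |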

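A few variables are clearly right-skewed, notably VWFAg avg (2.562) and the A13VWFratio average (2), with the CCI score, hosp_los, and HNP average also above 1; the remaining variables have skewness below 1 in absolute value and look roughly symmetric. The kurtosis values point to the same departures from a normal distribution: VWFAg avg (6.488) and the A13VWFratio average (4.195) have the heaviest tails.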
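For reference, skewness and kurtosis can be computed directly from the central moments. The sketch below uses the simple moment-based estimators and reports excess kurtosis (0 for a normal distribution), which matches the sign pattern of the values above; the exact estimator used in this report is not stated, so the numbers may differ slightly. The function names and the toy vector are hypothetical.

```r
# Moment-based sample skewness (0 for a symmetric distribution).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis (0 for a normal distribution).
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Hypothetical right-skewed values, such as lengths of stay in days.
los <- c(2, 3, 3, 4, 5, 6, 8, 12, 30)
cat("skewness:", round(skewness(los), 3), "\n")
cat("excess kurtosis:", round(excess_kurtosis(los), 3), "\n")
```

A single long stay is enough to push both statistics well above 0, which is the pattern we would expect for variables such as hosp_los in a sample this small.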

The next report will go through options for setting up a prediction model for the outcomes of interest. The focus is on mortality at discharge and dc_mrs, the modified Rankin Scale score at discharge, which we dichotomized above. We will explore dimension reduction strategies, variable selection, and resampling strategies.

Table 1

Table 1 stratifies the analytical cohort by mortality at discharge. In the table below, 1 indicates subjects who died and 0 indicates subjects who did not.

| Variable | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 23 | 3 | | |
| VWFAg_avg (mean (sd)) | | 4.13 (3.17) | 8.52 (6.45) | 0.056 | |
| VWFAc_avg (mean (sd)) | | 2.90 (1.13) | 4.70 (1.83) | 0.023 | |
| Ratio_avg (mean (sd)) | | 0.81 (0.24) | 0.63 (0.25) | 0.239 | |
| A13_avg (mean (sd)) | | 0.77 (0.32) | 0.75 (0.12) | 0.922 | |
| A13VWFratio (mean (sd)) | | 0.25 (0.17) | 0.11 (0.05) | 0.175 | |
| HNP_avg (mean (sd)) | | 26.69 (23.49) | 20.83 (12.47) | 0.679 | |
| mechanismofinjury (mean (sd)) | | 2.57 (1.16) | 2.33 (1.15) | 0.748 | |
| CCI (mean (sd)) | | 1.22 (1.83) | 1.33 (1.53) | 0.918 | |
| age (mean (sd)) | | 44.04 (22.94) | 39.00 (15.39) | 0.717 | |
| sex_binary (%) | 0 | 3 (13.0) | 1 (33.3) | 0.408 | exact |
| | 1 | 20 (87.0) | 2 (66.7) | | |
| race_binary (%) | 0 | 12 (52.2) | 1 (33.3) | 1.000 | exact |
| | 1 | 11 (47.8) | 2 (66.7) | | |
| surgery (%) | 0 | 13 (56.5) | 1 (33.3) | 0.887 | |
| | 1 | 10 (43.5) | 2 (66.7) | | |
| hosp_los (mean (sd)) | | 16.70 (13.47) | 27.67 (27.21) | 0.248 | |
| gcs_adm (mean (sd)) | | 8.22 (5.47) | 4.33 (2.31) | 0.242 | |
| pt (mean (sd)) | | 14.24 (1.41) | 13.60 (1.47) | 0.469 | |
| inr (mean (sd)) | | 1.19 (0.14) | 1.13 (0.12) | 0.522 | |
| ptt (mean (sd)) | | 30.70 (5.90) | 34.23 (3.61) | 0.326 | |
| dc_mrs_bin (%) | 0 | 13 (56.5) | 0 (0.0) | 0.220 | exact |
| | 1 | 10 (43.5) | 3 (100.0) | | |
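The "mean (sd)" cells in a stratified table like this can be reproduced for any single variable with base R. The following is a minimal sketch on a hypothetical toy cohort; the helper name mean_sd and all values are made up for illustration and do not reflect how the table above was actually generated.

```r
# Hypothetical cohort: one continuous variable and a binary outcome.
toy <- data.frame(
  hosp_los = c(5, 9, 14, 20, 31, 40),
  mortality_atdischarge = factor(c(0, 0, 0, 0, 1, 1))
)

# Format "mean (sd)" for one group, ignoring missing values.
mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Apply the summary within each outcome stratum.
by_group <- tapply(toy$hosp_los, toy$mortality_atdischarge, mean_sd)
print(by_group)
```

With only a handful of subjects in a stratum, as in the mortality = 1 column above, the standard deviation is itself very imprecise, which is worth keeping in mind when comparing groups.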

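At the 0.05 level, only VWFAc_avg differs between the two groups (p = 0.023), with VWFAg_avg borderline (p = 0.056); with just 3 deaths, neither result should be over-interpreted. Continuous variables are compared with a two-sample t-test (the reported p-values are consistent with the pooled, equal-variance form rather than the unequal-variance form), and categorical variables marked "exact" with Fisher’s exact test (the small-sample alternative to the \(\chi^2\) test). Next, let’s look at differences between modified Rankin Scale (mRS) groups. In Table 2 below, 1 indicates a high mRS (i.e. mRS >= 3) and 0 indicates a low mRS.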
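The p-values in Table 1 can be sanity-checked from the printed summary statistics alone. The sketch below recomputes a two-sample t-test from means, standard deviations, and group sizes, in both pooled and Welch form, and runs Fisher’s exact test on the dc_mrs_bin counts. It assumes that all 23 and 3 subjects contributed to VWFAc_avg (no missing values) and works from rounded inputs, so small discrepancies are expected; the function name t_test_from_summary is hypothetical.

```r
# Two-sample t-test p-value from summary statistics.
t_test_from_summary <- function(m1, s1, n1, m2, s2, n2, pooled = TRUE) {
  if (pooled) {
    # Student's t-test: one pooled variance, df = n1 + n2 - 2.
    sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    se <- sqrt(sp2 * (1 / n1 + 1 / n2))
    df <- n1 + n2 - 2
  } else {
    # Welch's t-test: separate variances, Satterthwaite degrees of freedom.
    v1 <- s1^2 / n1
    v2 <- s2^2 / n2
    se <- sqrt(v1 + v2)
    df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  }
  t_stat <- (m1 - m2) / se
  2 * pt(-abs(t_stat), df)
}

# VWFAc_avg from Table 1: 2.90 (1.13) with n = 23 vs. 4.70 (1.83) with n = 3.
p_pooled <- t_test_from_summary(2.90, 1.13, 23, 4.70, 1.83, 3, pooled = TRUE)
p_welch  <- t_test_from_summary(2.90, 1.13, 23, 4.70, 1.83, 3, pooled = FALSE)

# dc_mrs_bin by mortality from Table 1, as a 2 x 2 table of counts.
counts <- matrix(c(13, 10, 0, 3), nrow = 2,
                 dimnames = list(dc_mrs_bin = c("0", "1"), mortality = c("0", "1")))
p_fisher <- fisher.test(counts)$p.value

cat("pooled t-test p:", round(p_pooled, 3), "\n")
cat("Welch t-test p: ", round(p_welch, 3), "\n")
cat("Fisher exact p: ", round(p_fisher, 3), "\n")
```

Under these assumptions the pooled form lands close to the p-value reported for VWFAc_avg, while the Welch form gives a much larger one. With group sizes as unbalanced as 23 vs. 3, the choice between the two forms matters a great deal.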

Table 2

| Variable | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 13 | 13 | | |
| VWFAg_avg (mean (sd)) | | 3.15 (1.42) | 6.13 (4.77) | 0.041 | |
| VWFAc_avg (mean (sd)) | | 2.45 (0.95) | 3.76 (1.34) | 0.009 | |
| Ratio_avg (mean (sd)) | | 0.83 (0.24) | 0.74 (0.25) | 0.341 | |
| A13_avg (mean (sd)) | | 0.79 (0.36) | 0.75 (0.25) | 0.758 | |
| A13VWFratio (mean (sd)) | | 0.30 (0.20) | 0.16 (0.07) | 0.024 | |
| HNP_avg (mean (sd)) | | 26.99 (29.29) | 25.04 (13.59) | 0.829 | |
| mechanismofinjury (mean (sd)) | | 2.69 (0.95) | 2.38 (1.33) | 0.502 | |
| CCI (mean (sd)) | | 1.46 (1.81) | 1.00 (1.78) | 0.518 | |
| age (mean (sd)) | | 46.54 (23.98) | 40.38 (20.35) | 0.487 | |
| sex_binary (%) | 0 | 2 (15.4) | 2 (15.4) | 1.000 | exact |
| | 1 | 11 (84.6) | 11 (84.6) | | |
| race_binary (%) | 0 | 8 (61.5) | 5 (38.5) | 0.434 | exact |
| | 1 | 5 (38.5) | 8 (61.5) | | |
| mortality_atdischarge (%) | 0 | 13 (100.0) | 10 (76.9) | 0.220 | exact |
| | 1 | 0 (0.0) | 3 (23.1) | | |
| surgery (%) | 0 | 12 (92.3) | 2 (15.4) | <0.001 | |
| | 1 | 1 (7.7) | 11 (84.6) | | |
| hosp_los (mean (sd)) | | 9.00 (6.87) | 26.92 (16.17) | 0.001 | |
| gcs_adm (mean (sd)) | | 11.23 (5.12) | 4.31 (2.63) | <0.001 | |
| pt (mean (sd)) | | 14.23 (1.44) | 14.10 (1.41) | 0.817 | |
| inr (mean (sd)) | | 1.18 (0.15) | 1.18 (0.12) | 0.886 | |
| ptt (mean (sd)) | | 30.41 (6.54) | 31.81 (4.99) | 0.545 | |

With the modified Rankin Scale as the outcome, we see clear differences between those with lower and higher scores (0-2 vs. 3-6): VWFAg_avg (p = 0.041), VWFAc_avg (p = 0.009), A13VWFratio (p = 0.024), surgery (p < 0.001), hosp_los (p = 0.001), and gcs_adm (p < 0.001) all differ at the 0.05 level. It would be prudent to look at results for this outcome carefully. Let’s also get some univariate results for differences between those who get neurosurgery and those who do not (in Table 3, surgery = 1 and no surgery = 0).

Table 3

| Variable | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 13 | 13 | | |
| VWFAg_avg (mean (sd)) | | 3.15 (1.42) | 6.13 (4.77) | 0.041 | |
| VWFAc_avg (mean (sd)) | | 2.45 (0.95) | 3.76 (1.34) | 0.009 | |
| Ratio_avg (mean (sd)) | | 0.83 (0.24) | 0.74 (0.25) | 0.341 | |
| A13_avg (mean (sd)) | | 0.79 (0.36) | 0.75 (0.25) | 0.758 | |
| A13VWFratio (mean (sd)) | | 0.30 (0.20) | 0.16 (0.07) | 0.024 | |
| HNP_avg (mean (sd)) | | 26.99 (29.29) | 25.04 (13.59) | 0.829 | |
| mechanismofinjury (mean (sd)) | | 2.69 (0.95) | 2.38 (1.33) | 0.502 | |
| CCI (mean (sd)) | | 1.46 (1.81) | 1.00 (1.78) | 0.518 | |
| age (mean (sd)) | | 46.54 (23.98) | 40.38 (20.35) | 0.487 | |
| sex_binary (%) | 0 | 2 (15.4) | 2 (15.4) | 1.000 | exact |
| | 1 | 11 (84.6) | 11 (84.6) | | |
| race_binary (%) | 0 | 8 (61.5) | 5 (38.5) | 0.434 | exact |
| | 1 | 5 (38.5) | 8 (61.5) | | |
| mortality_atdischarge (%) | 0 | 13 (100.0) | 10 (76.9) | 0.220 | exact |
| | 1 | 0 (0.0) | 3 (23.1) | | |
| surgery (%) | 0 | 12 (92.3) | 2 (15.4) | <0.001 | |
| | 1 | 1 (7.7) | 11 (84.6) | | |
| hosp_los (mean (sd)) | | 9.00 (6.87) | 26.92 (16.17) | 0.001 | |
| gcs_adm (mean (sd)) | | 11.23 (5.12) | 4.31 (2.63) | <0.001 | |
| pt (mean (sd)) | | 14.23 (1.44) | 14.10 (1.41) | 0.817 | |
| inr (mean (sd)) | | 1.18 (0.15) | 1.18 (0.12) | 0.886 | |
| ptt (mean (sd)) | | 30.41 (6.54) | 31.81 (4.99) | 0.545 | |

In addition to the VWF and A13 clinical variables, hospital and ICU length of stay seem to be univariately associated with the surgery outcome. We can further investigate how well subjects can be classified with respect to each of the 3 outcomes.

The next step is covered in the Statistical Analysis report.