Choosing Your Statistical Weapon: Real-World Examples for Doctoral Analysis

Selecting the right statistical tool can feel overwhelming. Let’s ground the process with concrete examples drawn from secondary datasets commonly used in PhD research. The crux remains the same: your question dictates your tool, and your data’s nature validates that choice.

Example 1: Financial Data of Listed Companies (Last 20 Years)

Research Question: “Do firms with higher ESG (Environmental, Social, Governance) scores exhibit lower volatility in their stock returns during market downturns compared to firms with lower ESG scores?”

  • Step 1 – Map the Question: You are comparing groups (High-ESG vs. Low-ESG firms) on an outcome (stock return volatility, likely measured by standard deviation). You’re also considering a condition (during market downturns). This hints at a moderating or conditioning variable.
  • Step 2 – Audit the Data: Your dependent variable (volatility) is continuous. Your key independent variable (ESG Group) is categorical (you might create a median split or use terciles). You have a massive panel dataset (multiple companies over 20 years—repeated measures).
  • Step 3 – Match & Check Assumptions:
    • A simple independent t-test comparing the average volatility of the two groups across all years would be wrong—it ignores the time series and repeated company data, violating the independence assumption.
    • The correct tool is likely a Panel Data Regression (Fixed/Random Effects Model). This controls for unobserved, time-invariant company characteristics (e.g., inherent industry risk). You could model volatility as a function of ESG group, market condition (downturn=1, normal=0), and their interaction term; a significant interaction would answer your question (see the sketch after this list).
    • Assumptions to check: Serial correlation (Durbin-Watson test), heteroskedasticity, and stationarity of your volatility measure.
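To make Step 3 concrete, here is a minimal Python sketch of the fixed-effects interaction model using the third-party linearmodels package. The input file and column names (firm_id, year, volatility, esg_high, downturn) are hypothetical placeholders for your own panel, not a definitive specification.

```python
# A minimal sketch, assuming a long-format panel with hypothetical columns:
# firm_id, year, volatility, esg_high (1 = above-median ESG), downturn (1 = downturn year).
# Requires the third-party linearmodels package (pip install linearmodels).
import pandas as pd
from linearmodels.panel import PanelOLS

panel = pd.read_csv("firm_panel.csv").set_index(["firm_id", "year"])  # (entity, time) index

# Fixed-effects model: EntityEffects absorbs time-invariant firm characteristics;
# the esg_high:downturn interaction is the term that answers the research question.
# drop_absorbed=True drops the ESG main effect if it is constant within firms.
model = PanelOLS.from_formula(
    "volatility ~ 1 + esg_high + downturn + esg_high:downturn + EntityEffects",
    data=panel,
    drop_absorbed=True,
)
# Clustering standard errors by firm guards against within-firm serial correlation
result = model.fit(cov_type="clustered", cluster_entity=True)
print(result.summary)
```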

Example 2: Gold or Oil Prices (20-Year Daily/Weekly Time Series)

Research Question: “What is the long-term trend and seasonality component in monthly gold prices, and can we reliably forecast short-term future prices?”

  • Step 1 – Map the Question: This is a classic time series analysis question involving decomposition (trend, seasonality) and prediction.
  • Step 2 – Audit the Data: You have a univariate time series—a single variable (price) measured at regular intervals over time (daily or weekly observations aggregated to monthly for this question). The data points are not independent; today’s price is heavily influenced by yesterday’s.
  • Step 3 – Match & Check Assumptions:
    • Descriptive/Exploratory: Begin with visualizations (line plot) and time series decomposition (using additive or multiplicative models) to isolate trend, seasonal, and irregular components.
    • Forecasting: Standard regression fails due to autocorrelation. The go-to tools are ARIMA (AutoRegressive Integrated Moving Average) or Exponential Smoothing (ETS) models.
    • Crucial Pre-Step: You must first test for stationarity using the Augmented Dickey-Fuller (ADF) test. Non-stationary data (with a strong trend) must be differenced. For gold, you might also model returns (percentage change) instead of raw prices to achieve stationarity. You’d then fit ARIMA models, diagnose residuals, and compare forecast accuracy (e.g., using Mean Absolute Percentage Error). A minimal sketch of this workflow follows below.
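The sketch below walks through the decompose, test, and fit steps with statsmodels. The file name, the column names, and the ARIMA order (1, d, 1) are illustrative assumptions, not recommendations; in practice you would compare several candidate orders.

```python
# A minimal sketch of the decompose -> test -> fit workflow, assuming a CSV of
# monthly gold prices with hypothetical columns `date` and `price`.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

prices = pd.read_csv("gold_monthly.csv", parse_dates=["date"], index_col="date")["price"]

# Decompose into trend, seasonal, and irregular components; multiplicative,
# because price swings tend to scale with the price level
seasonal_decompose(prices, model="multiplicative", period=12).plot()

# ADF test: a large p-value suggests non-stationarity, so difference once
_, p_value, *_ = adfuller(prices)
d = 1 if p_value > 0.05 else 0

# Fit a candidate ARIMA, inspect residual diagnostics, then forecast 12 months
fit = ARIMA(prices, order=(1, d, 1)).fit()   # AR/MA orders chosen for illustration only
print(fit.summary())
forecast = fit.forecast(steps=12)
```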

Example 3: USD/INR Exchange Rate Fluctuations

Research Question 1: “Is there a causal relationship between changes in the US Federal Reserve interest rate and the volatility of the USD/INR exchange rate?”

  • Step 1 – Map the Question: You are examining a dynamic relationship between two time series and specifically asking about causality and volatility.
  • Step 2 – Audit the Data: Two continuous time series: Fed rate (likely monthly) and a measure of INR volatility (e.g., the standard deviation of daily returns within that month).
  • Step 3 – Match & Check Assumptions:
    • Granger Causality Test: Specifically designed to test if past values of one time series (Fed rate) help predict another (INR volatility). A significant result suggests “Granger-causality,” not true causality, but predictive power in a time-series context.
    • Pre-requisite: Both series must be stationary. You would use the ADF test and difference the series if needed.
    • Advanced Alternative: To model volatility directly, you could use a GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model. This is the standard tool in econometrics for modeling volatility clustering—the tendency of high-volatility periods to be followed by further high volatility. Both tools are sketched below.
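Here is a minimal sketch of both tools: the Granger test from statsmodels and a GARCH(1,1) fit from the third-party arch package. The file and column names (inr_vol, fed_rate, usdinr) are hypothetical stand-ins for your own series.

```python
# A minimal sketch, assuming hypothetical monthly columns `inr_vol` and `fed_rate`
# and a daily `usdinr` rate series. Requires the arch package (pip install arch).
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
from arch import arch_model

# Granger causality: does the Fed rate help predict INR volatility?
# Convention: the test asks whether lags of the SECOND column predict the FIRST.
monthly = pd.read_csv("monthly_series.csv", parse_dates=["date"], index_col="date")
stationary = monthly[["inr_vol", "fed_rate"]].diff().dropna()  # difference if ADF flags a unit root
grangercausalitytests(stationary, maxlag=6)

# GARCH(1,1) on daily percentage returns to model volatility clustering
daily = pd.read_csv("usdinr_daily.csv", parse_dates=["date"], index_col="date")
returns = 100 * daily["usdinr"].pct_change().dropna()
garch_result = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
print(garch_result.summary())
```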

Research Question 2: “Does the USD/INR return series follow a normal distribution?”

  • This is a foundational question about the distribution of your data.
  • Tool: Here, you are not modeling relationships; you are testing a property. Use descriptive statistics (skewness, kurtosis), a Q-Q plot, and a formal test like the Kolmogorov-Smirnov or Jarque-Bera test (sketched below). The finding (it almost certainly will not be normal) will then inform your choice of other tools, pushing you towards non-parametric methods or distributions used in financial mathematics (like the Student’s t-distribution).
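A minimal sketch of the full normality check, using scipy for the statistics and the Q-Q plot. The file and the daily rate column `usdinr` are hypothetical assumptions.

```python
# A minimal normality check on a return series, assuming a hypothetical
# daily USD/INR rate column `usdinr`.
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

rates = pd.read_csv("usdinr_daily.csv", parse_dates=["date"], index_col="date")
returns = rates["usdinr"].pct_change().dropna()

# Shape statistics: financial returns typically show excess kurtosis (fat tails)
print("skewness:", stats.skew(returns))
print("excess kurtosis:", stats.kurtosis(returns))  # 0 for a normal distribution

# Jarque-Bera compares sample skewness/kurtosis with their normal-distribution values
jb_stat, jb_p = stats.jarque_bera(returns)
print(f"Jarque-Bera p-value: {jb_p:.4g}")           # tiny p-value -> reject normality

# Q-Q plot: fat tails bow away from the 45-degree reference line
stats.probplot(returns, dist="norm", plot=plt)
plt.show()
```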
