(Dis)Trust and the Rise of the Independents?

I wanted to investigate whether there was a relationship between trust and identifying as an Independent over time, using the 1972-2012 General Social Survey data. My dependent variable, independent, comes from the GSS variable partyid, which corresponds to the question “Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?” Respondents can choose from 0 to 7, with number 3 being “Independent.” My variable independent is a binary variable, where 1 indicates that the respondent said “3” and 0 indicates that the respondent chose another option. My primary independent variable is trust, which comes from the GSS question “Generally speaking, would you say that most people can be trusted or that you can’t be too careful in life?” Respondents can choose from 1 to 3: can trust, cannot trust, and depends. For the purposes of this project, I recoded trust into a binary variable, where 1 indicates that the respondent answered “can trust” and 0 indicates that the respondent answered with another option. For my control variables, I included gender and age (measured in years). For ease of interpretation, I recoded gender into a binary variable where 0 represents males and 1 represents females.

Because the GSS was not administered every year between 1972 and 2012, I first calculated the mean value for all my variables every year, and then interpolated these values into the missing years to create a time series data frame by.year.ts. So that the means of my two primary variables were not fractions, I multiplied them by 100 to create percentages, as seen in the variables ind_pct and trust_pct.
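The interpolation step can be sketched roughly as follows. This is a minimal illustration rather than the project code: it assumes linear interpolation between survey years (the method is not stated above), and the function name `interpolate_years` and the sample values are hypothetical.

```python
def interpolate_years(yearly_means):
    """Linearly fill in years missing between observed survey years.

    yearly_means: dict mapping survey year -> mean value for that year.
    Returns a dict with one entry for every year from the first to the last.
    """
    years = sorted(yearly_means)
    filled = {}
    for start, end in zip(years, years[1:]):
        y0, y1 = yearly_means[start], yearly_means[end]
        for year in range(start, end):
            # Weight moves from 0 at `start` toward 1 at `end`.
            weight = (year - start) / (end - start)
            filled[year] = y0 + weight * (y1 - y0)
    # The last observed year is not covered by the loop above.
    filled[years[-1]] = yearly_means[years[-1]]
    return filled


# Hypothetical proportions of Independents (not real GSS values).
observed = {1972: 0.10, 1973: 0.11, 1975: 0.15, 1978: 0.12}
series = interpolate_years(observed)

# Convert proportions to percentages, as done for ind_pct and trust_pct.
ind_pct = {year: 100 * value for year, value in series.items()}
```

Interpolated years lie on straight lines between survey years, which is one reason smoother stretches can appear in plots of such a series.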

Plots of the means of my variables ind_pct, trust_pct, age, and female against time are shown below. As can be seen, the percentage of Independents moves a bit over time, but has a general upward trend, whereas the percentage of trusting individuals moves a lot but has a distinct downward trend over time. Average age also increases slightly over time. The proportion of female respondents was 50% in the first year of the survey, but for every year after that was usually somewhere between 55% and 58% of the survey. Note that the smoother portions in all these plots might be years that were interpolated.

g_ind.png, trust_pct.png, age.png, female.png

Based on the plots, it seems reasonable to expect that time and the number of Independents have a positive relationship. I also expect that the percentage of trusting people and the percentage of Independents will have an inverse relationship. My reasoning behind this hypothesis is that as people become less trusting in general, they also become less trusting of organized political party platforms, and distance themselves from organized politics by considering themselves to be Independents. I included age and gender as control variables, and included plots of these variables across time as a reference for analysis.

I ran a simple time series regression of the percentage of people who identify as Independent on the percentage of trusting people; the results are shown in Table 1. They show that each additional percentage point of trusting people is associated with a 0.501 percentage point decrease in people who identify as Independent, on average. This coefficient is highly significant (p-value < 0.0001). These results are consistent with the hypothesis I outlined previously, but might be misleading given how highly correlated trust and political independence are with time, which is an omitted variable in this model. In addition, the adjusted R-squared is only 0.39, which suggests that the model is probably missing other relevant variables as well.

Table 1: ind_pct on trust_pct

Screen Shot 2016-06-20 at 3.47.28 PM.png
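For readers who want to see what a regression like the one in Table 1 computes, here is a minimal sketch of a one-predictor ordinary least squares fit. The function name `simple_ols` and the data are hypothetical; the values are not the GSS means and will not reproduce the coefficients reported above.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes for the one estimated slope.
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "residuals": residuals,
    }


# Hypothetical yearly percentages (illustration only).
trust_pct = [46.0, 44.0, 41.0, 39.0, 36.0, 35.0]
ind_pct = [11.0, 12.5, 12.0, 14.0, 15.5, 15.0]
fit = simple_ols(trust_pct, ind_pct)
print(fit["slope"], fit["adj_r_squared"])
```

The slope is read the same way as in the text: the change in the percentage of Independents associated with a one percentage point change in trust.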

Just to be sure that I did not need to correct for heteroskedasticity in my model, I ran a Breusch-Pagan test (not shown). It indicates that I fail to reject the null hypothesis of homoskedasticity (p-value is 0.596), so there is no need to correct for heteroskedasticity.
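The idea behind the Breusch-Pagan test can be sketched as follows. This is a simplified illustration, assuming a single regressor and the studentized (Koenker) form of the statistic, where the test statistic is n times the R-squared from regressing the squared residuals on the regressor. The function names and data are hypothetical, and the exact variant used in the project is not stated above.

```python
import math


def r_squared(x, y):
    """R-squared from regressing y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if syy == 0:
        # Constant y: nothing to explain, so treat R-squared as zero.
        return 0.0
    return sxy ** 2 / (sxx * syy)


def breusch_pagan(x, residuals):
    """Return (LM statistic, p-value) for one regressor.

    With one regressor the statistic is compared to a chi-square
    distribution with 1 degree of freedom, whose upper tail equals
    erfc(sqrt(LM / 2)).
    """
    squared = [e ** 2 for e in residuals]
    lm = len(x) * r_squared(x, squared)
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Hypothetical regressor and residuals whose spread grows with x.
x = [1, 2, 3, 4, 5, 6]
residuals = [1, -2, 3, -4, 5, -6]
lm, p_value = breusch_pagan(x, residuals)
print(lm, p_value)
```

A large p-value, as reported above, means the squared residuals are not well explained by the regressor, which is what homoskedasticity implies.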

I modified the model by adding a time trend; the regression results are shown in Table 2. They indicate that, on average, each additional percentage point of trusting people is associated with a 0.022 percentage point decrease in the percent of Independents, net of `year`, though this coefficient is no longer statistically significant. The coefficient for `year` indicates that for every year that passes, the percent of Independents increases by 0.204 percentage points, net of trust, on average. This is moderately significant (p-value < 0.01).

Table 2: ind_pct on trust_pct and year

Screen Shot 2016-06-20 at 3.50.39 PM.png

I then investigated whether I had multicollinearity in my model using the variance inflation factor (VIF) (not shown). It indicates that trust_pct and year have a VIF of approximately 4, which is higher than desirable (above 3) but not alarmingly high (above 10). There is potentially a multicollinearity problem, but it is not large enough at this point to warrant corrective measures (such as de-trending the data before regression).
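With only two predictors, the VIF has a simple closed form: 1 / (1 - r²), where r is the correlation between the two predictors. The sketch below illustrates that formula; the function names and data are hypothetical and are not taken from the project.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical data: trust falls almost linearly with time,
# so the two predictors carry much of the same information.
year = [0, 1, 2, 3, 4]
trust_pct = [50.0, 48.0, 47.0, 44.0, 43.0]
print(vif_two_predictors(year, trust_pct))
```

Under this formula, a VIF of about 4 corresponds to a correlation of roughly 0.87 in absolute value between the two predictors.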

To examine autocorrelation, I then looked at the plot of the estimated autocorrelation function (ACF), shown below. It indicates that I have an AR(1) and AR(2) effect, as well as potentially an AR(3) effect.

acf.png
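The quantity plotted in an ACF chart can be computed directly. The following is a minimal sketch of the sample autocorrelation function together with the usual approximate 95% band of ±1.96/√n; the function name `acf` and the example series are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


# Hypothetical residual series (illustration only).
residuals = [0.8, 0.6, 0.5, 0.1, -0.2, -0.6, -0.7, -0.4, 0.0, 0.3]
correlations = acf(residuals, 3)
band = 1.96 / math.sqrt(len(residuals))

for lag, r in enumerate(correlations):
    flag = "outside band" if lag > 0 and abs(r) > band else ""
    print(lag, round(r, 3), flag)
```

Lags whose autocorrelation falls outside the band are the ones that would stand out in the plot.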

The estimated partial autocorrelation function (PACF) is shown below. It supports the concern about the first lag, though not the second or third lag.

pacf.png

A plot of the residuals from the model is shown below. The residuals seem to be less random in the later period of the data, which is to be expected from the interpolation, but may also be the cause of the autocorrelation.

residuals.png

To confirm the findings from the plots, I also ran an augmented Durbin-Watson test for autocorrelation (not shown). For the first lag, the p-value is very small (p-value < 0.0001), so I reject the null hypothesis of no autocorrelation in favor of the alternative hypothesis of some autocorrelation. The p-values for the second and third lags are only mildly significant (p-value < 0.05 and p-value < 0.1, respectively), which may or may not be sufficient for rejecting the null hypothesis. It seems that at the very least my model suffers from an AR(1) effect.
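The statistic behind a Durbin-Watson test is simple to compute. The sketch below generalizes it to higher lags by comparing each residual with the one `lag` periods earlier, which is an assumption about how the multi-lag version used above works. It computes only the statistic, not the p-values, and the function name and data are hypothetical.

```python
def durbin_watson(residuals, lag=1):
    """Durbin-Watson statistic at a given lag.

    Values near 2 suggest no autocorrelation at that lag, values
    near 0 suggest positive autocorrelation, and values near 4
    suggest negative autocorrelation.
    """
    numerator = sum(
        (residuals[t] - residuals[t - lag]) ** 2
        for t in range(lag, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical, slowly drifting residuals (positive autocorrelation).
residuals = [0.8, 0.6, 0.5, 0.1, -0.2, -0.6, -0.7, -0.4, 0.0, 0.3]
for lag in (1, 2, 3):
    print(lag, round(durbin_watson(residuals, lag), 3))
```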

I updated the model again to include gender and age. The regression results are shown in Table 3. They indicate that, on average, every percentage point increase in the percent of trusting people is associated with a 0.013 percentage point decrease in the percent of Independents, net of time, gender, and age; on average, every passing year is associated with a 0.252 percentage point increase in the percent of Independents, net of trust, gender, and age; on average, being female is associated with a 14.49 percentage point decrease in the percent of Independents, relative to men, net of trust, year, and age; and on average, every additional year of age is associated with a 0.562 percentage point decrease in the percent of Independents, net of trust, time, and gender. Only year is moderately significant (p-value < 0.001); the other variables are not statistically significant. These results align with my hypotheses: a negative relationship between trust and the number of Independents (albeit not significant, which is disappointing); a positive relationship between time and the number of Independents; and a negative relationship between being female and independence. The adjusted R-squared is not greatly improved from my previous model (0.49), indicating there is still, most likely, omitted variable bias.

Table 3: ind_pct on trust_pct, year, female, and age

Screen Shot 2016-06-20 at 3.58.32 PM.png

To check for multicollinearity, I examined the VIF for this model (not shown). The VIF for female is very low and poses no cause for concern. The VIF for trust_pct, age, and year are higher than desirable (above 3) but not so high as to cause alarm (above 10), although year is close to the threshold. For now, I will proceed assuming I do not have severe multicollinearity.

To check for autocorrelation, I ran an augmented Durbin-Watson test (not shown). The results indicate that there is autocorrelation in the first lag (p-value < 0.0001); there is potentially autocorrelation in the second lag (p-value < 0.05); but there is no autocorrelation in the third lag.

In an effort to address the autocorrelation, I then tried a first-differenced model. Since I did not expect gender or age to change significantly over time, I only included the percent of Independents, the percent of trusting people, and a time trend. This first-differenced model seems like a poor idea, since I now have a negative adjusted R-squared, and none of the variables are significant. Upon examining the ACF and PACF plots (not shown), it seems that I have eliminated any AR(1) effects, but the model may still have AR(2) effects, which would require another differencing to eliminate. This is confirmed with a new augmented Durbin-Watson test. Overall, this model does not seem to be an improvement on my previous ones, so I abandoned it.
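First differencing replaces each value with its change from the previous period. A minimal sketch, with a hypothetical function name and made-up values:

```python
def first_difference(series):
    """Return the period-to-period changes of a series.

    The result is one element shorter than the input, because the
    first observation has no predecessor.
    """
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical yearly percentages (illustration only).
ind_pct = [10.0, 12.0, 15.0, 14.0]
d_ind_pct = first_difference(ind_pct)

# Differencing a second time is just the same operation applied again.
d2_ind_pct = first_difference(d_ind_pct)
print(d_ind_pct, d2_ind_pct)
```

Each round of differencing costs one observation, which matters in a series with only about forty yearly points.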

In addition to concerns about autocorrelation, I was also concerned about unit roots. To check for unit roots in my dependent variable, I ran an augmented Dickey-Fuller test (not shown). The p-value is fairly large, so I fail to reject the null hypothesis that there is a unit root for ind_pct. This is confirmed by the Phillips-Perron test. In other words, it seems that there is a unit root problem for the variable ind_pct. I ran the same tests on my primary independent variable, trust_pct. Both tests return a sufficiently small p-value to reject the null hypothesis (p-value < 0.01). There seems to only be a unit root problem with my dependent variable. The usual remedy, differencing, did not produce a useful model above, so the unit root is something to keep in mind while continuing my analysis.
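The logic of a unit root test can be illustrated with the plain (non-augmented) Dickey-Fuller regression, which is a simpler technique than the augmented test used above: regress the change in the series on a constant and the lagged level, and look at the t-statistic on the lagged level. The function name and data below are hypothetical. The statistic must be compared against Dickey-Fuller critical values (roughly -2.9 at the 5% level for samples of this size, an approximate textbook figure) rather than the usual t-table.

```python
import math


def dickey_fuller_stat(series):
    """t-statistic on the lagged level in a regression of the
    first difference on a constant and the lagged level."""
    lagged = series[:-1]
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_x
    ss_res = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    return gamma / std_err


# Hypothetical series: one that keeps drifting upward, and one that
# snaps back toward its mean every period.
drifting = [1, 2, 3, 4, 5, 6.1, 7, 8.2]
reverting = [1, -1, 1.2, -0.8, 1, -1.2, 0.8, -1]
print(dickey_fuller_stat(drifting), dickey_fuller_stat(reverting))
```

A statistic that is not sufficiently negative, as for the drifting series, means the unit root null cannot be rejected, which mirrors the result for ind_pct described above.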

Returning to the question about lags (and the appropriate number of them), I ran an automatic ARIMA selection procedure on the residuals from my model that regressed ind_pct on trust_pct and year. It suggests a (1, 0, 0) zero-mean model. In other words, I should allow for one autoregressive lag, which is consistent with my findings, with no differencing and no moving average term.
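A zero-mean (1, 0, 0) model says each residual is a fraction phi of the previous residual plus fresh noise. The sketch below estimates phi by least squares, which is a simpler approach than the likelihood-based fitting that automatic ARIMA routines typically use, so the numbers would differ somewhat. The function name and data are hypothetical.

```python
def fit_ar1_zero_mean(residuals):
    """Least-squares fit of e_t = phi * e_{t-1} + noise (no intercept).

    Returns (phi, innovations), where innovations are what is left
    of each residual after removing the part predicted by its lag.
    """
    pairs = list(zip(residuals, residuals[1:]))
    numerator = sum(curr * prev for prev, curr in pairs)
    denominator = sum(prev ** 2 for prev, _ in pairs)
    phi = numerator / denominator
    innovations = [curr - phi * prev for prev, curr in pairs]
    return phi, innovations


# Hypothetical regression residuals (illustration only).
residuals = [0.8, 0.6, 0.5, 0.1, -0.2, -0.6, -0.7, -0.4, 0.0, 0.3]
phi, innovations = fit_ar1_zero_mean(residuals)
print(round(phi, 3))
```

If the AR(1) structure is adequate, the innovations should look like white noise, which is what a final serial correlation check examines.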

 

The final ARIMA model as suggested above is shown in Table 4. It indicates that, on average, each additional percentage point of trusting people is associated with a 0.026 percentage point decrease in the percentage of Independents, net of year, though this is not significant. It also indicates that, on average, every year that passes is associated with a 0.207 percentage point increase in the percentage of people who identify as Independents, net of trust. This result is highly significant.

Table 4: ARIMA model

Screen Shot 2016-06-20 at 4.13.24 PM.png

For one final test, I did a white noise test to see if there was any remaining evidence of serial correlation (not shown). Given the high p-value, I fail to reject the null hypothesis that I have no serial correlation. The ARIMA model improved on my previous models.
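One common white noise test is the Ljung-Box test; whether it is the one used above is an assumption. The sketch below computes the statistic for a single lag and compares it to a chi-square distribution with 1 degree of freedom. When the test is applied to residuals from a fitted ARMA model, the degrees of freedom are usually reduced by the number of fitted parameters, which this simplified version does not do. The function name and data are hypothetical.

```python
import math


def ljung_box_lag1(series):
    """Ljung-Box Q statistic and p-value using only the first lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    r1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / denom
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


# Hypothetical series that flips sign every period, so it is
# strongly (negatively) autocorrelated and clearly not white noise.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
q, p_value = ljung_box_lag1(alternating)
print(q, p_value)
```

A high p-value from such a test, as reported above, is consistent with no remaining serial correlation; the alternating example shows the opposite case.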

In summary, while there is clearly a rise of Independents between 1972 and 2012, this rise does not seem to be driven by the concurrent rise in distrust in the United States, as supported by the best fitting model for the data (an ARIMA(1, 0, 0) model).

For the code and in-depth analysis, see this project in my GitHub portfolio.
