1. Set-up

Load necessary packages.
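The models below are fitted with lme4/lmerTest (LMMs) and mgcv (GAMMs), with itsadug providing model-comparison utilities; a minimal set-up, assuming exactly these packages:

library(lme4)      # linear mixed models
library(lmerTest)  # Satterthwaite-based t-tests for lmer models
library(mgcv)      # GAMMs: bam(), gam(), splines
library(itsadug)   # compareML() and GAMM visualization helpers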

Throughout this tutorial, familiarity with the LMM and basic knowledge of R are assumed (see Pinheiro & Bates, 2000, and Baayen, Davidson, & Bates, 2008, for introductions to the LMM with R).

In what follows, we make use of treatment coding for factors, as this makes interactions of factors and numerical predictors in GAMMs much more straightforward to interpret.
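Treatment coding is R's default for unordered factors; a one-line sketch making this explicit (the relevel call, which makes NonIdentical the reference level as in the summaries below, is our assumption about how the data were prepared):

options(contrasts = c("contr.treatment", "contr.poly"))
dijkstra$Ident <- relevel(dijkstra$Ident, ref = "NonIdentical")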

2. Cross-language orthographic similarity effect

We first demonstrate how the GAMM can be applied in bilingual processing research, reanalyzing lexical decision data from Dijkstra et al. (2010). The study tested 21 Dutch-English bilinguals reading 194 English words and 194 nonwords in a lexical decision experiment. We reanalyzed data for 189 target words for which subtitle frequency data were available. The authors reported substantial facilitation for identical cognates on top of a facilitatory effect of standardized orthographic similarity (hereafter OS).

We start with the LMM that many psycholinguists are familiar with. In the following model, we replicate Dijkstra et al.'s (2010) result with rated orthographic similarity between the target English words and their Dutch translation equivalents (OS) and a factor (Ident) with the two levels Identical and NonIdentical. We also include subtitle frequency as a covariate (Frequency).
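A call along the following lines produces the summary below (the data frame is named dijkstra, as in the output; the object name is our choice):

dijkstra.lmer <- lmer(invRT ~ OS + Ident + Frequency +
                        (1 | Participant) + (0 + Trial | Participant) +
                        (1 | Word),
                      data = dijkstra)
summary(dijkstra.lmer)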

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## invRT ~ OS + Ident + Frequency + (1 | Participant) + (0 + Trial |  
##     Participant) + (1 | Word)
##    Data: dijkstra
## 
## REML criterion at convergence: 2140.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8078 -0.6472 -0.0731  0.5623  4.2783 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  Word          (Intercept) 0.010017 0.10009 
##  Participant   Trial       0.001581 0.03976 
##  Participant.1 (Intercept) 0.064322 0.25362 
##  Residual                  0.097379 0.31206 
## Number of obs: 3535, groups:  Word, 178; Participant, 21
## 
## Fixed effects:
##                  Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)     -1.858546   0.056281  20.985138 -33.023  < 2e-16 ***
## OS              -0.023302   0.011993 169.302742  -1.943  0.05369 .  
## IdentIdentical  -0.119715   0.037983 166.176468  -3.152  0.00193 ** 
## Frequency       -0.076796   0.009192 168.956303  -8.355 2.29e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Identical cognates revealed substantial facilitation on top of a facilitatory effect of OS (indicated by the red circle).

## effect size (range) for  OS is  0.06383857

Within the context of the LMM, the effect of OS can be modeled as nonlinear by regressing the response on powers of OS (OS squared, OS cubed, etc.). Here, we fit a model with a second-degree polynomial.
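A sketch of the corresponding call (the object name dijkstra.lmer.poly2 matches the model comparison output further below):

dijkstra.lmer.poly2 <- lmer(invRT ~ poly(OS, 2, raw = TRUE) + Frequency +
                              (1 | Participant) + (0 + Trial | Participant) +
                              (1 | Word),
                            data = dijkstra)
summary(dijkstra.lmer.poly2)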

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: invRT ~ poly(OS, 2, raw = T) + Frequency + (1 | Participant) +  
##     (0 + Trial | Participant) + (1 | Word)
##    Data: dijkstra
## 
## REML criterion at convergence: 2147.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8062 -0.6419 -0.0721  0.5649  4.2484 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  Word          (Intercept) 0.010424 0.10210 
##  Participant   Trial       0.001591 0.03988 
##  Participant.1 (Intercept) 0.064314 0.25360 
##  Residual                  0.097373 0.31205 
## Number of obs: 3535, groups:  Word, 178; Participant, 21
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)            -1.841059   0.057809  23.346336 -31.847  < 2e-16
## poly(OS, 2, raw = T)1  -0.030847   0.011880 168.886107  -2.597   0.0102
## poly(OS, 2, raw = T)2  -0.030733   0.013471 166.399554  -2.281   0.0238
## Frequency              -0.078955   0.009305 169.310729  -8.485 1.04e-14
##                          
## (Intercept)           ***
## poly(OS, 2, raw = T)1 *  
## poly(OS, 2, raw = T)2 *  
## Frequency             ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## effect size (range) for  poly(OS, 2, raw = T) is  0.1630846

The polynomial regression line captured the overall negatively accelerating trend. However, the model lacks precision when pitted against the model combining a strictly linear effect of OS with the factor Ident: the substantial facilitation for identical cognates is no longer visible as such. Moreover, when the factor Ident is included in the model, the quadratic term of the polynomial is no longer supported.
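The model with Ident added, and the likelihood ratio test comparing the two (a sketch; anova() refits both models with ML, as noted in the output):

dijkstra.lmer.poly3 <- lmer(invRT ~ poly(OS, 2, raw = TRUE) + Frequency + Ident +
                              (1 | Participant) + (0 + Trial | Participant) +
                              (1 | Word),
                            data = dijkstra)
summary(dijkstra.lmer.poly3)
anova(dijkstra.lmer.poly2, dijkstra.lmer.poly3)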

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## invRT ~ poly(OS, 2, raw = T) + Frequency + Ident + (1 | Participant) +  
##     (0 + Trial | Participant) + (1 | Word)
##    Data: dijkstra
## 
## REML criterion at convergence: 2146.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8111 -0.6460 -0.0726  0.5606  4.2785 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  Word          (Intercept) 0.010078 0.10039 
##  Participant   Trial       0.001582 0.03977 
##  Participant.1 (Intercept) 0.064323 0.25362 
##  Residual                  0.097379 0.31206 
## Number of obs: 3535, groups:  Word, 178; Participant, 21
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)            -1.868521   0.059078  25.424613 -31.628  < 2e-16
## poly(OS, 2, raw = T)1  -0.024195   0.012123 168.028962  -1.996   0.0476
## poly(OS, 2, raw = T)2   0.013339   0.023989 167.906048   0.556   0.5789
## Frequency              -0.076142   0.009286 168.045096  -8.200 5.93e-14
## IdentIdentical         -0.151425   0.068558 167.996356  -2.209   0.0285
##                          
## (Intercept)           ***
## poly(OS, 2, raw = T)1 *  
## poly(OS, 2, raw = T)2    
## Frequency             ***
## IdentIdentical        *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## refitting model(s) with ML (instead of REML)
## Data: dijkstra
## Models:
## dijkstra.lmer.poly2: invRT ~ poly(OS, 2, raw = T) + Frequency + (1 | Participant) + 
## dijkstra.lmer.poly2:     (0 + Trial | Participant) + (1 | Word)
## dijkstra.lmer.poly3: invRT ~ poly(OS, 2, raw = T) + Frequency + Ident + (1 | Participant) + 
## dijkstra.lmer.poly3:     (0 + Trial | Participant) + (1 | Word)
##                     Df    AIC    BIC  logLik deviance  Chisq Chi Df
## dijkstra.lmer.poly2  8 2137.8 2187.1 -1060.9   2121.8              
## dijkstra.lmer.poly3  9 2134.8 2190.4 -1058.4   2116.8 4.9242      1
##                     Pr(>Chisq)  
## dijkstra.lmer.poly2             
## dijkstra.lmer.poly3    0.02648 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The generalized additive mixed model (GAMM) offers a toolkit for building a more precise statistical model for the present dataset. GAMMs can be used both for exploratory and for confirmatory data analysis. In the case of confirmatory data analysis, the researcher will have in mind a specific hypothesis about the functional form of a main effect or interaction. This hypothesis is then formalized as a GAMM, and once the data have been collected, a fit of this specific model to the data will straightforwardly reveal, first, whether a nonlinear effect is indeed present, and second, whether the effect has the predicted functional form. Apart from potential further refinements to the model as dictated by model criticism, no further model fitting will be necessary, nor allowed. When the researcher has no clear theoretically motivated hypotheses about the shape of nonlinear effects, the analysis is necessarily exploratory in nature. In this case, incremental model building will typically afford the researcher more insight into the structure of a dataset than a model with the most complex structure that the data can tolerate. When adding nonlinearities to the model step by step, it is advisable to set alpha levels sufficiently low, for instance by using a Bonferroni correction for the number of models fitted, to safeguard against overvaluing effects.

In what follows, we make use of the mgcv package (Wood, 2017). In the model formula below, smooth terms for numerical predictors are included by means of the s() function, which by default uses thin plate regression splines. s(Word, bs = "re") specifies random intercepts for Word. s(Trial, Participant, bs = "fs", m = 1) specifies factor smooths for Trial by Participant (the nonlinear equivalent of random intercepts and random slopes), with intercepts also estimated for each participant.

The mathematics underlying the penalization of basis functions and the mathematics underlying random effects are very similar in GAMMs; hence it makes sense mathematically that the same s() directive is used for both. There is an extensive technical literature on how to set up the basis functions for splines. In fact, there are many different kinds of splines, and spelling out exactly how the default splines are set up in mgcv requires at least one full page of mathematics. The reader is referred to Wood (2017) for details. Visual examples of how smooths are constructed out of basis functions are given in Baayen et al. (2017).

Factor smooths implement shrinkage for wiggly curves, just as the LMM implements shrinkage for random intercepts and random slopes. Shrinkage is a statistical technique that ensures that model parameters for individual subjects (or items) are more conservative than would be the case if separate models were fit to each subject (or item). In this way, the researcher is protected against overfitting, and predictions for future observations on the same subjects (or items) will be more precise (see Baayen, 2008, for detailed discussion). Within the context of the GAMM, the nonlinear equivalent of random intercepts and random slopes is set up by means of the special splines known as factor smooths. For factor smooths, shrinkage ensures that if the response does not covary with a given predictor, the factor smooth for that predictor reduces to random intercepts.
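A sketch of the corresponding bam() call (the object name dijkstra.gam matches the model comparison output further below):

dijkstra.gam <- bam(invRT ~ s(OS) + s(Frequency) +
                      s(Trial, Participant, bs = "fs", m = 1) +
                      s(Word, bs = "re"),
                    data = dijkstra)
summary(dijkstra.gam)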

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ s(OS) + s(Frequency) + s(Trial, Participant, bs = "fs", 
##     m = 1) + s(Word, bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.87789    0.05761   -32.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf  Ref.df      F  p-value    
## s(OS)                  2.745   2.940 11.629 1.13e-07 ***
## s(Frequency)           3.166   3.442 23.237 5.81e-16 ***
## s(Trial,Participant)  65.673 189.000 12.759  < 2e-16 ***
## s(Word)              113.601 175.000  2.186 6.11e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.454   Deviance explained = 48.2%
## fREML = 1063.8  Scale est. = 0.095747  n = 3535

Unlike an LMM summary, a GAMM summary has two parts: a parametric part for linear terms and a nonparametric part for smooth terms. The smooth terms comprise splines (as for OS and Frequency), factor smooths (as for the trial-by-participant interaction), and random intercepts (as for Word). For each smooth term, an F-test (detailed in Wood, 2013) is reported that clarifies whether the term provides a noteworthy contribution to the model fit. Comparison of models with and without a given smooth term typically leads to the same conclusions as this F-test.

We visualize the model output. The top left panel visualizes the effect of OS. With the thin plate regression spline for OS, the GAMM is flexible enough to capture the greater facilitation for identical cognates (the high edf value indicates greater wiggliness). The evaluation of GAMMs relies more on visualization than that of LMMs does. Where the confidence interval (which is empirical Bayes rather than frequentist) does not include zero (the red line), the effect is significant. The effect shown is the partial effect of the predictor, i.e., the contribution of the predictor to the fitted values, independently of the contributions of the other predictors in the model. The argument residuals = F suppresses the plotting of the partial residuals, i.e., the raw data adjusted for the other model terms. The top right panel visualizes the effect of word frequency, which is strong in the middle range of log frequency and tapers off at both tails of the distribution. The lower left panel presents the by-participant random curves for Trial, showing considerable variability between participants: some show stable behavior, others show nearly linear trends up or down (possibly indicators of fatigue or practice effects), and some show undulating patterns suggestive of fluctuations in attention. The lower right panel presents a quantile-quantile plot for the model residuals, which roughly follow a normal distribution, as required. For a technical explanation of the summaries of the model, see Wood (2012).
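The four panels described above can be produced in several ways; one minimal sketch using mgcv's plot method and qq.gam() (our choice of calls, not necessarily those used for the original figure):

par(mfrow = c(2, 2))
plot(dijkstra.gam, select = 1, shade = TRUE, residuals = FALSE)  # partial effect of OS
plot(dijkstra.gam, select = 2, shade = TRUE, residuals = FALSE)  # partial effect of Frequency
plot(dijkstra.gam, select = 3)                                   # by-participant curves for Trial
qq.gam(dijkstra.gam)                                             # QQ plot of model residuals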

We then include by-participant random slopes for Frequency and OS to see whether the model improves. This is equivalent to the lme4 specification (0 + Frequency | Participant). What happens in mgcv is that a ridge penalty is placed on the by-participant random slopes for Frequency. For straightforward datasets with linear predictors, this leads to virtually the same estimates as those given by lme4.
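A sketch of the extended model and the fREML comparison with itsadug's compareML() (object names as in the output below):

dijkstra.gam2 <- bam(invRT ~ s(OS) + s(Frequency) +
                       s(Participant, Trial, bs = "fs", m = 1) +
                       s(Participant, Frequency, bs = "re") +
                       s(Participant, OS, bs = "re") +
                       s(Word, bs = "re"),
                     data = dijkstra)
summary(dijkstra.gam2)
compareML(dijkstra.gam, dijkstra.gam2)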

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ s(OS) + s(Frequency) + s(Participant, Trial, bs = "fs", 
##     m = 1) + s(Participant, Frequency, bs = "re") + s(Participant, 
##     OS, bs = "re") + s(Word, bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.87791    0.05772  -32.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                              edf  Ref.df      F  p-value    
## s(OS)                      2.724   2.913  9.580 2.23e-06 ***
## s(Frequency)               3.169   3.440 16.223 3.89e-11 ***
## s(Participant,Trial)      65.474 189.000 12.756  < 2e-16 ***
## s(Frequency,Participant)  12.143  20.000  1.550 0.000201 ***
## s(OS,Participant)          8.239  21.000  0.675 0.028204 *  
## s(Word)                  114.701 178.000  2.097 0.190255    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.461   Deviance explained = 49.2%
## fREML = 1056.2  Scale est. = 0.094453  n = 3535
## dijkstra.gam: invRT ~ s(OS) + s(Frequency) + s(Trial, Participant, bs = "fs", 
##     m = 1) + s(Word, bs = "re")
## 
## dijkstra.gam2: invRT ~ s(OS) + s(Frequency) + s(Participant, Trial, bs = "fs", 
##     m = 1) + s(Participant, Frequency, bs = "re") + s(Participant, 
##     OS, bs = "re") + s(Word, bs = "re")
## 
## Chi-square test of fREML scores
## -----
##           Model    Score Edf Difference    Df   p.value Sig.
## 1  dijkstra.gam 1063.783   8                                
## 2 dijkstra.gam2 1056.208  10      7.576 2.000 5.128e-04  ***
## 
## AIC difference: 24.20, model dijkstra.gam2 has lower AIC.

With the by-participant random slopes for Frequency and OS, the model improved (the fREML score decreases from 1063.8 to 1056.2). We next test whether the factor Ident improves the model further.
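A sketch of the model with Ident added as a parametric term (we name it dijkstra.gam3, in line with the ML refits below):

dijkstra.gam3 <- bam(invRT ~ Ident + s(OS) + s(Frequency) +
                       s(Participant, Trial, bs = "fs", m = 1) +
                       s(Participant, Frequency, bs = "re") +
                       s(Participant, OS, bs = "re") +
                       s(Word, bs = "re"),
                     data = dijkstra)
summary(dijkstra.gam3)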

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ Ident + s(OS) + s(Frequency) + s(Participant, Trial, 
##     bs = "fs", m = 1) + s(Participant, Frequency, bs = "re") + 
##     s(Participant, OS, bs = "re") + s(Word, bs = "re")
## 
## Parametric coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.86877    0.05758 -32.458  < 2e-16 ***
## IdentIdentical -0.11287    0.03807  -2.965  0.00305 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                              edf Ref.df      F  p-value    
## s(OS)                      1.000   1.00  4.405 0.035891 *  
## s(Frequency)               3.137   3.41 15.641 1.08e-10 ***
## s(Participant,Trial)      65.379 189.00 12.737  < 2e-16 ***
## s(Frequency,Participant)  12.150  20.00  1.553 0.000193 ***
## s(OS,Participant)          8.233  21.00  0.674 0.025210 *  
## s(Word)                  114.807 178.00  2.446 0.057992 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.461   Deviance explained = 49.2%
## fREML = 1055.6  Scale est. = 0.09446   n = 3535

It is noteworthy that once the factor Ident is taken into account, the effect of OS becomes linear (i.e., the edf is 1), with a substantially increased p-value. Because we are now assessing the additive contribution of a fixed effect, we refit both models with method = "ML" instead of the default fREML before comparing them. The inclusion of Ident reduces the ML score.
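A sketch of the ML refits and their comparison (object names as in the output below; only the method argument changes):

dijkstra.gam2ml <- bam(invRT ~ s(OS) + s(Frequency) +
                         s(Participant, Trial, bs = "fs", m = 1) +
                         s(Participant, Frequency, bs = "re") +
                         s(Participant, OS, bs = "re") +
                         s(Word, bs = "re"),
                       data = dijkstra, method = "ML")
dijkstra.gam3ml <- bam(invRT ~ Ident + s(OS) + s(Frequency) +
                         s(Participant, Trial, bs = "fs", m = 1) +
                         s(Participant, Frequency, bs = "re") +
                         s(Participant, OS, bs = "re") +
                         s(Word, bs = "re"),
                       data = dijkstra, method = "ML")
compareML(dijkstra.gam2ml, dijkstra.gam3ml)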

## dijkstra.gam2ml: invRT ~ s(OS) + s(Frequency) + s(Participant, Trial, bs = "fs", 
##     m = 1) + s(Participant, Frequency, bs = "re") + s(Participant, 
##     OS, bs = "re") + s(Word, bs = "re")
## 
## dijkstra.gam3ml: invRT ~ Ident + s(OS) + s(Frequency) + s(Participant, Trial, 
##     bs = "fs", m = 1) + s(Participant, Frequency, bs = "re") + 
##     s(Participant, OS, bs = "re") + s(Word, bs = "re")
## 
## Chi-square test of ML scores
## -----
##             Model    Score Edf Difference    Df p.value Sig.
## 1 dijkstra.gam2ml 1048.918  10                              
## 2 dijkstra.gam3ml 1044.504  11      4.414 1.000   0.003  ** 
## 
## AIC difference: 2.44, model dijkstra.gam3ml has lower AIC.
## Warning in compareML(dijkstra.gam2ml, dijkstra.gam3ml): Only small difference in ML...

In our experience with GAMMs, the amount of wiggliness estimated for a given smooth term does not change much with the presence or absence of other predictors, as the following examples show.
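A sketch of how the leaner models can be fit (object names are ours; the larger n in the second and third summaries presumably reflects words lacking Frequency values, which are no longer dropped once Frequency is removed from the formula):

dijkstra.gam.noFreq <- bam(invRT ~ s(OS) +
                             s(Trial, Participant, bs = "fs", m = 1) +
                             s(Word, bs = "re"),
                           data = dijkstra)
dijkstra.gam.OSonly <- bam(invRT ~ s(OS, k = 5) +
                             s(Trial, Participant, bs = "fs", m = 1),
                           data = dijkstra)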

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ s(OS) + s(Frequency) + s(Trial, Participant, bs = "fs", 
##     m = 1) + s(Word, bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.87789    0.05761   -32.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf  Ref.df      F  p-value    
## s(OS)                  2.745   2.940 11.629 1.13e-07 ***
## s(Frequency)           3.166   3.442 23.237 5.81e-16 ***
## s(Trial,Participant)  65.673 189.000 12.759  < 2e-16 ***
## s(Word)              113.601 175.000  2.186 6.11e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.454   Deviance explained = 48.2%
## fREML = 1063.8  Scale est. = 0.095747  n = 3535
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ s(OS) + s(Trial, Participant, bs = "fs", m = 1) + s(Word, 
##     bs = "re")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.86963    0.05828  -32.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf  Ref.df      F  p-value    
## s(OS)                  2.214   2.323  7.719 0.000143 ***
## s(Trial,Participant)  70.953 189.000 13.187  < 2e-16 ***
## s(Word)              142.523 187.000  5.149  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.455   Deviance explained = 48.6%
## fREML = 1172.5  Scale est. = 0.096576  n = 3747
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## invRT ~ s(OS, k = 5) + s(Trial, Participant, bs = "fs", m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.87485    0.05691  -32.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf  Ref.df     F  p-value    
## s(OS)                 3.277   3.696 21.23 6.45e-16 ***
## s(Trial,Participant) 66.699 188.000 11.15  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.367   Deviance explained = 37.9%
## fREML = 1310.7  Scale est. = 0.11212   n = 3747

The number of basis functions that mgcv uses by default is not theoretically motivated, and users are encouraged to set k to values higher than the default to make sure the model is not oversmoothing.
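One way to guard against this is to refit with a larger k and inspect mgcv's basis dimension check (a sketch; the object name and the choice k = 20 are ours):

dijkstra.gam.k20 <- bam(invRT ~ s(OS, k = 20) + s(Frequency, k = 20) +
                          s(Trial, Participant, bs = "fs", m = 1) +
                          s(Word, bs = "re"),
                        data = dijkstra)
gam.check(dijkstra.gam.k20)  # reports k' versus edf, with a p-value indicating whether k is too low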

Regarding the by-participant random curves for Trial, it is also possible to include a population effect for Trial. The assumption would then be that all participants show the same wiggly development through the experiment, with individual variation riding on top. As shown in the example below, such a putative curve is not significant.
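A sketch of the two ML fits and their comparison (object names as in the output below; the warnings about discretization suggest the original calls also set discrete = TRUE, which bam ignores when method is not fREML):

dijkstra.gam.ml <- bam(invRT ~ s(OS) + s(Frequency) +
                         s(Trial, Participant, bs = "fs", m = 1) +
                         s(Word, bs = "re"),
                       data = dijkstra, method = "ML", discrete = TRUE)
dijkstra.gam6ml <- bam(invRT ~ s(OS) + s(Trial) + s(Frequency) +
                         s(Trial, Participant, bs = "fs", m = 1) +
                         s(Word, bs = "re"),
                       data = dijkstra, method = "ML", discrete = TRUE)
compareML(dijkstra.gam.ml, dijkstra.gam6ml)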

## Warning in bam(invRT ~ s(OS) + s(Frequency) + s(Trial, Participant, bs =
## "fs", : discretization only available with fREML
## Warning in bam(invRT ~ s(OS) + s(Trial) + s(Frequency) + s(Trial,
## Participant, : discretization only available with fREML
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
## dijkstra.gam.ml: invRT ~ s(OS) + s(Frequency) + s(Trial, Participant, bs = "fs", 
##     m = 1) + s(Word, bs = "re")
## 
## dijkstra.gam6ml: invRT ~ s(OS) + s(Trial) + s(Frequency) + s(Trial, Participant, 
##     bs = "fs", m = 1) + s(Word, bs = "re")
## 
## Chi-square test of ML scores
## -----
##             Model    Score Edf Difference    Df p.value Sig.
## 1 dijkstra.gam.ml 1056.260   8                              
## 2 dijkstra.gam6ml 1056.259  10      0.000 2.000   1.000     
## 
## AIC difference: -0.45, model dijkstra.gam.ml has lower AIC.
## Warning in compareML(dijkstra.gam.ml, dijkstra.gam6ml): Only small difference in ML...

3. Cross-language phonological similarity effect

To test whether a cross-language phonological similarity effect shows similar nonlinearity, we reanalyze Miwa et al.'s (2014) lexical decision data. The authors tested 19 Japanese-English bilinguals and 19 English monolinguals reading English words. Here, we reanalyze the bilinguals' data for 228 target words for which subtitle frequency data were available.

We first fit an LMM with a second-degree polynomial, testing nonlinearity for phonological similarity (PS).
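A sketch of the call, subsetting the bilingual data as in the output (the object name is ours):

miwa.lmer.poly2 <- lmer(invRT ~ poly(PS, 2, raw = TRUE) + Frequency + Trial +
                          (1 | Participant) + (0 + Trial | Participant) +
                          (0 + Frequency | Participant) + (1 | Word),
                        data = droplevels(miwa[miwa$FirstLanguage == "Japanese", ]))
summary(miwa.lmer.poly2)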

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## invRT ~ poly(PS, 2, raw = T) + Frequency + Trial + (1 | Participant) +  
##     (0 + Trial | Participant) + (0 + Frequency | Participant) +  
##     (1 | Word)
##    Data: droplevels(miwa[miwa$FirstLanguage == "Japanese", ])
## 
## REML criterion at convergence: 1113.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9825 -0.6692 -0.0828  0.6023  3.5451 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev.
##  Word          (Intercept) 0.0114310 0.10692 
##  Participant   Frequency   0.0004007 0.02002 
##  Participant.1 Trial       0.0017353 0.04166 
##  Participant.2 (Intercept) 0.0292469 0.17102 
##  Residual                  0.0683019 0.26135 
## Number of obs: 4104, groups:  Word, 228; Participant, 19
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)           -1.468e+00  4.046e-02  1.992e+01 -36.280  < 2e-16
## poly(PS, 2, raw = T)1 -2.165e-02  9.566e-03  2.217e+02  -2.263   0.0246
## poly(PS, 2, raw = T)2  4.128e-04  5.553e-03  2.224e+02   0.074   0.9408
## Frequency             -8.148e-02  9.409e-03  7.840e+01  -8.660 4.76e-13
## Trial                 -1.140e-01  1.044e-02  1.804e+01 -10.914 2.23e-09
##                          
## (Intercept)           ***
## poly(PS, 2, raw = T)1 *  
## poly(PS, 2, raw = T)2    
## Frequency             ***
## Trial                 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The nonlinearity is not supported for PS. When we fit an LMM with a strictly linear effect of PS, the slope is estimated at -0.022.
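A sketch of the strictly linear model (the object name is ours):

miwa.lmer <- lmer(invRT ~ PS + Frequency + Trial +
                    (1 | Participant) + (0 + Trial | Participant) +
                    (0 + Frequency | Participant) + (1 | Word),
                  data = droplevels(miwa[miwa$FirstLanguage == "Japanese", ]))
summary(miwa.lmer)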

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## invRT ~ PS + Frequency + Trial + (1 | Participant) + (0 + Trial |  
##     Participant) + (0 + Frequency | Participant) + (1 | Word)
##    Data: droplevels(miwa[miwa$FirstLanguage == "Japanese", ])
## 
## REML criterion at convergence: 1105.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9830 -0.6684 -0.0829  0.6020  3.5451 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev.
##  Word          (Intercept) 0.0113630 0.10660 
##  Participant   Frequency   0.0004007 0.02002 
##  Participant.1 Trial       0.0017356 0.04166 
##  Participant.2 (Intercept) 0.0292475 0.17102 
##  Residual                  0.0683020 0.26135 
## Number of obs: 4104, groups:  Word, 228; Participant, 19
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  -1.467519   0.040075  19.171944 -36.620  < 2e-16 ***
## PS           -0.022017   0.008172 225.210816  -2.694  0.00759 ** 
## Frequency    -0.081508   0.009387  77.926653  -8.683 4.48e-13 ***
## Trial        -0.113959   0.010443  18.042736 -10.913 2.23e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## effect size (range) for  PS is  0.1272519