Sunday, February 16, 2014

Bayesian analysis of sensory profiling data III


Last week I extended my Bayesian model. So this week I wanted to test it with different data. There is one other data set with profiling data in R, french fries in the reshape package. 'This data was collected from a sensory experiment conducted at Iowa State University in 2004. The investigators were interested in the effect of using three different fryer oils had on the taste of the fries'. In this post I try t analyze the data, however the documentation which I found is so limited, it is needed to second guess the objectives of the experiment.

Data

The data includes a treatment and a time variable. The time is in weeks and crossed with the other variables, suggesting that this is systematically varied and hence in itself a factor of interest. In the end I decided to cross treatments and time to generate a product factor with 30 levels. In addition I crossed time with a repetition factor so this effect could function as a random variable. As bonus, there are some missing data, which Stan does not like and are removed before analysis.
data(french_fries,package='reshape')
head(french_fries)
vars <- names(french_fries)
fries <- reshape(french_fries,
        idvar=vars[1:4],
        varying=list(vars[5:9]),
        v.names='Score',
        direction='long',
        timevar='Descriptor',
        times=vars[5:9])
head(fries)
fries$Product <- interaction(fries$treatment,fries$time)
fries$Session <- interaction(fries$time,fries$rep)
fries$Descriptor <- factor(fries$Descriptor)
fries <- fries[complete.cases(fries),]

Model

The model is a variation on last week, because I there is no rounds information but there are sessions. In addition, I made the range for the shift variable a bit larger, because it appeared some panelists have extreme behavior. 
model1 <-
        '
    data {
        int<lower=0> npanelist;
        int<lower=0> nobs;
        int<lower=0> nsession;
        int<lower=0> ndescriptor;
        int<lower=0> nproduct;
        vector[nobs] y;
        int<lower=1,upper=npanelist> panelist[nobs];
        int<lower=1,upper=nproduct> product[nobs];
        int<lower=1,upper=ndescriptor> descriptor[nobs];
        int<lower=1,upper=nsession> session[nobs];
        real maxy;
    }
    parameters {
        matrix<lower=0,upper=maxy> [nproduct,ndescriptor] profile;
        vector<lower=-maxy/2,upper=maxy/2>[npanelist] shift[ndescriptor];
        vector<lower=-maxy/2,upper=maxy/2>[npanelist] sitting[nsession];  
        vector<lower=-1,upper=1> [npanelist] logsensitivity;
        real<lower=0,upper=maxy/5> varr;
        vector [npanelist] lpanelistvar;
        vector [ndescriptor] ldescriptorvar;
        real<lower=0,upper=maxy/5> sigmashift;
        real<lower=0,upper=maxy/5> sigmasitting;
    }
    transformed parameters {
        vector [nobs] expect;
        vector[npanelist] sensitivity;
        real mlogsens;
        real mlpanelistvar;
        real mldescriptorvar;
        real meanshift[ndescriptor];
        real meansit[nsession];
        vector [nobs] sigma;
        mlogsens <- mean(logsensitivity);
        for (i in 1:ndescriptor) {
            meanshift[i] <- mean(shift[i]);
        }
        mlpanelistvar <- mean(lpanelistvar);
        mldescriptorvar <- mean(ldescriptorvar);   
        for (i in 1:nsession) {
            meansit[i] <- mean(sitting[i]);
        }
        for (i in 1:npanelist) {
            sensitivity[i] <- pow(10,logsensitivity[i]-mlogsens);
        }
        for (i in 1:nobs) {
            expect[i] <- profile[product[i],descriptor[i]]
                *sensitivity[panelist[i]]
                + shift[descriptor[i],panelist[i]]-meanshift[descriptor[i]]
                + sitting[session[i],panelist[i]]-meansit[session[i]];
            sigma[i] <- sqrt(varr
                *exp(lpanelistvar[panelist[i]]-mlpanelistvar)
                *exp(ldescriptorvar[descriptor[i]]-mldescriptorvar));
            }
    }
    model {
        logsensitivity ~ normal(0,0.1);
        for (i in 1: ndescriptor) {
            shift[i] ~ normal(0,sigmashift);
            ldescriptorvar[i] ~ normal(0,1);
        }
        for (i in 1:nsession)
            sitting[i] ~ normal(0,sigmasitting);
        for (i in 1:npanelist)
            lpanelistvar[i] ~ normal(0,1);
        y ~ normal(expect,sigma);
    } 
    generated quantities    {
        vector [npanelist] panelistsd;
        vector [ndescriptor] descriptorsd;
        for (i in 1:npanelist) {
            panelistsd[i] <- sqrt(exp(lpanelistvar[i]));
        }
        for (i in 1:ndescriptor) {
            descriptorsd[i] <- sqrt(exp(ldescriptorvar[i]));
        }
    }
        '

Additional code

This is just the stuff needed to get Stan running
datainfries <- with(fries,list(
                panelist=(1:nlevels(subject))[subject],
                npanelist=nlevels(subject),
                session=(1:nlevels(Session))[Session],
                nsession=nlevels(Session),
                product=(1:nlevels(Product))[Product],
                nproduct=nlevels(Product),
                descriptor=(1:nlevels(Descriptor))[Descriptor],
                ndescriptor=nlevels(Descriptor),
                y=Score,
                nobs=length(Score),
                maxy=15))

pars <- c('profile','sensitivity','shift','sitting',
        'panelistsd','descriptorsd')
library(rstan)
fitfries <- stan(model_code = model1,
        data = datainfries,
        pars=pars,
        iter = 1100,
        warmup=100,
        chains = 4)

Results

Traceplot is not shown, but the general plot provides some overview.
Plot of the profile plot has been adapted given the design. This plot should convey the message the data contains. As time progresses the flavor becomes more rancid and pointy, less potato. This process starts at circa 4 weeks.Treatment 3 seems least affected, but the difference is minute.

samples <- extract(fitfries,'profile')$profile
nsamp <- dim(samples)[1]
profile <- expand.grid(Product=levels(fries$Product),
        Descriptor=levels(fries$Descriptor))
profile$des <- as.numeric(profile$Descriptor)
profile$prod <- as.numeric(profile$Product)
profs <- as.data.frame(t(sapply(1:nrow(profile),function(i){
                        subsamples <-c(samples[1:nsamp,

                                         profile$prod[i],profile$des[i]])
                        c(mean=mean(subsamples),quantile(subsamples,c(.1,.9)))
                        })))
names(profs) <- c('score','lmin','lmax')
profile <- cbind(profile,profs)
profile <- merge(profile,
           unique(fries[,c('Product','treatment','time')])
           )

library(ggplot2)

profile$timen <- c(1:12)[profile$time]
profile <- profile[order(profile$timen),]

p1 <- ggplot(profile, aes(y=score, x=timen,color=treatment))

p1 + geom_path() +
        scale_x_continuous(
                breaks=1:nlevels(profile$time),
                labels=levels(profile$time)) +
        xlab('') +
         geom_ribbon(aes(ymin=lmin, ymax=lmax,fill=treatment),
                 alpha=.15,linetype='blank') +
        facet_grid(  Descriptor ~ . )

3 comments:

  1. Hi Wiekvoet, I am a newbie and have litte knowledge of R, when i tried to copy your code and implement in R, it says "Error: unexpected '{' in "model1 <- data {""... can you give any information regarding this.... Thanking you

    ReplyDelete
    Replies
    1. Actually, if I put it on one line, it says: model1 <- ' data {
      Notice the quote between the arrow and the keyword data.
      Model1 is a string, which is not executed by R but used as parameter in the stan function from the rstan package. Stan does the heavy lifting in this model, R is just for a bit of pre- and post- processing. Please refer to http://mc-stan.org/rstan.html for information on rstan.

      Delete