e2gStats Basic Mini-course

Introduction

Data analysis to go

Once upon a time...before researchers could conceive of having access to a computer dedicated to their work, much less owning one they could carry in their purse or wear on their belt, quick-and-dirty on-site data analysis was of the hand-cranked "back of the envelope" variety -- sometimes literally.

Our objective for the e2gStats Basic software is to facilitate similar immediacy of data insights with less tedious calculation effort and more comprehensive analysis choices. In addition, the software provides a format for learning about (and experimenting with) data analysis techniques one bite-sized experience at a time.

Suggestions for using this mini-course

The mini-course is scenario-based, combining an introduction to data analysis with an introduction to the e2gStats Basic software through your participation in simulated consulting projects. The software is most suitable for supporting data analysis in the social sciences including business and education, particularly when questionnaire-based data gathering is part of the research methodology, and the scenarios have that flavor.

Caveats: The mini-course does not deal with statistical theory, and the explanations of each technique are very brief, so the material is intended to be used in conjunction with conventional text materials in an academic setting or by those with prior data analysis experience. Finally, the questionnaires provided are much too simplified to be used in actual studies, and the data provided are artificial -- not actual survey results.

The documentation system integrated with the software and accessed from the HELP tab has three components: help on the e2gStats Basic software itself (button labeled e2gStats), reference documentation describing statistical terms and techniques (button labeled Analysis) and this mini-course (button labeled Mini-course). Once the help files are loaded, buttons at the bottom of the screen allow quick movement among the components. When the e2gStats or Analysis buttons are touched, their labels are replaced with Context indicating that another button touch will move the help file to a location relevant to the e2gStats component or analysis method currently selected. When the Mini-course is selected, an Example button appears that provides access to an example of the method currently selected in the STATS tab. So, while working with this mini-course, expect to use the tabs to jump among the data analysis, data management and help activities, and the help buttons to access supporting information as you need it.

The exercises illustrated or suggested with each scenario encompass a range of data analysis techniques, from the simplest descriptive statistics to more involved hypothesis testing and model building. If you are using e2gStats Basic in conjunction with an academic course, you may want to perform only those exercises already covered in your classroom activity, then return to try the more sophisticated techniques later. There is an index with links to the examples at the end of this mini-course document if you wish to access one of the data analysis methods directly. The Analysis Reference help provides alphabetical access to data analysis terminology, and may be referred to while using the mini-course without losing your location in the course. Terms that you may want to look up in the Analysis Reference if you aren't familiar with them are displayed in italics.

Scenario 1: Smartphone marketing focus group

Study scenario/study questions

Your consulting firm has been hired to evaluate the sales potential of a proposed new very low cost smartphone/service combination. You have just finished attending a focus group meeting of current smartphone users held a thousand miles away, and are now on your way home with the data gathered from an exit questionnaire. It's a typical business flight -- you'll occupy seat 97B for the next three hours, the seatback ahead is fully reclined, you've seen all the movies at least three times and lunch has been served (the pretzels are gone). So, to make meaningful use of your agony, you've decided to do a "quick-look" analysis on the exit survey data.

Here are some of the issues you'll want to evaluate using the data:

  1. What are the attributes of the focus group?
  2. How much interest is there in the proposed smartphone/service plans?
  3. What factors influence the level of interest in each of the proposed smartphone/service plans?

Data gathering instrument

This is the exit questionnaire given to the focus group:

1. Your current age:
2. Your gender:
   1. Male
   2. Female
3. Your current occupational status:
   1. Employed full time (at least 30 hours/week)
   2. Not employed full time
4. Your primary use for a smartphone:
   1. Cell phone conversation
   2. Texting
   3. Web access
   4. eMail
   5. Other applications
5. If the proposed service includes phone, 500 talk minutes and unlimited data, how likely would you be to sign up for the service if it cost $50/month?
   1. Very likely
   2. Somewhat likely
   3. Uncertain
   4. Unlikely
   5. Very unlikely
6. If the proposed service includes phone, unlimited talk minutes and unlimited data, how likely would you be to sign up for the service if it cost $75/month?
   1. Very likely
   2. Somewhat likely
   3. Uncertain
   4. Unlikely
   5. Very unlikely
7. Your current (approximate) average monthly smartphone charge including taxes and fees:
   $

Here are the data in comma-separated values (CSV) format, with variable labels chosen to stay within the maximum length of eight characters. The first row of comma-separated, quotation-mark delimited data provides variable labels, and each remaining row provides the numbered responses to each question from one questionnaire. To get you started more quickly, these data were loaded onto your SD card when the e2gStats Basic software was first run. If you have deleted the file (smartphone.csv), you may reload it from the "Web" option on the DATA tab using the Web address:

expertise2go.com/e2gstats/smartphone.csv

If you delete all of the files on the SD card from the DATA tab, then exit the program and restart it, the sample files will be automatically reloaded. Internet access is not required for this action -- the sample files are incorporated in the software.

"Age", "Gender", "Occup", "Use", "$50", "$75", "$MonCost"
18,1,2,2,4,2,72
20,1,2,1,1,5,64
43,1,2,3,2,5,52
18,1,2,2,3,1,74
28,1,2,1,1,5,51
55,1,1,1,1,5,35
23,1,1,1,2,5,48
36,1,1,3,1,4,56
20,1,1,3,2,4,68
67,1,1,1,2,5,53
19,2,2,3,1,4,71
17,2,2,2,4,1,73
21,2,2,2,2,3,70
60,2,2,4,1,4,50
19,2,2,1,1,4,63
34,2,1,4,1,5,53
50,2,1,4,1,5,48
44,2,1,5,1,4,51
29,2,1,4,2,5,49
52,2,1,4,1,5,37
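
If you want to cross-check results outside the app, a listing like the one above can be read with Python's standard csv module. This is only a minimal sketch and not part of e2gStats Basic: the names CSV_TEXT, labels and cases are our own, and only the first three cases are pasted in to keep it short.

```python
import csv
import io

# Label row plus the first three cases of smartphone.csv, copied from the listing.
CSV_TEXT = '''"Age", "Gender", "Occup", "Use", "$50", "$75", "$MonCost"
18,1,2,2,4,2,72
20,1,2,1,1,5,64
43,1,2,3,2,5,52
'''

# skipinitialspace accepts the blank that follows each comma in the label row.
reader = csv.reader(io.StringIO(CSV_TEXT), skipinitialspace=True)
labels = next(reader)                                     # first row: variable labels
cases = [[int(v) for v in row] for row in reader if row]  # one list per questionnaire

print(labels)
print(cases[0])
```

Each inner list holds one respondent's answers in questionnaire order, so cases[0][6] is the first respondent's monthly charge.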

"Quick look" activities and examples

Descriptive statistics: A typical starting point for evaluating questionnaire data uses descriptive statistics methods to obtain a tabular or graphical representation of population attributes.

The scale on which variables are measured determines which descriptive techniques are appropriate. You've decided to start your quick look by selecting the METHOD: Sample Mean, Std Dev, Range, Median option under the OBJECTIVE: Descriptive Statistics heading. This method is suitable for data measured on an interval or ratio scale. Touching the Parameters for this METHOD button opens the parameter dialog, from which you can select the smartphone.csv data file and the X1/Age and X7/$MonCost variables for analysis by touching the Select Variable(s) button, then touching the check marks for the desired variables:

Touching Start Analysis produces the following output (shown in landscape orientation to avoid scrolling):
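
As an independent cross-check on these four statistics, here is a minimal Python sketch using the X1/Age and X7/$MonCost columns from the data listing. The describe helper is our own name, and the sample standard deviation (n - 1 denominator) is an assumption about how the app computes Std Dev.

```python
import statistics

# X1/Age and X7/$MonCost columns copied from smartphone.csv.
age = [18, 20, 43, 18, 28, 55, 23, 36, 20, 67, 19, 17, 21, 60, 19, 34, 50, 44, 29, 52]
mon_cost = [72, 64, 52, 74, 51, 35, 48, 56, 68, 53, 71, 73, 70, 50, 63, 53, 48, 51, 49, 37]

def describe(values):
    """Return mean, sample standard deviation, range and median."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # assumes the n - 1 denominator
        "range": max(values) - min(values),
        "median": statistics.median(values),
    }

for name, values in (("X1/Age", age), ("X7/$MonCost", mon_cost)):
    print(name, describe(values))
```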

Several of the descriptive analysis methods are suitable for data measured on a nominal or ordinal scale as well as the interval or ratio scales, and these have similar input parameters. As an example, you might want to use a pie chart to look at the distribution of responses among the focus group's current uses for a smartphone (the X4/Use variable). Parameters for the pie chart method are entered using the following dialog, on which the data file and analysis variable have already been selected. Leaving the upper bound of the class and class interval width blank lets the software choose these values. You can always return to the parameter screen to change them after examining the output. The parameter dialog may be scrolled vertically to move the data entry fields out of the way of a virtual keyboard:

Here is the resulting pie chart, zoomed out slightly to view the chart and legend without scrolling.

To send the output to another location (to get a printed copy, for example) touch the Share Results button and a list of apps on your Android device that can send the output as an attachment will appear. E-mail programs are the best choice, and you will need to enter an e-mail address and optional title and message to complete the action.

Other descriptive statistics methods with parameter inputs identical to pie charts include tabular Frequency Tables and Histograms. You may want to experiment with these using some of the variables describing the focus group's demographic (population) attributes.
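
The counts behind a pie chart or frequency table of X4/Use can be tallied by hand or with a few lines of Python. The sketch below is an illustration rather than the app's own routine; the USE_LABELS dictionary simply restates the answer choices from question 4.

```python
from collections import Counter

# Answer choices for question 4 (primary use for a smartphone).
USE_LABELS = {
    1: "Cell phone conversation",
    2: "Texting",
    3: "Web access",
    4: "eMail",
    5: "Other applications",
}

# X4/Use column copied from smartphone.csv.
use = [2, 1, 3, 2, 1, 1, 1, 3, 3, 1, 3, 2, 2, 4, 1, 4, 4, 5, 4, 4]

counts = Counter(use)
for code in sorted(counts):
    share = 100 * counts[code] / len(use)  # percentage of all respondents
    print(f"{code} {USE_LABELS[code]:<24} {counts[code]:>2} {share:5.1f}%")
```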

As a final descriptive technique example, you could examine the relationship between respondents' ages and their current monthly smartphone expense by plotting X1/Age against X7/$MonCost with a scatter plot. The parameters that must be entered for this method are the data file name and the X (horizontal) and Y (vertical) axis variables, which should be at least ordinal scaled for the plot to make sense:

The plot units on e2gStats Basic plots are standard deviations measured from the mean. Here is the result of the X1 vs. X7 scatter plot zoomed out for viewing without scrolling, suggesting that monthly smartphone expenditures decrease with the age of the user:

If multiple data points fall at the same location on the plot, special symbols (shown in the legend at the top of the plot) are used to represent coincident values.
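
To see what plotting in standard deviation units means, the following Python sketch converts both variables to standardized scores and summarizes the downward trend with a Pearson correlation coefficient. The standardize helper is our own, and the n - 1 denominator is an assumption about the app's scaling.

```python
import statistics

# X1/Age and X7/$MonCost columns copied from smartphone.csv.
age = [18, 20, 43, 18, 28, 55, 23, 36, 20, 67, 19, 17, 21, 60, 19, 34, 50, 44, 29, 52]
mon_cost = [72, 64, 52, 74, 51, 35, 48, 56, 68, 53, 71, 73, 70, 50, 63, 53, 48, 51, 49, 37]

def standardize(values):
    """Express each value as standard deviations from the mean."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

z_age, z_cost = standardize(age), standardize(mon_cost)

# Pearson correlation: sum of the paired z-score products divided by n - 1.
r = sum(x * y for x, y in zip(z_age, z_cost)) / (len(age) - 1)
print(f"r = {r:.3f}")
```

A negative r agrees with the visual impression that older respondents report lower monthly charges.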

Hypothesis testing: Next, you might want to look at whether there is significant interest in either of the proposed smartphone budget plans using a single mean test. Question 5 asks how likely the focus group member would be to sign up for the $50/month plan. A response of 1 represents very likely, a response of 3 uncertainty and a response of 5 very unlikely. If you are willing to treat these ordinal values as interval scaled, you could look for a statistically significant low value as evidence of interest in the plan. The null hypothesis, that you would hope to reject, is that the mean value of the population is greater than or equal to three. Here are the parameters input for this test:

If you have knowledge of the population value of the standard deviation, enter it in the standard deviation field and the test will be based on a normal statistic. Otherwise a sample standard deviation will be calculated and a Student's t-test performed. You may also specify the confidence level for a confidence interval: 90% is the default value.

To simplify the input, e2gStats Basic always presents the results of the left, right and two-tailed tests, even though only one of these results is relevant to a specific analysis. In this case, you are interested in the case where the null hypothesis is μ >= 3. This hypothesis can be rejected at a significance level of .000, suggesting significant interest in the plan.
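
The t statistic behind this result is easy to reproduce by hand. The sketch below is a cross-check under the assumption that the population standard deviation field was left blank, so the sample standard deviation is used; the p-value itself would come from a Student's t table with n - 1 degrees of freedom.

```python
import math
import statistics

# X5/$50 column copied from smartphone.csv (1 = very likely ... 5 = very unlikely).
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
mu0 = 3  # boundary value of the null hypothesis mu >= 3

n = len(x5)
mean = statistics.mean(x5)
std_err = statistics.stdev(x5) / math.sqrt(n)  # standard error of the mean
t_stat = (mean - mu0) / std_err

print(f"mean = {mean:.2f}, t = {t_stat:.3f}, df = {n - 1}")
```

With 19 degrees of freedom the left-tailed critical value at the .05 level is about -1.729, so a t statistic near -5.9 falls far into the rejection region.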

You can use the sign test as a nonparametric equivalent to the single mean test. The input parameters match those for the single mean test, substituting a hypothesized median for the hypothesized mean:

The output is very similar to that for the single mean test and is based on the number of values above and below the median with matches eliminated from the analysis -- an uneven number of values above and below the median is evidence against the null hypothesis.
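
The counting logic of the sign test can be sketched in Python as follows. An exact binomial tail probability is used here, which is an assumption: the app may use a normal approximation instead, so its reported significance level could differ slightly.

```python
from math import comb

# X5/$50 column copied from smartphone.csv.
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
median0 = 3  # hypothesized median

above = sum(v > median0 for v in x5)
below = sum(v < median0 for v in x5)
n = above + below  # values equal to the hypothesized median are dropped

# Left-tailed test of "median >= 3": probability of seeing this few values
# above the median if each value were equally likely to fall on either side.
p_left = sum(comb(n, k) for k in range(above + 1)) / 2 ** n

print(f"above = {above}, below = {below}, p = {p_left:.6f}")
```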

To provide the next example, let's suppose you'd like to compare the interest in the $50/month plan between populations represented by male and female members of the focus group. This question suggests a two means test, and requires definition of two subsets of the data file representing male and female responses. The appropriate null hypothesis would be that there is no gender-based difference against the two-tailed alternate hypothesis that there is a difference in either direction.

To define the subsets, you will have to move to the DATA tab, select the smartphone.csv data file and load it into the editor (touch the View/Edit Grid or View/Edit CSV buttons). Here is what the data look like in each editor -- you can drag the screen in either editor to see all of the cases and variables:

Touch the subset button, then define subsets of the data file corresponding to values 1 and 2 (male and female) of the X2/Gender question:

You may also change the missing value that causes a case to be rejected on this screen. It is set here to its default zero value that never occurs as a legitimate value in the smartphone data file. Returning to either the View/Edit or Select File dialogs saves the edited subset data. If the missing value has been changed, a dialog offers the choice to replace any missing values stored in the data file with the new value.

Returning to the STATS tab you select the METHOD: Two Means Test from the hypothesis testing objective then touch Parameters for this METHOD to access the following parameter dialog, and enter the values shown:

The hypothesized difference between the means of the two populations you are comparing is zero. Touching Start Analysis produces the following output.

You are interested in the μ1 - μ2 = 0 null hypothesis against the μ1 - μ2 <> 0 alternate hypothesis. The null hypothesis cannot be rejected based on this test: a significant difference between male and female populations was not supported.
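
For readers who want to see the arithmetic, here is a minimal sketch of a pooled-variance two means t statistic for the male and female subsets. The equal-variance (pooled) form is assumed here; the male and female lists are the X5/$50 responses for cases with X2/Gender equal to 1 and 2.

```python
import math
import statistics

# X5/$50 responses split by X2/Gender (1 = male, 2 = female).
male = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2]
female = [1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
hypothesized_diff = 0

n1, n2 = len(male), len(female)
# Pooled variance: the two sample variances weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * statistics.variance(male)
              + (n2 - 1) * statistics.variance(female)) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (statistics.mean(male) - statistics.mean(female) - hypothesized_diff) / std_err

print(f"t = {t_stat:.3f}, df = {n1 + n2 - 2}")
```

With 18 degrees of freedom the two-tailed critical value at the .05 level is about 2.101, so a t statistic below 1 gives no grounds for rejecting the null hypothesis.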

The two means test just performed assumes the samples are drawn from populations with equal variances. To test this assumption, you could perform a test on the ratio of two variances. If the ratio of variances is significantly different from one, the hypothesis that the variances are equal may be rejected and the assumptions underlying the two means test are suspect. After selecting the Two Variances Test method from the Hypothesis Testing objective and touching the Parameters for this METHOD button, the following parameter dialog is presented:

You again select subsets representing male and female members of the focus group (X2=1 and X2=2). The default ratio is 1.0, so it doesn't need to be entered. The results of this analysis:

You are interested in testing the null hypothesis that the ratio of variances in the male and female populations is one (equal variances) against the alternative that it is not equal to one. Based on your analysis, the hypothesis cannot be rejected.
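
The test statistic here is simply the ratio of the two sample variances, which the following sketch computes for the same male and female subsets. It illustrates the calculation only; the significance level would be read from an F table with the stated degrees of freedom.

```python
import statistics

# X5/$50 responses split by X2/Gender (1 = male, 2 = female).
male = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2]
female = [1, 4, 2, 1, 1, 1, 1, 1, 2, 1]

var_male = statistics.variance(male)      # sample variance, n - 1 denominator
var_female = statistics.variance(female)
f_ratio = var_male / var_female
df1, df2 = len(male) - 1, len(female) - 1

print(f"F = {f_ratio:.3f} with ({df1}, {df2}) degrees of freedom")
```

A ratio this close to one is well inside the range expected under equal variances (a standard F table puts the upper .025 critical value for 9 and 9 degrees of freedom at about 4.03).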

The two means test compares the means of two populations. There is also a paired means test that compares the means of two variables in the same population. Suppose you would like to test the hypothesis that there is no difference between interest in the proposed $50/month and $75/month plans represented by the X5/$50 and X6/$75 variables. You will be running a single mean test using the difference between these two values in each case as the test variable. After selecting METHOD: Paired Means Test as the analysis method, you will enter the following parameters that include specifying the analysis variable and paired variable:

The appropriate null hypothesis is μ=0 and the results of this analysis indicate that there is a significant difference in interest between the two smartphone monthly plans.
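
Because the paired means test is a single mean test on the case-by-case differences, it can be sketched in a few lines. The direction of the difference (X5 minus X6) is our assumption; reversing it only changes the sign of t.

```python
import math
import statistics

# X5/$50 and X6/$75 columns copied from smartphone.csv.
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
x6 = [2, 5, 5, 1, 5, 5, 5, 4, 4, 5, 4, 1, 3, 4, 4, 5, 5, 4, 5, 5]

# One difference per respondent: response to the $50 plan minus response to the $75 plan.
diffs = [a - b for a, b in zip(x5, x6)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)
t_stat = (mean_diff - 0) / std_err  # null hypothesis: mean difference is zero

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {n - 1}")
```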

Analogous to the sign test's nonparametric equivalence to the single mean test, there is a paired sign test that is the nonparametric equivalent to the paired means test. You will enter parameters similar to those defining the test on paired means:

The test examines the difference between the variable and paired variable in each case, determining whether this difference is above or below the hypothesized median. Too many values above or below represent evidence against the null hypothesis.
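
A short sketch of the paired sign test on the same two variables follows. As with the single-sample sign test, the exact binomial probability used here is an assumption about the method; the app's output may be based on an approximation.

```python
from math import comb

# X5/$50 and X6/$75 columns copied from smartphone.csv.
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
x6 = [2, 5, 5, 1, 5, 5, 5, 4, 4, 5, 4, 1, 3, 4, 4, 5, 5, 4, 5, 5]
median0 = 0  # hypothesized median of the differences

diffs = [a - b for a, b in zip(x5, x6)]
above = sum(d > median0 for d in diffs)
below = sum(d < median0 for d in diffs)
n = above + below  # differences equal to the hypothesized median are dropped

# Two-tailed test: double the probability of a split at least this lopsided.
tail = sum(comb(n, k) for k in range(min(above, below) + 1)) / 2 ** n
p_two_tailed = min(1.0, 2 * tail)

print(f"above = {above}, below = {below}, p = {p_two_tailed:.6f}")
```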

Instead of tests on the mean value of a variable, you might want to perform a single proportion test. For example, suppose your client's marketing department has predicted that at least 60% of current users of smartphones would be somewhat or very likely to sign up for the $50/month service. To test this hypothesis, you can select the METHOD: Single Proportion Test option from the hypothesis testing objective.

The parameters you enter are similar to those for the single mean test. To calculate a proportion, a subset of the cases must be defined that corresponds to cases that satisfy the condition defining the proportion. For this example, you will have to establish a subset through the DATA tab including cases where the response to question 5 (X5/$50) is less than or equal to 2: the "somewhat" or "very likely" to sign up responses (being sure to return to the View/Edit dialog to save the file). The hypothesized proportion is .6 (for 60% of current smartphone users), and the null hypothesis is that the population proportion is less than or equal to .6:

The null hypothesis is rejected at the .011 significance level, supporting the marketing department's assertion of interest among at least 60% of current smartphone users.
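
The .011 figure can be reproduced with a normal-approximation z test on the sample proportion. The sketch below assumes that approach, with the standard error based on the hypothesized proportion; it is a cross-check, not the app's code.

```python
import math
from statistics import NormalDist

# X5/$50 column copied from smartphone.csv.
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]
p0 = 0.6  # hypothesized proportion under the null hypothesis p <= .6

hits = sum(v <= 2 for v in x5)  # "very likely" or "somewhat likely" responses
n = len(x5)
p_hat = hits / n

# z statistic using the standard error implied by the hypothesized proportion.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)  # right-tail probability

print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, significance = {p_value:.3f}")
```

Seventeen of the twenty respondents fall in the subset, giving a sample proportion of .85.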

You used the two means test to compare male with female interest in the $50/month plan. If you would like to see if interest in the $50/month option is dependent on how the focus group member uses a smartphone, you could compare the mean response to X5/$50 across responses to X4/Use. To test equality of more than two means, the one-way ANOVA provides an appropriate technique. After selecting the one-way ANOVA method under the Analysis of Variance objective, the following parameters will be entered to use X5/$50 as the dependent variable and X4/Use as the treatment. There are 5 responses to question 4, with the last representing the "other applications" category. This category can be eliminated from the analysis by selecting 1 through 4 as the treatment range.

Here is the result of the analysis, indicating that there is a significant difference in interest in the $50 plan depending on the respondent's primary use for a smartphone.
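
The F statistic for this one-way ANOVA can be reproduced from the between-treatment and within-treatment sums of squares. The following is a minimal sketch with our own variable names; it drops the single "other applications" case just as the treatment range 1 through 4 does.

```python
# X4/Use (treatment) and X5/$50 (dependent variable) copied from smartphone.csv.
use = [2, 1, 3, 2, 1, 1, 1, 3, 3, 1, 3, 2, 2, 4, 1, 4, 4, 5, 4, 4]
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]

# Group the responses by treatment level, keeping levels 1 through 4 only.
groups = {}
for level, score in zip(use, x5):
    if 1 <= level <= 4:
        groups.setdefault(level, []).append(score)

all_scores = [s for g in groups.values() for s in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-treatment and within-treatment sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((s - sum(g) / len(g)) ** 2 for g in groups.values() for s in g)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_stat:.3f} with ({df_between}, {df_within}) degrees of freedom")
```

A standard F table gives a .05 critical value of about 3.29 for 3 and 15 degrees of freedom, so an F near 10 is consistent with the significant result described above.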

The Kruskal-Wallis one-way ANOVA provides a nonparametric equivalent to the one-way ANOVA just examined. After requesting this procedure from the list of nonparametric objectives, you enter equivalent parameters. Note that once again treatment level five (representing the "other applications" response) has been eliminated from the analysis:

The Kruskal-Wallis procedure substitutes the ranks of dependent variable values for the raw values, then calculates the average squared rank for each treatment level. When there are tied ranks, the median rank is substituted for the tied values. A chi-squared test statistic is used to look for significant influence of the treatment levels on the dependent variable.

There is a test statistic correction based on the number of tied values. The Kruskal-Wallis test statistic is well approximated by a chi-square when there are at least five observations for each treatment level. Otherwise, a table of the Kruskal-Wallis statistic should be consulted for the degrees-of-freedom and value shown in the output as χ².
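
The ranking, rank sums and tie correction can be sketched as follows. This uses the common textbook form of the Kruskal-Wallis H statistic, which we assume matches the app's calculation; tied values share the middle (average) rank of the positions they occupy.

```python
from collections import Counter

# X4/Use (treatment) and X5/$50 (dependent variable) copied from smartphone.csv.
use = [2, 1, 3, 2, 1, 1, 1, 3, 3, 1, 3, 2, 2, 4, 1, 4, 4, 5, 4, 4]
x5 = [4, 1, 2, 3, 1, 1, 2, 1, 2, 2, 1, 4, 2, 1, 1, 1, 1, 1, 2, 1]

pairs = [(u, s) for u, s in zip(use, x5) if 1 <= u <= 4]  # drop treatment level 5
scores = sorted(s for _, s in pairs)
n = len(scores)

# Shared rank for each distinct value: the middle of the positions it occupies.
rank_of = {}
for value, count in Counter(scores).items():
    first = scores.index(value) + 1
    rank_of[value] = first + (count - 1) / 2

rank_sums, sizes = Counter(), Counter()
for u, s in pairs:
    rank_sums[u] += rank_of[s]
    sizes[u] += 1

h = 12 / (n * (n + 1)) * sum(rank_sums[u] ** 2 / sizes[u] for u in sizes) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
ties = sum(c ** 3 - c for c in Counter(scores).values())
h_corrected = h / (1 - ties / (n ** 3 - n))

print(f"H = {h:.3f}, tie-corrected H = {h_corrected:.3f}, df = {len(sizes) - 1}")
```

Note that two of the treatment levels have only four observations here, so the chi-square approximation should be read with the caution given above.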

To test for more complex relevance of focus group attributes to members' current monthly smartphone expenditures (X7/$MonCost), a two-way analysis of variance method is available. This method allows the influence of two treatments on a dependent variable to be measured. The effect of X2/Gender (male/female) and X3/Occup (employed full time/not employed full time) would be interesting to consider. A requirement of the balanced two-way ANOVA supported by e2gStats Basic is that there be equal numbers of observations in each cell (treatment combination). Fortunately, the focus group was selected to have equal numbers of male and female members, and equal numbers of members employed full time and not employed full time in each group. The parameters defining this analysis are similar to the one-way ANOVA with the addition of a second treatment:

The results follow. Cell counts are output first, and if they aren't the same for all treatment combinations, the analysis is abandoned:

These results suggest that employment status (X3/Occup) influences monthly smartphone expenditures, but there is no evidence of gender influence. There is also no evidence of a significant interaction (the A x B result) between gender and employment in their influence on the dependent variable.
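
The sums of squares for a balanced two-way ANOVA can be built from the cell, row and column means. The sketch below is an illustration with our own names, using gender as treatment A and employment status as treatment B; like the app, it stops if the cell counts are unequal.

```python
# X2/Gender, X3/Occup and X7/$MonCost columns copied from smartphone.csv.
gender = [1] * 10 + [2] * 10
occup = [2] * 5 + [1] * 5 + [2] * 5 + [1] * 5
mon_cost = [72, 64, 52, 74, 51, 35, 48, 56, 68, 53, 71, 73, 70, 50, 63, 53, 48, 51, 49, 37]

def mean(values):
    return sum(values) / len(values)

# Collect the dependent variable by cell (treatment combination).
cells = {}
for g, o, y in zip(gender, occup, mon_cost):
    cells.setdefault((g, o), []).append(y)
if len({len(v) for v in cells.values()}) != 1:
    raise ValueError("unequal cell counts: balanced two-way ANOVA not applicable")
r = len(next(iter(cells.values())))  # observations per cell

grand = mean(mon_cost)
a_levels = sorted({g for g, _ in cells})
b_levels = sorted({o for _, o in cells})
a_mean = {a: mean([y for (g, _), ys in cells.items() if g == a for y in ys]) for a in a_levels}
b_mean = {b: mean([y for (_, o), ys in cells.items() if o == b for y in ys]) for b in b_levels}

ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
ss_cells = r * sum((mean(ys) - grand) ** 2 for ys in cells.values())
ss_ab = ss_cells - ss_a - ss_b  # interaction
ss_within = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)

ms_within = ss_within / (len(mon_cost) - len(cells))
f_a = ss_a / (len(a_levels) - 1) / ms_within
f_b = ss_b / (len(b_levels) - 1) / ms_within
f_ab = ss_ab / ((len(a_levels) - 1) * (len(b_levels) - 1)) / ms_within

print(f"F(gender) = {f_a:.3f}, F(occupation) = {f_b:.3f}, F(A x B) = {f_ab:.3f}")
```

Each F has 1 and 16 degrees of freedom, for which a standard table gives a .05 critical value of about 4.49; only the employment status F exceeds it.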

More quick look questions to consider

1. You could perform a single mean test examining interest in the $75/month plan (question 6) as you have already done for the $50/month plan.

2. Suppose your client believed that women would be more interested in the $50/month plan than men. How would this change the test hypothesis you would use in the example two means test? (Reviewing the discussion of hypothesis testing in the Analysis help reference could be helpful in picking the appropriate test hypothesis.)

3. The one-way ANOVA allows comparisons among more than two means, but nothing prevents using it for two treatment levels as an alternative to the two means test. Rerun the gender comparison of the interest in the X5/$50 offer using the one-way ANOVA. Note that you don't have to define subsets to do the test this way -- using X5/$50 as the dependent variable and X2/Gender as the treatment variable in a one-way ANOVA eliminates this requirement.

Scenario 2: Organizational Climate Study

Study scenario/study questions

Your consulting firm has been contracted to perform an organizational climate study in the research division of a major manufacturing company. As part of this study, a questionnaire evaluating the job satisfaction of professional employees has been administered. The data have been captured and you are now charged with preparing a quick look analysis and briefing for the company's senior management.

Here are some of the areas you need to cover in your presentation:

  1. What is the overall level of job satisfaction of the division's employees?
  2. Is it possible to predict the level of job satisfaction from employee demographic factors?
  3. Is it possible to predict the level of job satisfaction from factors under supervisory or corporate policy control?

Data gathering instrument

Here is the survey administered to employees and the data in .csv format. These data are loaded as the jobsat.csv file when e2gStats Basic is started the first time on your Android device.

1. What is your age?
   1. Under 26 years
   2. 26 to 30 years
   3. 31 to 35 years
   4. 36 to 40 years
   5. 41 to 45 years
   6. 46 to 50 years
   7. 51 to 55 years
   8. 56 or over
2. What is the highest educational level you have attained?
   1. High school or less
   2. Bachelor's degree
   3. Master's degree
   4. Doctorate
3. Are you a supervisor?
   1. No
   2. Yes
4. I consider myself to be primarily a:
   1. Scientist
   2. Engineer
   3. Manager
5. Which of the following shows how much of the time you feel satisfied with your job?
   1. Never
   2. Seldom
   3. Occasionally
   4. About half of the time
   5. A good deal of the time
   6. Most of the time
   7. All the time
6. Choose the one of the following statements which best tells how well you like your job.
   1. I hate it
   2. I dislike it
   3. I don't like it
   4. I am indifferent to it
   5. I like it
   6. I am enthusiastic about it
   7. I love it
7. Which one of the following best tells how you feel about changing your job?
   1. I would quit this job at once if I could
   2. I would take almost any other job in which I could earn as much as I am earning now
   3. I would like to change both my job and my occupation
   4. I would like to exchange my present job for another one
   5. I am not eager to change my job, but I would do so if I could get a better job
   6. I cannot think of any jobs for which I would exchange
   7. I would not exchange my job for any other
8. Which of the following shows how you think you compare with other people?
   1. No one dislikes his job more than I dislike mine
   2. I dislike my job much more than most people dislike theirs
   3. I dislike my job more than most people dislike theirs
   4. I like my job about as well as most people like theirs
   5. I like my job better than most people like theirs
   6. I like my job much better than most people like theirs
   7. No one likes his job better than I like mine
9. My job is:
   1. Boring
   2. Not challenging
   3. Somewhat challenging
   4. Challenging
   5. Very challenging
10. Are you given the freedom to do your job well?
   1. Never
   2. Seldom
   3. Sometimes
   4. Often
   5. Always
11. Is your job preparing you for jobs with greater responsibility?
   1. Definitely no
   2. Probably no
   3. Undecided
   4. Probably yes
   5. Definitely yes
12. In this company, promotions are based primarily on merit.
   1. Strongly disagree
   2. Disagree
   3. Undecided
   4. Agree
   5. Strongly agree
13. How often are you given feedback from your supervisor about your job performance?
   1. Never
   2. Seldom
   3. Sometimes
   4. Frequently
   5. Very frequently
14. What kind of influence does your supervisor have on your organization?
   1. Very unfavorable
   2. Unfavorable
   3. Neutral
   4. Favorable
   5. Very favorable
15. I believe I have a good understanding of the company's goals and objectives.
   1. Strongly disagree
   2. Disagree
   3. Undecided
   4. Agree
   5. Strongly agree
16. I am kept informed of company policies which affect me.
   1. Strongly disagree
   2. Disagree
   3. Undecided
   4. Agree
   5. Strongly agree

Here are the data in comma-separated values (CSV) format, with variable labels chosen to stay within the maximum length of eight characters. The first row of comma-separated, quotation-mark delimited data provides variable labels, and each remaining row provides the numbered responses to each question from one questionnaire. To get you started more quickly, these data were loaded onto your SD card when the e2gStats Basic software was first run. If you have deleted the file (jobsat.csv), you may reload it from the "Web" option on the DATA tab using the Web address:

expertise2go.com/e2gstats/jobsat.csv

If you delete all of the files on the SD card from the DATA tab, then exit the program and restart it, the sample files will be automatically reloaded. Internet access is not required for this action -- the sample files are incorporated in the software. Again, the first row of data representing variable labels is word wrapped:

"AGE", "EDLVL", "SUPERVIS", "PROFESS", "JOBSAT1", "JOBSAT2", "JOBSAT3", "JOBSAT4", "CHALLENG", "AUTONOMY", "GROWTH", "EQUITY", "FEEDBACK", "INFLUENC", "KNOWGOAL", "KNOWRULE"
3,4,2,3,6,6,7,5,5,4,2,4,4,3,4,5
4,4,2,2,6,5,4,6,4,3,5,3,4,5,4,4
3,3,2,2,5,4,6,5,5,3,4,2,5,4,3,4
4,2,1,2,6,5,5,4,4,3,5,5,5,5,5,5
4,2,2,2,2,3,3,2,3,2,1,5,5,1,2,4
1,1,1,1,1,1,3,1,2,2,1,5,4,1,4,4
4,3,1,1,4,1,2,2,1,3,1,3,3,1,4,4
1,3,2,3,5,5,5,3,4,2,2,2,3,3,2,2
6,4,2,2,6,4,5,6,4,3,3,3,3,4,4,5
4,2,1,2,7,5,6,6,5,5,1,2,3,1,4,3
8,4,2,2,6,7,7,5,5,3,2,1,3,2,2,3
1,1,1,1,1,2,3,1,2,1,3,5,5,5,3,3
5,4,2,1,4,1,2,2,4,4,1,3,3,5,2,2
5,1,1,1,2,1,4,2,4,2,2,5,5,1,3,3
1,2,1,2,5,5,6,5,3,2,3,1,1,1,2,2
6,4,2,3,7,7,7,7,5,5,2,5,3,2,2,3
6,2,1,1,6,3,5,5,4,1,3,1,1,1,3,2
2,2,1,2,7,5,6,4,4,5,4,1,4,3,4,2
6,2,2,3,7,7,7,7,5,5,5,2,2,2,1,1
7,2,1,1,5,1,5,5,4,5,1,5,5,5,2,1

"Quick look" activities and examples

To support your analysis of the job satisfaction data, you need to add several variables to the data file. These will add a composite job satisfaction score, a variable to hold residuals calculated in the multiple regression routine and dummy variables used to encode a nominal scaled variable. To create the new variables, move to the DATA tab, select the jobsat.csv data file from the SD card files, touch Load Data File then touch View/Edit CSV or View/Edit Grid. Either the grid or CSV editor could be used for these examples: we've used both editors to provide examples of the same data file manipulated in both formats.

The heart of this organizational climate survey is the Hoppock Job Satisfaction index represented by questions 5 through 8: X5/JOBSAT1 to X8/JOBSAT4. The responses to these four questions must be summed to create the overall job satisfaction measure that will be an important dependent variable in your analyses. The steps you will accomplish to create a new variable representing the sum of variables X5 through X8 follow:

1. Your first step is to touch Subset and add a description that will be displayed as part of the analysis output headings. The default missing value in e2gStats Basic is zero. While this default value is well suited for many studies, you will need zero to be a legitimate value to support the use of dummy variables in the analyses you want to perform with this data file. The subset screen is used to change the missing value to -99.9, a value that is conspicuous when the data file is examined in one of the data editors. When new variables are added to the data file, they are initialized to the current missing value.

Touching the View/Edit button returns to the data editor.

2. Next, create a new variable to hold the sum: To add the new variable after the last current variable, touch the Select X to delete/add spinner, select X16/KNOWRULE then touch the Add Var button on the dialog that pops up. A progress dialog will be shown while the data file is reformatted.

3. If you are using the CSV editor, a blank delimited with quotation marks is inserted at the appropriate location (the end) of the first row representing label values in the displayed .csv file, and you need to edit this to the JOBSAT label, then touch Save File to save the altered data. With either editor you will have to drag the display to the left to access the new variable's label:

4. Next you will touch the Calc Var button and move to the calculation dialog to sum the components of the new X17/JOBSAT measure. A sequence of 3 calculations produces the desired result, and you must touch the Calc button after setting up each calculation:

Finally, to make sure the calculations have been performed correctly, you can touch the View/Edit button to return to the edit dialog and examine the modified data file. Here is what the data will look like in the grid editor:
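
Outside the app, the same composite can be sketched in Python as a check on the calculated values. The three additions below mirror the sequence of three calculations described in step 4; the list name jobsat_items and the running-total approach are our own illustration, not the app's internal method.

```python
# X5/JOBSAT1 through X8/JOBSAT4 for each of the 20 cases in jobsat.csv.
jobsat_items = [
    (6, 6, 7, 5), (6, 5, 4, 6), (5, 4, 6, 5), (6, 5, 5, 4), (2, 3, 3, 2),
    (1, 1, 3, 1), (4, 1, 2, 2), (5, 5, 5, 3), (6, 4, 5, 6), (7, 5, 6, 6),
    (6, 7, 7, 5), (1, 2, 3, 1), (4, 1, 2, 2), (2, 1, 4, 2), (5, 5, 6, 5),
    (7, 7, 7, 7), (6, 3, 5, 5), (7, 5, 6, 4), (7, 7, 7, 7), (5, 1, 5, 5),
]

jobsat = []
for x5, x6, x7, x8 in jobsat_items:
    total = x5 + x6        # calculation 1: X17 = X5 + X6
    total = total + x7     # calculation 2: X17 = X17 + X7
    total = total + x8     # calculation 3: X17 = X17 + X8
    jobsat.append(total)

print(jobsat)
```

Since each item is scored from 1 to 7, every composite value should fall between 4 and 28.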

Since the job satisfaction measure is so important in this study, you could begin your quick look by examining a correlation matrix including all of the job satisfaction variables. Selecting Correlation Matrix from the Correlation, Regression, Multivariate Analysis objective, then touching Parameters for this METHOD leads to the parameter screen:

The Select variable(s) button accesses a popup dialog used to request variables X5 through X8 for analysis. Touching Start Analysis produces the following correlation matrix that might have to be scrolled, zoomed or reoriented to view all of the values:

The format of the correlation matrix output includes the correlation coefficient followed by the significance level of the correlation -- the α level at which the null hypothesis that the correlation is zero could be rejected. As expected, these variables have statistically significant correlations, but they are not perfectly correlated: they measure slightly different aspects of the job satisfaction construct.
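
As a rough illustration of what each cell of the matrix reports, the following Python sketch computes a Pearson correlation coefficient and the t statistic commonly used to test the null hypothesis that the correlation is zero. The data values and function names are invented for illustration; they are not taken from jobsat.csv and are not the e2gStats Basic implementation. Converting t into a significance level requires a t distribution table or a statistics library, so the sketch stops at the test statistic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: correlation = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical responses to two job satisfaction items (not real survey data)
item_a = [1, 2, 3, 4, 5, 6, 7]
item_b = [2, 1, 4, 3, 6, 5, 7]

r = pearson_r(item_a, item_b)
t = t_statistic(r, len(item_a))
print(round(r, 3), round(t, 3))
```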

The nonparametric analog to the correlation coefficients calculated by the correlation matrix method just described is the rank correlation method. In e2gStats Basic, these correlations are calculated for one pair of variables at a time.

A rank correlation is calculated by replacing the value of each variable with its ranking from smallest to largest. Tied values are replaced with the median rank of the ties. Testing the null hypothesis that the correlation between education (X5/EDLVL) and overall job satisfaction (X17/JOBSAT) is zero against the alternate that it is not equal to zero (two-tailed test), you can reject the null hypothesis at the α = .035 level.

When there are ties in the rankings, a tie corrected correlation coefficient and test statistic, along with the adjusted hypothesis testing output, is provided.
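
The ranking step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data are invented, the function names are hypothetical, and the coefficient is computed as a Pearson correlation on the ranks, which is one standard way to handle ties and may differ in detail from the tie-corrected statistic e2gStats Basic reports. For a block of tied values occupying consecutive ranks, the median rank and the average rank are the same number.

```python
import math

def ranks_with_ties(values):
    """Rank from smallest (1) to largest; tied values share the middle rank of their block."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # middle of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def rank_correlation(x, y):
    """Pearson correlation computed on the tie-aware ranks of x and y."""
    rx, ry = ranks_with_ties(x), ranks_with_ties(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values (not real survey data)
education = [1, 2, 2, 3, 4]
satisfaction = [10, 14, 12, 18, 20]

education_ranks = ranks_with_ties(education)
rho = rank_correlation(education, satisfaction)
print(education_ranks, round(rho, 4))
```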

Your consulting firm has used the Hoppock job satisfaction measure in many organizational studies, typically finding that the variance in the composite measure (X17/JOBSAT in your current data file) is about 50. To see if this organization provides similar results, you will perform a single variance test examining the null hypothesis that σ² = 50 against the two-sided alternate that it differs significantly from 50. You are also interested in examining a 95% confidence interval on the variance. From the hypothesis testing objective the single variance test is selected, then touching Parameters for this METHOD leads to the following parameter entry dialog:

The null hypothesized value of the variance (50) and the desired confidence level for the confidence interval (95) are both entered, then Start analysis is touched to yield the following result:

From this output you are unable to reject the hypothesis that the job satisfaction measure (X17/JOBSAT) is from a population with a variance that, as in past studies, is 50.
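
For readers who want to see the arithmetic behind a test of this kind, here is a minimal Python sketch of the textbook chi-square test for a single variance. It is an illustration only: the scores are invented, the function names are hypothetical, and the critical values are read from a chi-square table for 9 degrees of freedom rather than computed, so nothing here should be taken as the app's own procedure or output.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_test(values, sigma2_0, chi2_lower, chi2_upper):
    """Chi-square statistic, confidence interval and two-sided decision.

    chi2_lower and chi2_upper are the lower- and upper-tail critical values
    for n - 1 degrees of freedom, looked up in a chi-square table.
    """
    n = len(values)
    s2 = sample_variance(values)
    statistic = (n - 1) * s2 / sigma2_0
    interval = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
    reject = statistic < chi2_lower or statistic > chi2_upper
    return statistic, interval, reject

# Hypothetical JOBSAT scores for 10 respondents (not real survey data)
scores = [12, 18, 25, 9, 30, 21, 15, 27, 11, 22]

# Tabled chi-square values for 9 degrees of freedom, 2.5% in each tail
statistic, interval, reject = variance_test(scores, 50.0, 2.700, 19.023)
print(round(statistic, 2), [round(v, 1) for v in interval], reject)
```

In this made-up sample the hypothesized variance of 50 falls inside the 95% interval, which is the same conclusion as failing to reject the null hypothesis at the 5% level.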

Multiple regression analysis provides a way to look at the effect of multiple independent variables on a dependent variable. In addition to evaluating the statistical significance of each of the independent variables, regression allows measurement of the influence of a variable beyond the effect of other variables -- "controlling" for these influences.

Both simple (one independent variable) and multiple (more than one independent variable) techniques are supported by the same e2gStats Basic method. If a single independent variable is specified, the regression coefficients are immediately estimated and output. If multiple independent variables are specified on the parameter screen, an interactive stepwise regression is performed: you select a variable to add to or delete from the model on each cycle, producing a new output with appropriate statistical parameters to aid in deciding which variable to add or delete, and when to consider the model finished.

The multiple regression method also allows the calculation of residuals (the difference between the actual dependent variable value and the value the model predicts) and stores these values in a variable optionally specified in the parameter dialog so they can be plotted using the descriptive statistics scatter plot method. To produce residuals you will first need to create a new variable to store them. To do this, you will load the jobsat.csv data file from the Data tab, then touch one of the View/Edit buttons. To add this new variable at the end of each case in the data file, select X17 from the Select X to delete/add spinner, then touch Add Var on the dialog that pops up.

After the file is reformatted, add RESID as the label for this variable.

As your next step, you will use the Subset editor to define a subset of the data file that includes only respondents who do not consider themselves to be primarily a manager. This will be employees who provided responses 1 or 2 to the X4/PROFESS question:

As the final step before beginning the regression analysis, you need to define dummy variables so that the nominal scaled variable recording what a respondent considers to be their primary profession (X4/PROFESS) can be included in the model. The three responses to this question are Scientist, Engineer and Manager. Two dummy variables are required to represent three values of a nominal scaled variable, and you choose to code them as follows:

X4          X19/D-SCI   X20/D-ENG
Scientist   1           0
Engineer    0           1
Manager     0           0

With this coding scheme, each case representing a scientist will be coded with a one in X19/D-SCI and a zero in X20/D-ENG. An engineer's case will be coded X19 = 0 and X20 = 1, and a manager's case will have zeros in both the X19/D-SCI and X20/D-ENG variables. When considered in a multiple regression model with job satisfaction as the dependent variable, the coefficients of the X19 and X20 dummy variables will measure the incremental effect on job satisfaction of being a scientist or engineer rather than a manager.
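
The coding scheme and the interpretation of the coefficients can be illustrated with a short Python sketch. The response codes 1 = Scientist, 2 = Engineer and 3 = Manager follow the scenario; the job satisfaction scores and function names are invented for illustration. The sketch relies on a standard property of least squares: when the two dummies are the only independent variables, each coefficient equals that group's mean minus the mean of the reference group (managers).

```python
def dummy_code(profess):
    """Return (D_SCI, D_ENG) for a PROFESS response; Manager is the reference group."""
    d_sci = 1 if profess == 1 else 0
    d_eng = 1 if profess == 2 else 0
    return d_sci, d_eng

def group_mean(scores, groups, wanted):
    """Mean score of the cases whose group code equals wanted."""
    selected = [s for s, g in zip(scores, groups) if g == wanted]
    return sum(selected) / len(selected)

# Hypothetical cases (not real survey data)
profess = [1, 1, 2, 2, 3, 3]
jobsat = [20, 24, 15, 17, 12, 14]

coded = [dummy_code(p) for p in profess]

# Incremental effect of being a scientist or engineer rather than a manager
manager_mean = group_mean(jobsat, profess, 3)
b_sci = group_mean(jobsat, profess, 1) - manager_mean
b_eng = group_mean(jobsat, profess, 2) - manager_mean
print(coded, b_sci, b_eng)
```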

The following dialog shows the data file after using the editor to add X19 and X20 as the last variables and editing the new blank variable labels to D-SCI and D-ENG. You need to touch the Save File button after entering the new labels:

Your next step is to define subsets representing scientists and engineers. Touching Subset on the edit screen moves to the subset dialog where you define subsets B and C representing these two groups based on the values of X4:

After returning to the View/Edit screen, touch Calc Var and you are ready to set the dummy variable values. Because the missing value (-99.9) has been set in the X19/D-SCI and X20/D-ENG variables, you need to change these values to zero. This is easily accomplished by setting the following parameters, then touching Calc:

A similar calculation is performed next after changing the target variable to X20/D-ENG. Now both variables have zero values in all cases.

You next want to set the value of X19/D-SCI to one in every case where the response to X4/PROFESS was one. This is done by selecting X19 as the target variable, setting C to one, selecting the X4 = 1 subset and touching Calc:

Finally, you will change the target variable to X20, select the X4 = 2 subset and touch Calc:

The jobsat.csv data file is now ready for analysis with the multiple regression method.
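
The order of the four calculations matters, and a small Python sketch may make the sequence easier to follow. It mimics the steps with plain lists; the helper name set_constant and the PROFESS responses are hypothetical, and the sketch is not a description of how the app stores or updates data.

```python
MISSING = -99.9  # missing value chosen on the subset screen in this scenario

# Hypothetical X4/PROFESS responses: 1 = Scientist, 2 = Engineer, 3 = Manager
profess = [1, 3, 2, 1, 2, 3]

# Newly added variables start out filled with the current missing value
d_sci = [MISSING] * len(profess)
d_eng = [MISSING] * len(profess)

def set_constant(target, constant, subset=None):
    """Store a constant in every case, or only in cases where subset is True."""
    for i in range(len(target)):
        if subset is None or subset[i]:
            target[i] = constant

# First two calculations: replace the missing value with zero in all cases
set_constant(d_sci, 0)
set_constant(d_eng, 0)

# Last two calculations: D-SCI = 1 where X4 = 1, D-ENG = 1 where X4 = 2
set_constant(d_sci, 1, [p == 1 for p in profess])
set_constant(d_eng, 1, [p == 2 for p in profess])
print(d_sci, d_eng)
```

If the zeroing steps were skipped, managers would keep -99.9 in both dummy variables and would be treated as missing rather than as the reference group.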

For the first regression example, you could look at the possible impact on the job satisfaction score of several ordinal scaled variables that might be treated as interval scaled. Most of the questionnaire items could be treated in this way. For a start, you will choose to consider X1/AGE, X9/CHALLENG, X12/MERIT, and X13/FEEDBACK. After selecting the Multiple Regression method you move to the following parameter dialog on which you will select X17/JOBSAT as the dependent variable, the X4 <= 2 subset of data, and X1, X9, X12 and X13 as potential independent variables. Selecting the X4 <= 2 subset restricts the analysis to respondents who do not consider themselves primarily managers:

After touching Start Analysis the following statistical information about variables not in the equation is presented. X9 appears to be the best candidate for entry into the equation (its coefficient will be significant at α = .002 with X12 a close second choice):

To enter X9 into the regression model, you will touch the Next X spinner and select X9/CHALLENG:

After entering X9 you will examine the statistical significance for entry of the remaining variables, noting that their significance with X9 in the model changes due to correlations among the variables -- their influence has been corrected for X9. X12/EQUITY is the next variable you enter:

The variables not in the equation will not make a statistically significant contribution to the model. You might be interested in looking at an alternate model including X13/FEEDBACK instead of X12/EQUITY. To do so, reselect X12 to remove it, then select X13.

Next you will look at a multiple regression model that incorporates the dummy variable coding of the X4/PROFESS variable as X19/D-SCI and X20/D-ENG and will generate residuals in the X18/RESID residuals variable added earlier. Here is your parameter setup:

After three iterations of adding variables to the model, you arrive at the following result:

The dummy variable X19/D-SCI is statistically significant while X20/D-ENG is not, indicating that job satisfaction differs significantly between scientists and managers, but not between engineers and managers.

Touching the Params button to leave the multiple regression output screen brings up the following dialog giving you a chance to save the residuals (the difference between the actual value of the dependent variable and the value predicted by the regression model). The residuals are stored in the residual variable selected earlier from the parameter dialog (if no residual variable was specified, this dialog is bypassed):

After returning to the method selection screen by touching Change Method and selecting the scatter plot method, you prepare the following plot of the dependent variable (X17/JOBSAT) against the residual (X18/RESID) hoping to find reasonably random results. Residuals plots can often suggest model modifications or question the statistical assumptions on which the regression model is based.

It is common practice to plot the residuals against the predicted value of the dependent variable rather than the actual value as you have just done. To prepare such a plot, you will need to add a new variable, then use the calculation capability in the data editor to subtract the residual value (X18) from the actual dependent variable value (X17) storing the result in the new variable. This result (X17 - X18) will contain the predicted value and can be used as the X axis plot variable.
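
The relationship between actual values, residuals and predicted values can be checked with a small Python sketch. It fits a simple (one independent variable) regression by least squares, computes the residuals, then recovers the predicted values as actual minus residual, as described above. The data and the function name fit_line are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

# Hypothetical data (not real survey data)
challenge = [1, 2, 3, 4, 5]
jobsat = [10, 13, 13, 18, 21]

intercept, slope = fit_line(challenge, jobsat)
predicted = [intercept + slope * c for c in challenge]

# Residual = actual value minus the value the model predicts
residuals = [actual - fit for actual, fit in zip(jobsat, predicted)]

# Recover the predicted values from the stored residuals: actual minus residual
recovered = [actual - res for actual, res in zip(jobsat, residuals)]
print(round(intercept, 2), round(slope, 2), [round(r, 2) for r in residuals])
```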

Contingency table analysis provides a nonparametric method for evaluating the strength of the relationship between two variables when one or both is nominally scaled. To examine the relationship between supervisory status (X3/SUPERVIS) and how the respondents view themselves (X4/PROFESS) you can select the contingency table method from the nonparametric objective, then enter the following parameters:

The analysis uses a chi-square test of significance, and the table should have an expected cell count of at least five for each cell. Your example, while indicating a statistically significant dependence between the two variables, does not satisfy this requirement -- a problem caused by the small number of responses to the questionnaire:
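
The expected cell count check can be reproduced by hand. The following Python sketch computes expected counts under independence, the chi-square statistic, and the number of cells with an expected count below five. The table is hypothetical (it is not the output of the example above), and the function names are invented for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))

# Hypothetical 2 x 3 table: rows = supervisor yes/no,
# columns = Scientist, Engineer, Manager
observed = [[2, 3, 10],
            [12, 11, 2]]

expected = expected_counts(observed)
statistic = chi_square(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
small_cells = sum(1 for row in expected for e in row if e < 5)
print(round(statistic, 3), degrees_of_freedom, small_cells)
```

With only 40 hypothetical respondents, one expected count already drops below five, which is the kind of warning sign the paragraph above describes.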

Principal Component Analysis is one of a collection of methodologies that facilitate understanding the underlying interdependence or structure of a set of "manifestation" variables on which data have been gathered. Although in a strict sense principal component analysis is not factor analysis, the technique is generally introduced (and often implemented in software) under the factor analysis umbrella.

In your organizational climate study, it is reasonable to assume that the sixteen questions represent fewer than sixteen independent dimensions of organizational climate. As an example, you have already summed the job satisfaction questions (Q5 to Q8) and used the result to represent a single job satisfaction construct. The correlation matrix you examined for these four variables showed high intercorrelations, supporting the notion that they measure a single dimension of organizational climate. The principal component technique provides a more rigorous approach to examining data for underlying structure.

Multiple chapters in multivariate analysis texts (as well as entire books) are often devoted to factor analysis methodology, so be aware that the discussion here is very abbreviated and emphasizes generation and interpretation of results with e2gStats Basic, not underlying theory or philosophy.

There are at least two analyses that you might want to do with this technique as part of your early work with the data. The first represents a more definitive examination of the four job satisfaction questions to see if they seem to represent a single dimension of organizational climate. The second examines a broader range of survey questions, attempting to define multiple dimensions of organizational climate. Touching Parameters for this METHOD after selecting the Principal Component Analysis method results in display of the following parameter dialog on which values have been entered to support the first analysis:

The parameters specified to perform a principal component analysis are similar to those entered to generate a correlation matrix and, in fact, the first step in the analysis is creation of this matrix. The four job satisfaction variables (Q5 to Q8) have been requested from the jobsat.csv data file. You will usually leave the N factors (default=eigen>1) parameter blank or zero when first running the technique, allowing the app to decide how many components to retain. If you don't like the app's default decision, you can return to the parameter screen and change the value.

The principal component technique is computation intensive and may take a few seconds to derive results after touching the Start Calc button when a large number of variables are entered. During the calculations, the parameter dialog screen continues to be displayed.

The first output is the correlation matrix, identical to the output from the earlier correlation matrix example. The next output is the factor matrix and related information. e2gStats Basic follows a common practice in statistical software of labeling the components as factors:

The factor matrix presents the correlation of each input variable with each of the extracted principal components or factors -- the factor loading. The first component explains the largest fraction of the shared variance, the second component the next largest amount and so forth. The final lines in this output present the percent of variance and cumulative percent of variance accounted for by each factor. The hope is that a few factors will explain most of the variance, supporting a simplified explanation of the underlying structure of the data. By default, e2gStats Basic retains factors with eigenvalues greater than or equal to 1. These factors explain at least as much of the total variance in the analyzed variables as does each variable: the variables are standardized, so each has unit variance. The factor loadings for the retained factors are displayed in blue.
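
The percent-of-variance lines and the retention rule are simple arithmetic on the eigenvalues, as the following Python sketch shows. The eigenvalues are hypothetical (chosen to sum to four, as they must for four standardized variables), and the function name and the cutoff parameter are invented for illustration; the cutoff of 1 follows the default described above.

```python
def summarize_components(eigenvalues, cutoff=1.0):
    """Eigenvalue, percent of variance, cumulative percent and retain flag per component."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    summary = []
    cumulative = 0.0
    for value in sorted(eigenvalues, reverse=True):
        percent = 100.0 * value / total
        cumulative += percent
        summary.append((value, percent, cumulative, value >= cutoff))
    return summary

# Hypothetical eigenvalues for four standardized variables
eigenvalues = [2.8, 0.6, 0.4, 0.2]

summary = summarize_components(eigenvalues)
retained = [row for row in summary if row[3]]
for value, percent, cumulative, keep in summary:
    print(value, round(percent, 1), round(cumulative, 1), keep)
```

Here a single component with an eigenvalue of 2.8 accounts for about 70% of the total variance and is the only one retained.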

A graphical alternative for deciding how many components to retain is provided by the scree plot, so named because its shape resembles the "scree" that rocks sliding off a hill form at the bottom. The suggested interpretation is to retain factors down to and including the first factor in the nearly straight line that runs through the components with smaller eigenvalues. In this case (a common occurrence), the scree criterion would suggest keeping one more component than the eigenvalue-greater-than-one rule recommends. If you wish to see the analysis with two retained factors, return to the parameter screen, set two as the value of the N factors... parameter, then rerun the analysis. Additional outputs produced when two or more factors are retained will be explained with the next principal component example.

Two additional outputs are produced in this example. The first presents the communalities for each of the analyzed variables. This is the fraction of each variable's variance explained by the retained factors, and is obtained by summing the squared factor loadings from the factor matrix across the retained factors (just one in this example) for each input variable.
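
The communality calculation is easy to verify by hand or with a few lines of Python. The factor matrix below is hypothetical (it is not the output of this example), and the function name is invented for illustration.

```python
def communalities(loadings, n_retained):
    """Sum of squared loadings across the first n_retained factors, per variable."""
    return [sum(value ** 2 for value in row[:n_retained]) for row in loadings]

# Hypothetical factor matrix: one row per variable, one column per factor
loadings = [
    [0.9, 0.1],
    [0.8, -0.3],
    [0.7, 0.5],
]

one_factor = communalities(loadings, 1)   # only the first factor retained
two_factors = communalities(loadings, 2)  # both factors retained
print([round(c, 2) for c in one_factor], [round(c, 2) for c in two_factors])
```

Retaining more factors can only raise a variable's communality, because each additional factor adds a squared (non-negative) loading to the sum.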

The final output for the example is the table of factor score coefficients. These show how normalized values of the input for each variable would be combined to form a score for each retained factor. Normalized values are obtained by subtracting the sample mean of each variable from each value, then dividing the result by the sample standard deviation. Note that the factor score coefficients are approximately equal in size, supporting the addition of the four Hoppock Job Satisfaction variables to form a single measure of job satisfaction.

In the next exercise you will input all but the demographic variables to the analysis to see if some basic dimensions of organizational climate are identifiable. Here is the parameter screen setup for this exercise:

In this example, four principal components are retained after applying the eigenvalue >= 1 criterion:

The scree plot suggests that as many as five or six components could be retained. The analysis is automatically completed with four, and you will have to return to the parameter screen if you wish to examine the solution with a greater number of retained components. Only the first nine values are labeled on the scree plot to avoid overlaps when a large number of variables are analyzed.

Here are the communalities obtained by summing the squared factor loadings for each variable across the four retained factors.

The factor score coefficients would be used to calculate values for each component in each case by summing the normalized values of each variable in the case weighted by its factor score coefficient.
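
A minimal Python sketch of that calculation follows. It normalizes each variable (subtracting the sample mean and dividing by the sample standard deviation), then forms one score per case as the coefficient-weighted sum. The data, the coefficients and the function names are all made up for illustration and do not come from the app's output.

```python
import math

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def factor_scores(columns, coefficients):
    """One score per case: normalized values weighted by the factor score coefficients."""
    z_columns = [standardize(col) for col in columns]
    n_cases = len(columns[0])
    return [sum(coef * z[i] for coef, z in zip(coefficients, z_columns))
            for i in range(n_cases)]

# Hypothetical data: two variables, four cases, and made-up coefficients
x_a = [2, 4, 6, 8]
x_b = [1, 3, 2, 6]

scores = factor_scores([x_a, x_b], [0.55, 0.55])
print([round(s, 3) for s in scores])
```

Because every normalized variable has a mean of zero, the resulting component scores also average zero across the cases.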

When more than one component is retained, e2gStats Basic automatically performs a varimax rotation with Kaiser normalization of the retained components. The objective of this mathematical procedure is to make the components easier to interpret. The desired outcome is defined by Thurstone's "simple structure" represented by factor loadings that satisfy the following two conditions: (1) they are either very large or very small in each column (absolute values close to one or zero) and (2) variables with high loadings on one component (absolute value close to one) have low loadings (absolute value close to zero) on all of the other components. These objectives are sometimes conflicting and not possible to achieve, but varimax, which uses the first objective as its criterion, is the most commonly used optimizing technique to seek this simplification.
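
To make the first objective concrete, the Python sketch below evaluates the raw varimax criterion, the sum over components of the variance of the squared loadings, for two hypothetical loading matrices that differ only by a 45 degree rotation. It is a simplified illustration: it scores a given matrix rather than searching for the best rotation, it omits the Kaiser normalization step, and the function name is invented.

```python
import math

def varimax_criterion(loadings):
    """Sum over columns of the variance of the squared loadings (raw varimax)."""
    n_vars = len(loadings)
    total = 0.0
    for column in zip(*loadings):
        squared = [value ** 2 for value in column]
        mean_sq = sum(squared) / n_vars
        total += sum((s - mean_sq) ** 2 for s in squared) / n_vars
    return total

# Hypothetical unrotated loadings: every variable loads equally on both components
mixed = [[0.6, 0.6], [0.6, 0.6], [0.6, -0.6], [0.6, -0.6]]

# The same loadings after a 45 degree rotation: each variable loads on one component
big = math.sqrt(0.72)
simple = [[big, 0.0], [big, 0.0], [0.0, -big], [0.0, -big]]

print(round(varimax_criterion(mixed), 4), round(varimax_criterion(simple), 4))
```

The rotated matrix scores higher because its columns contain only large and zero loadings, while the communality of each variable (0.72) is unchanged by the rotation.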

To interpret the unrotated or rotated factor matrix, you will examine each column looking for loadings close to +/- 1. By examining the survey questions associated with variables that have high loadings (high +/- correlations) on the component, you might be able to name the construct the component represents. A reasonable interpretation of these factor loadings is provided in the following table:

Component   High loadings   Component definition
1           X5 to X10       Satisfaction with the job
2           X12, X13        Satisfaction with leadership/supervision
3           X15, X16        Satisfaction with organizational communications
4           X11, X14        Satisfaction with job growth

The final output is a table of factor score coefficients for the rotated factors. As with the unrotated factor score coefficients, these could be used to calculate values for each component in each case by summing the normalized values of each variable in the case weighted by its factor score coefficient.

More quick look questions to consider

1. Questions 9 through 16 on the questionnaire (X9/CHALLENG to X16/INFORMED) are all ordinal scaled and you might want to treat these variables as interval scaled and create a correlation matrix to see how they are interrelated. Or, you could calculate nonparametric correlations (rank correlations).

2. The quick look examples so far have not made much use of the X1/AGE variable. You could perform a number of analyses incorporating this variable to examine its influence (if any) on job satisfaction and on various attitudes toward the respondents' feelings about the company's management practices (X9 to X16).

3. Questions 2 through 4 have been used in some of the analyses already accomplished, but you have not fully explored their influence on the issues addressed by X9 to X16.

Analysis method examples index


e2gStatsBasic Mini-course Copyright © 2011 by eXpertise2Go.com. All rights reserved.