Assignment #4
Objectives: Students will continue to become familiar with the basic use of dummy variables.
Lab 4 Data
More info on using dummy variables in SPSS (ignore syntax instructions)
Part One: An Overview of Dummy Variables
In class, you learned some basic concepts about dummy variables. You should know the following:
· For a categorical independent variable with k categories, create k-1 dummies as follows. Pick one category as the excluded (or "reference") category. Then assign a dummy to each of the other k-1 categories. Since one of the k categories serves as the reference, each of the remaining categories gets its own dummy. The number of dummies = k-1, that is, the total number of categories minus the reference category. (In other words, if you have 3 categories, k = 3, so you will have k-1, or 2, dummies.)
· For each case, assign the value 1 for the category that matches the observation, and assign 0 for all other categories.
· This means that observations that match the reference category will have values of 0 for every dummy variable.
For example: If the variable is "employment status", with 3 categories [k=3] of "employed," "unemployed" and "retired," this will give 2 dummies [k-1 = 3-1 = 2]. You could pick (arbitrarily) "retired" as the reference category. That leaves "employed" and "unemployed" as the dummies. In assigning values for each case:
An employed person will have "employed" = 1 and "unemployed" = 0.
An unemployed person will have "employed" = 0 and "unemployed" = 1.
A retired person will have "employed" = 0 and "unemployed" = 0.
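The coding rule above can be sketched in a few lines of Python. This is only an illustration outside SPSS; the function name make_dummies is made up for this sketch and is not part of the assignment.

```python
def make_dummies(value, categories, reference):
    """Return the k-1 dummy values (0 or 1) for one observation.

    The reference category gets no dummy of its own, so a case in the
    reference category ends up with 0 on every dummy.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}


# The employment status example, with "retired" as the reference category.
categories = ["employed", "unemployed", "retired"]
for status in categories:
    print(status, make_dummies(status, categories, reference="retired"))
```

Each printed line shows one case: exactly one dummy equals 1 for an employed or unemployed person, and both dummies equal 0 for a retired person.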
Part Two: The Data Set
The data for Assignment 4 are for
African countries, from around 1990, although specific dates vary for different
countries.
Click here for the data set.
PC GDP is per capita gross domestic product, in dollars. GDP is a measure of overall economic activity in the country, and per capita means it is divided by the number of people in the country. In a sense GDP measures the "wealth" of the country. PC GDP will be the DEPENDENT variable.
"Colonial" refers to who ruled the country during the colonial era. For this assignment, colonial powers will be lumped into three categories: France, Britain, Other.
"Geo." is the region of Africa where the country is located: North, West, Central, South.
"% Arable" is the percentage of the country's land which is arable.
Part Three: The Assignment
Answer all parts of the following
questions. Be sure to attach copies of your SPSS output.
1. In a multiple regression, use colonial ruler, region, and % arable to predict pc GDP. Summarize the results.
Hints: In order to do this
step, you will need to first recode the categorical variables into
dummy variables. Here's how:
a. First enter the data set into SPSS.
b. You need to recode "Colonial" and "Geo."
c. To do this, go to the option "Transform" and then select "Recode," followed by "Into Different Variables."
d. Choose "Colonial" and then "Name the Output Variable." For simplicity, use the same names (i.e. France, Britain, Other).
e. Name the first one "france" and then click "Change." Notice the command line on the left changes to "Colonial > france."
f. Now, you need to assign the "Old and New Values" (this is how you recode).
g. When you click "Old and New Values", a new window will appear. Select the category you want to use, and recode it as "1." All the other categories should be coded "0." After you enter the old value and the new value, click "Add."
h. Click on continue when you finish recoding all the values, then "OK" to finish recoding the variable.
i. Check "Data View" to make sure the variable has been recoded correctly.
j. Repeat steps b-i until you have recoded France, Britain, and Other.
k. Recode the variables for "Geo" in the same manner.
Additional hints:
· You will have three new dummy
variables for "Colonial" (France, Britain, Other) and four for "Geo" (north,
south, west, and central).
· The "other" category
must be created and named for Colonial. When you recode, you
can name the new category "other".
· You must recode all
the categories - including "none" and "various." (Remember the "All
other values" shortcut.)
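The logic of the recode, including the "All other values" shortcut, can be sketched in Python as a check on your understanding. This is not SPSS syntax. The raw values listed are hypothetical examples ("Belgium" in particular is an assumption and may not appear in the data set), and the function names are invented for this sketch.

```python
def recode(old_value, target):
    """One "Old and New Values" rule set: target -> 1, all other values -> 0."""
    return 1 if old_value == target else 0


def recode_other(old_value, named=("France", "Britain")):
    """The "other" dummy: 0 for the named powers, 1 for all other values.

    Because of the catch-all rule, "none" and "various" also land in "other".
    """
    return 0 if old_value in named else 1


# Hypothetical raw values of "Colonial" (not the actual assignment data).
raw_colonial = ["France", "Britain", "none", "various", "Belgium"]

rows = [
    {
        "colonial": v,
        "france": recode(v, "France"),
        "britain": recode(v, "Britain"),
        "other": recode_other(v),
    }
    for v in raw_colonial
]

for row in rows:
    print(row)
```

A quick sanity check, similar to looking at "Data View" in step i: in every row, exactly one of the three dummies should equal 1.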
To do the multiple regression, you must run a linear regression with PC GDP as the dependent variable. Use only two of the three recoded Colonial variables (France, Britain, Other), only three of the four recoded Geo variables (west, south, north, central) and % Arable as independent variables.
2. Use a different reference category for colonial and region than you used in (1). Repeat (1) with the new reference categories. Compare results to those in (1).
Hint: Do question 2 by replacing your original reference category with a new reference category -- one you didn't "omit" in the first regression. (For example, if you ran France, Britain, South, West, North and % Arable the first time, you could run France, Other, Central, West, North and % Arable the second time.)
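To see what changing the reference category does, here is a deliberately simplified sketch in Python with only one categorical predictor and made-up numbers (not the assignment data). In that special case, least squares gives an intercept equal to the reference group's mean, and each dummy coefficient equals that group's mean minus the reference group's mean. Your SPSS regression has more predictors, so treat this as an illustration of the idea rather than a formula for your output.

```python
from statistics import mean

# Hypothetical outcome values grouped by category (NOT the assignment data).
groups = {
    "France": [400.0, 600.0],
    "Britain": [700.0, 900.0],
    "Other": [300.0, 500.0],
}


def fit_with_reference(groups, reference):
    """Single categorical predictor only: intercept = reference-group mean,
    each dummy coefficient = group mean minus reference-group mean."""
    intercept = mean(groups[reference])
    coefs = {g: mean(v) - intercept for g, v in groups.items() if g != reference}
    return intercept, coefs


def predict(intercept, coefs, group):
    """Predicted value for a group; the reference group has no dummy."""
    return intercept + coefs.get(group, 0.0)


for ref in ("Other", "France"):
    b0, b = fit_with_reference(groups, ref)
    predictions = {g: predict(b0, b, g) for g in groups}
    print("reference:", ref, "| intercept:", b0, "| coefficients:", b)
    print("  predictions:", predictions)
```

In this sketch the intercept and the dummy coefficients change when the reference category changes, because each coefficient is now a comparison against a different baseline, while the predicted value for each group stays the same.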
3. From your output, does it seem like there are other important influences on pc GDP that are not in this data set? What do you think might be other independent variables you would want to use to predict a country's pc GDP?
4. Just looking at the data (not your output), what looks strange about the pc GDP variable? (One number looks out of place in the list - why might this one be so different?)
Click here for more info on how to use dummy variables in SPSS regression.