Regression


flsfnoeraekadad
Mar 11, 2007, 12:57 AM
Does anyone know of sources of data sets (time series; cross-sectional will do too) for simple/multiple regression analysis?

We need data with at least 30 to 100 observations/years. If you can help us find Philippine-related data sets, that would be much appreciated. Thanks.

silent yet...
Mar 11, 2007, 01:45 AM
Sources? BSP, NSCB, NSO, PIDS, SWS, NEDA, BLES. Just try searching for them on Yahoo.

ubermensch
Mar 11, 2007, 01:48 AM
Hi!

Try the following. I hope they help. :)
Asian Development Bank- http://www.adb.org/Statistics/default.asp
National Statistics Office- http://www.census.gov.ph/
Financial Forecast Center- http://www.neatideas.com/daily/pse_composite.htm
http://www.oanda.com/convert/fxhistory

finance.yahoo.com also has a lot of finance-related data sets that you could peruse. Good luck!

flsfnoeraekadad
Mar 11, 2007, 02:03 AM
Thanks, thanks. I think I'll just look on the ADB site, since the regression analysis is for my econometrics class. It has a lot of economic indicators.

Wait. Which PC statistics tool do you think is better? Stata, SPSS, Statistica, or EViews? :D

JamminTonite
Mar 11, 2007, 02:33 AM
i don't know. i've seen some of the posts of the threadstarter here in the Academe. he sounds like a lasallian version of Thomas.

now, i'm starting to miss tommy boy and the rat. :D

flsfnoeraekadad
Mar 11, 2007, 03:19 AM
Wow. I'm getting a hyperbola result for the regression analysis of the CPI in relation to the purchasing power of the Filipinos (1957-2006). But I'm having doubts about it because the r-squared is kinda low.

math_techie
Mar 11, 2007, 12:47 PM
Have you tried using SAS? But I think SPSS would also be good, since it's fairly easy to use, much like Excel.

silent yet...
Mar 11, 2007, 12:56 PM
Hey! What stat software did you use? Some of my friends found "EViews" very user-friendly and not too complicated. But as far as accuracy of results is concerned, try "PHStat" if you have the software. It's more complicated but very consistent and reliable. Though it doesn't interpret results, it provides all the necessary statistics, from the tests of significance of the individual beta coefficients to the test of significance of the entire regression equation. "SPSS", on the other hand, is purportedly far more complicated.

flsfnoeraekadad
Mar 11, 2007, 12:57 PM
Not yet. EViews and Stata are what I'm using for now.

Anyway, here's what I got from Stata. I was shocked by the result of the regression.

http://img86.imageshack.us/img86/4884/untitledfq2.png

cHaSeR
Mar 11, 2007, 09:20 PM
Stata is a very powerful program. I suggest you stick to using Stata to run your regressions.

I just have a question though. What data did you use to represent Purchasing Power? I surmise you are using annual data, right?

Now, regarding the low r-squared, I think the more important issue to address is not the low "goodness of fit" of your estimated regression model but the significance of your variables. Since you are testing the relationship between only two variables (bivariate regression), there may be many other factors that drive variation in your dependent variable and that your lone independent variable does not capture. This could be why you have a low r-squared. Look at the results of the t-test and F-test of your regression to get an idea of how reliable your estimated model is. If your variables and your model turn out insignificant as per the t-test and F-test respectively, your model may be misspecified. To check the assumptions behind those tests, see whether your model shows any autocorrelation or heteroskedasticity (and, once you have more than one independent variable, multicollinearity). These problems mainly distort the standard errors, and therefore the t-test and F-test, rather than the coefficient estimates themselves. If your model is diagnosed with any of them, you can apply a number of corrective measures.

But one easy and commonly used way of dealing with a low r-squared and insignificant variables is simply to add more independent variables. Keep in mind that r-squared never goes down when you add a variable, so compare the adjusted r-squared instead. Just make sure you don't get carried away, because you might end up doing "kitchen sink" regression, where you add and drop variables until you get good results but end up with no intuitive, logical, or theoretical basis to explain the relationship among the variables in your regression.
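To see why piling on variables is tempting, here is a small sketch with made-up data. The function name, the sample size, and the ten "junk" regressors are all assumptions for illustration; nothing here comes from the BSP data in this thread. It fits OLS with and without pure-noise regressors and reports both r-squared and adjusted r-squared.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add the constant term
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Synthetic data: y truly depends on x only.
rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=n)
y = 2.0 - 0.5 * x + rng.normal(size=n)
junk = rng.normal(size=(n, 10))  # regressors unrelated to y

r2_small, adj_small = r2_and_adjusted(x.reshape(-1, 1), y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, junk]), y)

print("x only:      R2 = %.3f, adjusted R2 = %.3f" % (r2_small, adj_small))
print("x + 10 junk: R2 = %.3f, adjusted R2 = %.3f" % (r2_big, adj_big))
```

The plain r-squared of the bigger model can never be lower than that of the smaller one, even though the extra variables are noise. The adjusted figure charges a penalty for each added regressor, which is why it is the safer number to compare.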

flsfnoeraekadad
Mar 11, 2007, 09:52 PM
I used data from the BSP. The data was monthly, from Jan 1957 to Dec 2006. Quick math says that's 600 observations. The data contained CPI and Purchasing Power per one Philippine Peso for the given period.

And isn't F = MSR/MSE? I'm getting a value of 506.20. That seems way too large.

My regression equation is purchasingpower = 28.20529 - 0.3033602 * cpi

Anyway, here's what I got from Stata. Maybe you can help me diagnose where I could have gone wrong, or where I could still go wrong.

http://img262.imageshack.us/img262/7297/untitledwd4.png
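A quick consistency check on the numbers quoted above, as a sketch. With n observations and k regressors, F = MSR/MSE can be rewritten in terms of r-squared, so a reported F implies an r-squared and vice versa. The function names are made up for illustration; n = 600 and a single regressor are taken from the post.

```python
def f_from_r2(r2, n, k=1):
    """Overall F statistic implied by R^2 for n observations and k regressors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def r2_from_f(f, n, k=1):
    """R^2 implied by the overall F statistic (inverse of f_from_r2)."""
    df_resid = n - k - 1
    return (k * f) / (k * f + df_resid)

n_obs = 600          # monthly data, Jan 1957 to Dec 2006
f_reported = 506.20  # F value quoted in the post

r2_implied = r2_from_f(f_reported, n_obs)
print("implied R2: %.4f" % r2_implied)
print("F recovered from that R2: %.2f" % f_from_r2(r2_implied, n_obs))
```

The implied r-squared comes out at about 0.458. With 598 residual degrees of freedom, even a moderate r-squared produces a very large F, so an F of this size is not in itself a sign that something went wrong.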

cHaSeR
Mar 11, 2007, 10:34 PM
Well, your variable CPI and your regression model as a whole are statistically significant, as seen from the extremely low (close to zero) p-values for your t-test and F-test. That means you can reject the hypothesis that CPI has no linear relationship with Purchasing Power; it does not by itself prove that the model is well specified. The strength of the linear relationship between Purchasing Power and CPI is summarized by the r-squared. Your r-squared of approximately 46% tells you that about 46% of the variation in Purchasing Power is explained by the linear relationship with CPI. Why the relatively low r-squared? Probably because there are other factors (variables) that explain movements in Filipinos' purchasing power. A regression with CPI as the only regressor leaves every other influence in the error term, and there are many other factors that could affect purchasing power, such as domestic liquidity, relative exchange rates, and, to some extent, interest rates, just to name a few. The other 54% of the variation is attributed to these other factors (including ones I have not mentioned). To find out what they are, you may want to add more variables to your regression model. If you obtain significant results in your t-test and F-test and also attain a higher adjusted r-squared after adding more independent variables, that suggests the variables you added collectively explain a bigger portion of the changes in Filipinos' purchasing power than CPI alone.

Regarding the downward-sloping regression curve Stata generated, this is both intuitive and logical, since your a priori expectation is that the relationship between CPI and Purchasing Power is negative. Higher consumer prices should lead to a decline in purchasing power, since a general increase in prices diminishes the ability of your Peso to purchase consumer goods (specifically those included in the basket of commodities constituting the CPI) because they have become relatively more expensive.

Now, regarding the shape of your estimated regression curve, here is my interpretation: when prices of consumer goods go up, the purchasing power of Filipinos tends to decline, but at a decreasing rate.
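One possible reading of the hyperbola shape, offered as a sketch under a loud assumption: if the purchasing-power series is constructed as the reciprocal of the CPI (the thread does not say how BSP defines it, so this is a guess), then a straight line in CPI will fit poorly while a line in 1/CPI fits almost perfectly. The data below are synthetic and the helper name is hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Synthetic CPI path (assumption): rises steadily from 1 to 140 over 600 months.
cpi = np.linspace(1.0, 140.0, 600)
# Assumed definition: purchasing power of the peso = 100 / CPI.
purchasing_power = 100.0 / cpi

r2_linear = r_squared(cpi, purchasing_power)            # y on CPI
r2_reciprocal = r_squared(1.0 / cpi, purchasing_power)  # y on 1/CPI
slope_linear = np.polyfit(cpi, purchasing_power, 1)[0]

print("linear in CPI:   R2 = %.3f, slope = %.4f" % (r2_linear, slope_linear))
print("linear in 1/CPI: R2 = %.3f" % r2_reciprocal)
```

If the real series behaves like this, the low r-squared would say more about the functional form than about missing variables, and transforming CPI (reciprocal or logs) would be worth trying before adding regressors.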

flsfnoeraekadad
Mar 11, 2007, 11:00 PM
So there's nothing wrong with the regression process?

What possible variables could I add into the mix? Interest rates? Wage rates that were prevalent during the period specified? Crime rates? Population increase rates? GDP growth rates?

cretinous00
Mar 12, 2007, 07:16 PM
How do you know you don't have either autocorrelation or multicollinearity?

flsfnoeraekadad
Mar 12, 2007, 07:19 PM
You run the regression through a series of tests. I don't know exactly what they are yet because they haven't been taught to us.

cretinous00
Mar 12, 2007, 07:34 PM
You'll be surprised at how closely a regression line can fit historical data and still be useless for forecasting. ;)

flsfnoeraekadad
Mar 12, 2007, 08:23 PM
I'm having trouble getting the sums of squares for the SRF (sample regression function). Lol.

faux_ph
Mar 12, 2007, 08:36 PM
I'd prefer to see the horizontal axis on a logarithmic scale, so that the resulting figure isn't asymptotic to the primary axes?

flsfnoeraekadad
Mar 12, 2007, 08:40 PM
Good grief, I don't know about that, we're only doing linear regression, haha. But that's fine too, the thread is all about regression anyway. :lol:

flsfnoeraekadad
Mar 12, 2007, 11:14 PM
Question. How do I distinguish Type 1 and Type 2 errors? This is related to hypothesis testing.

Supposedly Type 1 is the "false positive" and Type 2 is the "false negative". How do I interpret that?

silent yet...
Mar 12, 2007, 11:57 PM
To test whether the residuals of the regression are correlated with one another (AUTOCORRELATION), use the Durbin-Watson test. For more than 15 observations, if the Durbin-Watson statistic lies above the upper critical value dU (and below 4 - dU), there is no evidence of first-order autocorrelation; if it falls between the lower and upper critical values, the test is inconclusive.

To test whether there is a linear association among the explanatory variables (MULTICOLLINEARITY), use the Variance Inflation Factor (VIF). As a rule of thumb, a VIF below 5 suggests that multicollinearity is not a serious problem.
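Both diagnostics are simple enough to compute directly, so here is a minimal sketch using only NumPy. The helper names and the synthetic data (AR(1) errors with coefficient 0.8, and a regressor x2 that is nearly a copy of x1) are assumptions made up for illustration.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def durbin_watson(resid):
    """DW = sum of squared successive differences / sum of squared residuals."""
    diff = np.diff(resid)
    return (diff @ diff) / (resid @ resid)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        resid = ols_residuals(others, target)
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # unrelated to x1 and x2

# Errors that follow an AR(1) process, so residuals are positively autocorrelated.
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + rng.normal()
y = 1.0 + 2.0 * x3 + errors

dw = durbin_watson(ols_residuals(x3.reshape(-1, 1), y))
vifs = vif(np.column_stack([x1, x2, x3]))
print("Durbin-Watson:", round(dw, 3))
print("VIFs for x1, x2, x3:", np.round(vifs, 2))
```

A Durbin-Watson value well below 2 points to positive autocorrelation, and the VIFs for x1 and x2 come out far above 5 while the one for x3 stays near 1. The exact cut-offs for Durbin-Watson still have to be read from the dL/dU tables for the given sample size and number of regressors.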

neinsager
Mar 13, 2007, 01:45 AM
http://pwt.econ.upenn.edu/php_site/pwt_index.php

That's the one I can recommend.

blueshark
Mar 13, 2007, 05:14 PM
This is a heavy discussion, huh... :D

I'm neither an econometrician nor a statistician, so I just follow the "norms" when doing a regression analysis.

norms such as:

-a t value of at least 2.0 in absolute value (which translates to a low p value)
-high R2 (coefficient of determination) value e.g. 90%
-Durbin Watson value close to 2 (roughly within the 1.5 to 2.5 range)
-the expected relationship is met (direct / indirect) as seen in the signs of the coefficients of the independent variables

Of course, looking at the tables for the exact critical values of, say, the Durbin-Watson statistic helps when your values are bordering on the lower or upper end of the norm.

As mentioned earlier, checking for problems such as autocorrelation is a must.

*okay*

flsfnoeraekadad
Mar 13, 2007, 05:37 PM
...and of course, taking the square root of the R-squared gives us the correlation coefficient (in simple regression, with the sign of the slope). lolz.

cretinous00
Mar 13, 2007, 06:19 PM
On to multiple regression! I find the regression tool in excel sufficient for my needs.

blueshark
Mar 13, 2007, 07:00 PM
Yup, Excel can be used for regression, sans the Durbin-Watson statistic.

I've tried using EVIEWS and more recently, SPSS.

flsfnoeraekadad
Mar 13, 2007, 07:52 PM
I just realized that I punished myself when I tried to regress tax on GNP by hand with 20 data points, each no less than 10000. I was reaching for the heavens with the values I was getting for the sum of XY, the sum of x^2, and the square of the sum of x. :rotflmao:
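For anyone doing the same thing, working with deviations from the means keeps the sums much smaller than the raw-sums formula and gives the same slope and intercept. This is only a sketch: the GNP and tax figures below are invented (20 points, all at least 10000), and the function names are hypothetical.

```python
import numpy as np

def ols_by_deviations(x, y):
    """Slope = Sxy / Sxx using deviations from the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    slope = (dx * dy).sum() / (dx * dx).sum()
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def ols_by_raw_sums(x, y):
    """Textbook raw-sums formula: the one that produces the huge numbers."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slope = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x * x).sum() - x.sum() ** 2)
    intercept = (y.sum() - slope * x.sum()) / n
    return intercept, slope

# Invented data: 20 observations, every value at least 10000 for GNP.
rng = np.random.default_rng(1)
gnp = 10000.0 + 500.0 * np.arange(20)
tax = 0.15 * gnp + rng.normal(0.0, 50.0, size=20)

a_dev, b_dev = ols_by_deviations(gnp, tax)
a_raw, b_raw = ols_by_raw_sums(gnp, tax)
print("deviations: intercept = %.3f, slope = %.5f" % (a_dev, b_dev))
print("raw sums:   intercept = %.3f, slope = %.5f" % (a_raw, b_raw))
print("largest raw term, n * sum(x^2):", 20 * (gnp * gnp).sum())
```

The two methods agree; the only difference is the size of the intermediate numbers you have to carry around.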

blueshark
Mar 13, 2007, 10:32 PM
Other important points:

Make sure that if your dependent variable is at constant (real) values, your independent variables are at constant values as well. There should be consistency; otherwise, garbage in, garbage out. This is especially true whenever you use NSCB data such as GNP, GDP, PCE, prices, etc.

Also, if common sense says that your independent variable X1 is affected by your independent variable X2, then you should choose only one of the two. There's no sense in forcing both variables into one equation.

*okay*

cHaSeR
Mar 13, 2007, 10:40 PM
Type I Error is committed when the null hypothesis (the status quo) is rejected in favor of the alternative hypothesis even though the null hypothesis is, in reality, true. Committing a Type I Error in regression analysis can be likened to committing a mortal sin, so to speak, so the probability of committing it is kept small by design. The significance level (alpha) is the probability of committing a Type I Error that you are willing to tolerate, chosen before the test. The p-value of a t-test, on the other hand, is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed; equivalently, it is the smallest significance level at which you would reject the null.

Type II Error occurs when a false null hypothesis (status quo) is not rejected.
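A small simulation can make the two error types concrete. This is a sketch under simplifying assumptions: a two-sided z-test of "mean = 0" with known standard deviation 1, a 5% significance level, and a made-up true mean of 0.3 for the case where the null is false.

```python
import numpy as np

rng = np.random.default_rng(42)
z_crit = 1.96            # two-sided 5% critical value of the standard normal
n, trials = 30, 20000    # sample size per test, number of repeated tests

# Case 1: the null hypothesis (mean = 0) is TRUE.
# Every rejection here is a Type I error (false positive).
samples_null = rng.normal(0.0, 1.0, size=(trials, n))
z_null = samples_null.mean(axis=1) * np.sqrt(n)
type1_rate = np.mean(np.abs(z_null) > z_crit)

# Case 2: the null hypothesis is FALSE (true mean = 0.3).
# Every failure to reject here is a Type II error (false negative).
samples_alt = rng.normal(0.3, 1.0, size=(trials, n))
z_alt = samples_alt.mean(axis=1) * np.sqrt(n)
type2_rate = np.mean(np.abs(z_alt) <= z_crit)

print("Type I error rate  (null true):  %.3f" % type1_rate)
print("Type II error rate (null false): %.3f" % type2_rate)
```

The Type I rate lands close to the 5% that was chosen in advance, which is what "alpha is the probability of a Type I Error" means in practice. The Type II rate depends on how far the truth is from the null and on the sample size, so it is not fixed by alpha alone.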

bouncybanana
Mar 24, 2007, 04:05 PM
Where did you get your dataset?

leverage17
Mar 26, 2007, 12:15 PM
Kindly try BSP, NSCB, POEA, PIDS, and the Philippine Statistical Yearbook, though their data can be incomplete and inconsistent.

These sources have been reliable for most UP and DLSU Econ majors.