April 9, 2008

require() in PHP

The require() statement includes and evaluates the specified file.

Use require() if you want a missing file to halt the processing of the page.

Note: include() does not behave this way; it only emits a warning and the script continues regardless.

Sample:

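<?php

// Literal path: if the file is missing, the script stops with a fatal error.
require 'prepend.php';

// The path can also come from a variable (assumed to be set earlier in the script).
require $somefile;

// Parentheses are optional, because require is a language construct rather than a function.
require ('somefile.txt');

?>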

March 24, 2008

Mathematics for Data Analysis – Part 2

2. Chi-Squared Test

The Chi-Squared Test of Association allows the comparison of two attributes (i.e. qualitative variables) in a sample of data to determine if there is any relationship between them.

The idea behind this test is to compare the observed frequencies with the frequencies that would be expected if the null hypothesis of no association / statistical independence were true. By assuming the variables are independent, we can also predict an expected frequency for each cell in the contingency table. If the value of the test statistic for the chi-squared test of association is too large, it indicates a poor agreement between the observed and expected frequencies and the null hypothesis of independence / no association is rejected.

An example where the chi-squared test can be used:
The 3×3 contingency table below relates to 830 professional workers living in Indian towns and cities, who were interviewed during a survey:

                                        Activity Status
Occupation Group            Employees  Employers  Own-account workers  Total
Scientists & Technicians          169         21                  140    330
Medical & Health services          83         25                   68    176
Teachers                          286         10                   28    324
Total                             538         56                  236    830

If we are interested in the relationship between activity status and occupation group (i.e. whether these two attributes are independent or not), we calculate the chi-squared statistic from the differences between the expected and the observed frequencies, and then, depending on its value, either reject or fail to reject the null hypothesis.
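As a rough sketch of that calculation, the Python below works through the 830-worker table. Each expected frequency is (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected over all cells. The function name chi_squared and the choice of a 5% significance level are assumptions made for illustration; 9.488 is the standard 5% critical value of the chi-squared distribution with 4 degrees of freedom.

```python
# Observed counts from the contingency table
# (rows: occupation group, columns: activity status).
observed = [
    [169, 21, 140],   # Scientists & Technicians
    [83, 25, 68],     # Medical & Health services
    [286, 10, 28],    # Teachers
]


def chi_squared(table):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count in each cell if the two attributes were independent.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_squared(observed)

# Assumed significance level: 5%. Critical value for 4 degrees of freedom.
CRITICAL_5_PERCENT_DF4 = 9.488
reject_independence = statistic > CRITICAL_5_PERCENT_DF4

print(f"chi-squared = {statistic:.2f}, degrees of freedom = {dof}")
print("reject independence" if reject_independence else "fail to reject independence")
```

For this table the statistic comes out far above the critical value, so the hypothesis that activity status and occupation group are independent would be rejected at the 5% level.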

Coming Up next… Log Linear Analysis

March 24, 2008

A few Inspirational Thoughts

Sir Peter Medawar: (On “The Phenomenon of Man”): Yet the greater part of it, I shall show, is nonsense, tricked out with a variety of metaphysical conceits, and its author can be excused of dishonesty only on the grounds that before deceiving others he has taken great pains to deceive himself.

Albert Einstein: We can’t solve problems by using the same kind of thinking we used when we created them.

Richard Dawkins: Science has no methods for deciding what is ethical. That is a matter for individuals and for societies. But science can clarify the questions being asked, and can clear up obfuscating misunderstandings. This usually amounts to the useful: “you cannot have it both ways” style of arguing.

Granny Weatherwax (auth: Terry Pratchett): Trouble is, just because things are obvious doesn’t mean they’re true.

Albert Einstein: Any man who reads too much and uses his own brain too little falls into lazy habits of thinking.

Carl Sagan: You can’t convince a believer of anything; for their belief is not based on evidence, it’s based on a deep seated need to believe.

Charles Darwin: Ignorance more frequently begets confidence than does knowledge.

Jonathan Swift: It is useless to attempt to reason a man out of a thing he was never reasoned into.

Richard Feynman: But this long history of learning how to not fool ourselves — of having utter scientific integrity — is, I’m sorry to say, something that we haven’t specifically included in any particular course that I know of. We just hope you’ve caught on by osmosis. The first principle is that you must not fool yourself — and you are the easiest person to fool.

Richard Feynman: It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.

Stephen Jay Gould: The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.

March 24, 2008

Mathematics for Data Analysis – Part 1

1. Basic Statistics

Statistics helps in collecting, classifying and interpreting data that conveys information.

Simple statistical measures used for basic analysis are: the mean (the arithmetic average of the data), the median (the middle value, which splits the ordered data into two equal halves), the mode (the value that occurs most frequently), the variance (how widely the data points are spread around the mean), skewness (how asymmetric the distribution is), kurtosis (how heavy the tails of the distribution are, often described as its peakedness), and correlation (the strength of the relationship between variables).
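To make these measures concrete, here is a minimal Python sketch that computes them for a small made-up sample. The data values and the helper name central_moment are hypothetical, and skewness and kurtosis use the plain moment-based definitions (other conventions exist, for example "excess" kurtosis, which subtracts 3).

```python
import math
import statistics

# Hypothetical sample, chosen to have a long tail on the right.
data = [1, 2, 2, 3, 4, 6, 10]

mean = statistics.mean(data)           # arithmetic average
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most frequent value
variance = statistics.pvariance(data)  # population variance (divides by n)
std_dev = math.sqrt(variance)


def central_moment(values, k):
    """Average of (value - mean)^k over all values."""
    m = statistics.mean(values)
    return sum((v - m) ** k for v in values) / len(values)


# Moment-based definitions: third and fourth central moments,
# scaled by the matching power of the standard deviation.
skewness = central_moment(data, 3) / std_dev ** 3
kurtosis = central_moment(data, 4) / variance ** 2

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, skewness={skewness:.3f}, kurtosis={kurtosis:.3f}")
```

Because the sample has a long right tail, its skewness is positive and its mean is larger than its median.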

SKEWNESS:

[Figures: skewness illustrations]

The effect of skew on mean and median:

Here the distribution shows positive skew. The mean is larger than the median.

basicstatsimg4.gif

Here the distribution shows negative skew. The mean is smaller than the median.

basicstatsimg5.gif

The ultimate goal of any research or scientific analysis is to find relations between a set of variables:

  • The Response variable
  • The Independent variables

Correlational research involves measuring such relations.

What is Correlation?

basicstatsimg6.jpg

Basically, correlation is a measure of the relation between two or more variables. Statisticians have developed many measures of the magnitude of relationships between variables. The choice of a specific measure in given circumstances depends on the number of variables involved, the measurement scales used, the nature of the relations, etc. Almost all of them, however, follow one general principle: they attempt to evaluate the observed relation by comparing it to the “maximum imaginable relation” between those specific variables.
Correlation coefficients can range from -1.00 to +1.00. A value of -1.00 represents a perfect negative correlation, while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.
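One common measure of this kind is Pearson's correlation coefficient, which is not named in the text above and is used here only as an illustration. It divides the covariance of two variables by the product of their standard deviations, which is the largest value the covariance could take, so the result always lies between -1 and +1. A minimal sketch, with the function name pearson_r and the sample values being hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products of deviations (the common 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    # Observed co-variation divided by its maximum possible magnitude.
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfectly increasing relation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfectly decreasing relation
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # positive but imperfect relation
```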

Coming Up next… Chi-Squared Test

March 20, 2008

2 paratroopers

Let’s “sri ganesh” (start) off this blog with a simple puzzle.

“2 paratroopers are dropped from a plane. The troopers land on a line known to both of them, but they can land anywhere on that line. The troopers have 4 moves: move right, move left, pick up parachute, and put down parachute. Propose an algorithm to guarantee the 2 troopers will meet one another.”

What would be the solution? Scroll down for the answer.

And the answer is:

When the troopers land, have both drop their parachutes. Then have both move right at the same rate X. Only the trooper who landed further to the left will ever come across the other's parachute; when that happens, that trooper increases their movement rate and will “run down” the other trooper.
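The answer can be checked with a small simulation. This is only a sketch under assumptions that are not part of the puzzle: the line is modelled as integer positions, time advances in ticks, the slow rate is 1 step per tick and the fast rate is 2 steps per tick. The function name simulate_meeting is hypothetical.

```python
def simulate_meeting(start_a, start_b, max_ticks=10_000):
    """Simulate both troopers running the same program.

    Returns (tick, position) at which they stand on the same spot.
    Assumed model: integer line, slow rate 1 step/tick, fast rate 2 steps/tick.
    """
    starts = [start_a, start_b]
    positions = [start_a, start_b]
    parachutes = {start_a, start_b}   # both drop their parachute on landing
    speeds = [1, 1]                   # both begin moving right at the slow rate

    for tick in range(max_ticks + 1):
        if positions[0] == positions[1]:
            return tick, positions[0]
        for i in (0, 1):
            positions[i] += speeds[i]
            # Finding a parachute that is not your own means the other
            # trooper is somewhere ahead: switch to the fast rate.
            if positions[i] in parachutes and positions[i] != starts[i]:
                speeds[i] = 2

    raise RuntimeError("troopers did not meet within max_ticks")


print(simulate_meeting(0, 5))
print(simulate_meeting(5, 0))
```

In this model the gap between the troopers stays constant until the one behind reaches the other's parachute, and it then shrinks by one step per tick, so the two always end up on the same position no matter who landed where.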

Do you have any other algorithm?