
Feb
25

Calculate standard deviation using AWK

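The standard deviation σ (sigma) is the square root of the average value of (X − μ)², where μ is the mean of X.

In the case where X takes random values from a finite data set x_1, x_2, …, x_N, with each value having the same probability, the standard deviation is

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

where $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ is the mean of the data set.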

Assume we have an input file foo with, for example, the line number in the first column and the values of interest in the second column ($2 in awk).

File: foo

1 2
2 3
3 6
4 8
5 11
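
As a check by hand, the population standard deviation (dividing by N = 5) of the second column of foo can be worked out directly from the definition:

```latex
\mu = \frac{2 + 3 + 6 + 8 + 11}{5} = 6
\qquad
\sigma = \sqrt{\frac{(-4)^2 + (-3)^2 + 0^2 + 2^2 + 5^2}{5}}
       = \sqrt{\frac{54}{5}} = \sqrt{10.8} \approx 3.28634
```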

Use one of the following awk commands to calculate the standard deviation:

awk '{sum+=$2; array[NR]=$2} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))^2);}print sqrt(sumsq/NR)}' foo

awk '{sum+=$2;sumsq+=$2*$2} END {print sqrt(sumsq/NR - (sum/NR)^2)}' foo
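
The two commands look different but compute the same quantity. The first stores every value and sums the squared deviations from the mean. The second keeps only a running sum and a running sum of squares. A short derivation of why these agree, using the same σ, μ and N as in the definition:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i \;+\; \mu^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; \mu^2
\end{aligned}
```

For foo this gives 234/5 − 6² = 46.8 − 36 = 10.8. The one-pass form needs no array, but when the values are large relative to their spread, the subtraction can lose precision in floating point, so the two-pass form may be the safer choice for such data.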

Either command prints

3.28634
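
Note that 3.28634 is the population standard deviation, which divides by N. Spreadsheet functions such as Excel's STDEV compute the sample standard deviation, which divides by N − 1 and gives 3.6742 for the same data. The following is a minimal sketch that prints both; the variable names (`ss`, `result`) and the output wording are illustrative choices, not part of the original commands.

```shell
# Recreate the sample file foo from the post.
printf '1 2\n2 3\n3 6\n4 8\n5 11\n' > foo

# One pass over column 2, then print both flavours of standard deviation.
result=$(awk '
    { sum += $2; sumsq += $2 * $2 }        # running sum and sum of squares
    END {
        if (NR < 2) { print "need at least two records"; exit 1 }
        mean = sum / NR
        ss   = sumsq - NR * mean * mean    # sum of squared deviations from the mean
        printf "population %.5f sample %.5f\n", sqrt(ss / NR), sqrt(ss / (NR - 1))
    }' foo)

echo "$result"
# prints: population 3.28634 sample 3.67423
```

Which divisor is appropriate depends on whether the values are the whole population or a sample drawn from a larger one.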

Average

Here you may find how to calculate the average or arithmetic mean using AWK.

Minimum and maximum

Here you may find how to calculate the minimum and maximum values using AWK.

7 comments

  1. Calculate average using AWK says:

    […] you may see how to calculate the standard deviation using AWK. […]

  2. Madia says:

    Yes.
    I was given this code:
    #!/usr/bin/awk -f
    {sum+=$3; array[NR]=$3}
    END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}
    print sum/NR, " ", sqrt(sumsq/NR)," ",2*(sqrt(sumsq/NR))}
    and told to alter the code in the above example so that it calculates the mean and standard deviation for the data in columns 3 to 8 and for records 500 to 1000.
    The program should create a table of the data used in the computation, and it should store the differences (between actual and mean) in the column following the last column of the input data file.
    Thanks for your help.

  3. ashi says:

    very handy script, brilliant

  4. Vinay says:

    I think there is a mistake in the code. You seem to be computing sum (x^2 – mean^2), which is not the same as (x – mean)^2. The correct value of std deviation is 3.6742.

  5. ove says:

    No, I think it is correct. Try this:

    awk '{x[NR]=$2; sum+=$2; n++} END{mean=sum/n; for (i in x){ss += (x[i]-mean)^2} sd = sqrt(ss/n); print "SD = "sd}' foo

  6. enid says:

    Use Excel, the result I got agrees with Vinay’s 3.6742

  7. ove says:

    You should check out the page http://en.wikipedia.org/wiki/Standard_deviation, and if you try that example with the awk-code given here you will get the same result.
    And, the formula given half way down the page (just prior to
    Interpretation and application) is the same as the awk-code.
    So, the awk-code given here is correct.
