Secure Your Sites With Let's Encrypt Free SSL Certificates

SSL certificates are vital for protecting communication between a web server and its clients. We used to buy SSL certificates from vendors such as Symantec, VeriSign, GeoTrust, and RapidSSL. Prices range from low to high for single-domain certificates, wildcard certificates, and extended validation certificates respectively. A small startup or blog owner often cannot afford them.

But you can get free SSL certificates through Let's Encrypt, which is a free, automated, and open Certificate Authority. It provides an easy way to obtain and install trusted certificates at no cost. Let's see how to get an SSL certificate.

Install Let’s Encrypt

In this tutorial, we are using Ubuntu 14.04. First, we have to install the Let's Encrypt client tool certbot-auto:

wget https://dl.eff.org/certbot-auto
chmod a+x certbot-auto

Getting new certificates

To obtain a cert for acme.com, www.acme.com and blog.acme.net:

./certbot-auto certonly --standalone --email admin@acme.com -d acme.com -d www.acme.com -d blog.acme.net

If you’re running a local webserver for which you have the ability to modify the content being served, and you’d prefer not to stop the webserver during the certificate issuance process, you can use the webroot plugin to obtain a cert by including certonly and --webroot on the command line.

./certbot certonly --webroot -w /var/www/acme/ -d www.acme.com -d acme.com -w /var/www/blog/ -d blog.acme.net -d another.blog.acme.net

Where:

-d = Domain name

-w = Webroot directory from which the domain's content is served; certbot places its challenge files there

Renew Certificates

Let's Encrypt certificates expire 90 days after creation or renewal, so you have to renew them before they expire. You can arrange for automatic renewal by adding a cron or systemd job which runs the following:

./certbot-auto renew --quiet --no-self-upgrade
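For example, a crontab entry like the following runs the renewal check twice a day. The path /opt/certbot/certbot-auto is only an illustration; point it at wherever you downloaded the tool:

```crontab
# m  h     dom mon dow  command
17   4,16  *   *   *    /opt/certbot/certbot-auto renew --quiet --no-self-upgrade
```

Certificates are only actually renewed when they are close to expiry, so running the check this often is safe.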

You can find more detailed information and options in the full documentation.

Random Rationalized

I had a requirement to have random numbers generated in a C application.

#include <stdlib.h>
#include <stdio.h>
int main()
{
  int i = 0;
  for (i = 0; i < 5; i++) {
    printf("Random number %d  =  %ld\n", i, random());
  }
  return 0;
}

I ran the code once and saw that random numbers were getting generated.

But to my surprise when I ran the code second time, I saw the same numbers getting printed.

What??? Does it mean random numbers are predictable? If so, why are they called random? The words "random" and "predictable" literally contradict each other.

But I remembered that random.random() in Python gave a different output each time.

from random import random
for i in xrange(3):
  print random()

Each time I executed this code, I got a different set of outputs.

Is the random() implementation in libc really so buggy? Of course not!

Oh wait. Are we missing something here?

If there is going to be some algorithm for generating random numbers, it means we are telling the computer to perform a series of steps and give us a result. If the series of steps is fixed, it implies that the result is predictable.

But the people who came up with such algorithms would definitely have thought about this.

They would have added some variable parameter into the equation to make it unpredictable. That variable parameter can be anything, like a value derived from the current timestamp or from the current data in a network device.
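The idea can be sketched in Python, where the seed decides everything that follows: seeding from the clock varies the sequence on every run, while a fixed seed reproduces it exactly.

```python
import random
import time

# Seeding from the current time: a different sequence on each run.
random.seed(time.time())
print([random.random() for _ in range(3)])

# Seeding with a fixed value: the exact same sequence on every run.
random.seed(42)
print([random.random() for _ in range(3)])
```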

If so, does it mean the Python implementors thought of this and the libc implementors missed it?

When I checked the man page of the random() function on my Linux machine, it said:

The random() function uses a nonlinear additive feedback random number generator employing a default table of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately

   16 * ((2^31) - 1).

So, this is it. It is not random; it is actually pseudo-random.

The meaning of pseudorandom from the dictionary:

pseudorandom |ˌsjuːdəʊˈrandəm| adjective

(of a number, a sequence of numbers, or any digital data) satisfying one or more statistical tests for randomness but produced by a definite mathematical procedure: most computers have built-in functions which will generate sequences of pseudorandom numbers.

Aha. We are there. Whatever we discussed so far is making sense.

Also, there can be a requirement where an experiment that uses random numbers has to be reproducible.

So that's the reason why the random() function in libc returns the same sequence. The same behaviour can be achieved even in Python using a seed.

from __future__ import print_function
import random

def main():
        for x in xrange(3):
                print("Unseeded random number: {}".format(random.random()))

        random.seed(5)
        for x in xrange(3):
                print("Seeded random number: {}".format(random.random()))


if __name__ == '__main__':
        main()

Here is the output of the above code, run twice:

[ythulasi@YTHULASI-M-C341 test_random]$ date && python ~/exp/python/test_random/ran.py
Wed May 20 19:45:46 IST 2015
Unseeded random number: 0.399506141684
Unseeded random number: 0.591102825052
Unseeded random number: 0.00522710909826
Seeded random number: 0.62290169489
Seeded random number: 0.741786989261
Seeded random number: 0.795193565566

[ythulasi@YTHULASI-M-C341 test_random]$ date && python ~/exp/python/test_random/ran.py
Wed May 20 19:45:49 IST 2015
Unseeded random number: 0.888964633885
Unseeded random number: 0.000467464175557
Unseeded random number: 0.0425654580676
Seeded random number: 0.62290169489
Seeded random number: 0.741786989261
Seeded random number: 0.795193565566

The same seed returns the same sequence.

So that's the feature of libc: it allows the caller to set the same or a different seed each time, based on the requirement.

The moral of the story is:

  • random numbers generated by a computer are only pseudo-random numbers.

Seeds used in random number generation serve two purposes:

  • to increase the degree of randomness

  • to make experiments that use random numbers reproducible

Happy hacking!!!

How GitHub Hides Email From Spam Bots on the Profile Page

As a developer, I use GitHub a lot. I love it more than any other site. I often visit my GitHub profile page because it makes me very happy to see a heat chart with lots of green dots. Since my network is somewhat slow, for a while I could see {email} instead of my email address. I thought it was due to client-side rendering. But it does not happen for the other fields like name, location, and website. Only the email field is rendered on the client side.

Why? I googled it and found that it is to prevent spam bots from crawling the email address. Basically, spam bots crawl sites looking for email addresses and then use them to send ads and other unwanted mail. You might have seen email addresses written on sites like kumar [at] gmail [dot] com; this method was also used in the past to confuse spam bots.

Let's see how GitHub might have implemented this method. The following is the HTML code returned from the server for a GitHub profile page (e.g. https://github.com/visionmedia):

 <a
     class="email js-obfuscate-email"
     data-email="%74%6a%40%76%69%73%69%6f%6e%2d%6d%65%64%69%61%2e%63%61"
     href="mailto:{email}">
       {email}
 </a>

You can see {email} in the place of the email address, and another attribute, data-email. fajarkoe explains how it works:

The content of data-email is just the hexadecimal version of your email address
"tj@vision-media.ca".

It is a sequence of hexadecimal characters, where each character is of the
form %XY, where X and Y are hexadecimal digits (0-f). For example,
the first two hexadecimal characters in your case are %74 and %6a.
If you look at the ASCII table (http://en.wikipedia.org/wiki/ASCII),
the symbol that corresponds to ASCII with hexadecimal number 74 is "t",
while for hexadecimal number 6a it is "j".

You can play around with this tool: http://www.asciitohex.com/.
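To see this for ourselves, here is a small Python sketch (not GitHub's actual code) that converts an email address to this %XY form and back:

```python
def obfuscate(email):
    # Encode every character as %XY, where XY are its two hex digits.
    return ''.join('%{:02x}'.format(ord(c)) for c in email)

def deobfuscate(data):
    # Split on '%' and turn each pair of hex digits back into a character.
    return ''.join(chr(int(h, 16)) for h in data.strip('%').split('%'))

encoded = obfuscate('tj@vision-media.ca')
print(encoded)               # the same string as the data-email attribute above
print(deobfuscate(encoded))  # tj@vision-media.ca
```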

Once the page is rendered in the browser, the hexadecimal value is converted to ASCII as follows:

function hex2a(hexx) {
  var hex = hexx.toString();//force conversion
  var str = '';
  for (var i = 0; i < hex.length; i += 2)
      str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
  return str;
}

var $email = $('a.js-obfuscate-email');
var hexaEmail = $email.data('email');
hexaEmail = hexaEmail.replace(/%/g, '');
var email = hex2a(hexaEmail);

Once you have the email, it is updated in the DOM element as follows:

var $email = $('a.js-obfuscate-email');
$email.attr('href', 'mailto:' + email);
$email.text(email);

That's all, folks. I hope this tutorial helped you understand one method of hiding email addresses from spam bots. Please share your thoughts in the comments below.

GCOV - C/C++ Code Coverage Testing Tool

What is GCOV

  • GCC provides GCOV, a code coverage testing tool for C/C++ programs.
  • GCOV identifies the lines of code that got executed while running the program.
  • It also gives additional information, like how many times a particular line got executed.
  • It also provides information about how many possible branches there are in the code and which branch path got executed more often.

Use cases

Optimization

GCOV identifies the sections in the code that are heavily executed, so we can focus on optimizing the parts of the code that run most often.

Identifying dead code

Any code that got compiled but never got executed in any possible scenario can be found using GCOV. Removing such code can help reduce the memory footprint of the program. This can be vital for programs running on embedded platforms.

Reliability of testing

The coverage report can help in identifying the gaps in testing.
The coverage information can be used for writing test cases to exercise the uncovered area in the code.

Instrumenting GCOV

GCOV does not require any change in the code. The only requirement is to have the code built with -fprofile-arcs and -ftest-coverage compiler and linker flags.

yogi@u32:~/gcov_basics$ ls -l
total 8
-rwxrwx--- 1 yogi vboxsf 131 Jun  7 15:13 coverage.c
-rwxrwx--- 1 yogi vboxsf 286 Jun  7 15:56 Makefile
yogi@u32:~/gcov_basics$ cat coverage.c
#include <stdio.h>
int main(int argc, char* argv[])
{
  if (argc == 1)
    printf("True\n");
  else
    printf("False\n");
  return 0;
}
yogi@u32:~/gcov_basics$ cat Makefile
CC=gcc
CFLAGS=-fprofile-arcs -ftest-coverage
LDFLAGS=-fprofile-arcs -ftest-coverage
TARGET=cov
SRC=coverage.c

all:  obj
  $(CC) $(LDFLAGS) *.o -o $(TARGET)
obj:
  $(CC) $(CFLAGS) -c $(SRC)
clean:
  rm -f $(TARGET) *.html *.gc* *.o
gcov:
  gcovr -r . --html -o coverage.html --html-details

CFLAGS += -fprofile-arcs -ftest-coverage

CFLAGS are meant to be used during compilation. This will create a .gcno file corresponding to each .c/.cpp file.

yogi@u32:~/gcov_basics$ make obj
gcc -fprofile-arcs -ftest-coverage -c coverage.c
yogi@u32:~/gcov_basics$ ls -l
total 16
-rwxrwx--- 1 yogi vboxsf  131 Jun  7 15:13 coverage.c
-rw-rw-r-- 1 yogi yogi    396 Jun  7 15:56 coverage.gcno
-rw-rw-r-- 1 yogi yogi   1824 Jun  7 15:56 coverage.o
-rwxrwx--- 1 yogi vboxsf  286 Jun  7 15:56 Makefile

LDFLAGS += -fprofile-arcs -ftest-coverage

LDFLAGS are meant to be used during linking.

yogi@u32:~/gcov_basics$ make
gcc -fprofile-arcs -ftest-coverage -c coverage.c
gcc -fprofile-arcs -ftest-coverage *.o -o cov
yogi@u32:~/gcov_basics$ ls -l
total 36
-rwxrwxr-x 1 yogi yogi   17295 Jun  7 15:56 cov
-rwxrwx--- 1 yogi vboxsf   131 Jun  7 15:13 coverage.c
-rw-rw-r-- 1 yogi yogi     396 Jun  7 15:56 coverage.gcno
-rw-rw-r-- 1 yogi yogi    1824 Jun  7 15:56 coverage.o
-rwxrwx--- 1 yogi vboxsf   286 Jun  7 15:56 Makefile
yogi@u32:~/gcov_basics$ ./cov
True
yogi@u32:~/gcov_basics$ ls -l
total 40
-rwxrwxr-x 1 yogi yogi   17295 Jun  7 15:56 cov
-rwxrwx--- 1 yogi vboxsf   131 Jun  7 15:13 coverage.c
-rw-rw-r-- 1 yogi yogi     160 Jun  7 15:56 coverage.gcda
-rw-rw-r-- 1 yogi yogi     396 Jun  7 15:56 coverage.gcno
-rw-rw-r-- 1 yogi yogi    1824 Jun  7 15:56 coverage.o
-rwxrwx--- 1 yogi vboxsf   286 Jun  7 15:56 Makefile

.gcno has static information about the file.

.gcda has dynamic information about the file based on the path taken during execution.

.gcno and .gcda files together are required to generate the coverage report.

Generating Report

Either gcov or gcovr can be used for generating coverage report.

The gcov utility is installed as part of gcc in most Linux distributions.

yogi@u32:~/gcov_basics$ gcov coverage.c
File 'coverage.c'
Lines executed:80.00% of 5
coverage.c:creating 'coverage.c.gcov'

yogi@u32:~/gcov_basics$ cat coverage.c.gcov
        -:    0:Source:coverage.c
        -:    0:Graph:coverage.gcno
        -:    0:Data:coverage.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:#include <stdio.h>
        1:    2:int main(int argc, char* argv[])
        -:    3:{
        1:    4:  if (argc == 1)
        1:    5:      printf("True\n");
        -:    6:  else
    #####:    7:      printf("False\n");
        1:    8:  return 0;
        -:    9:}
Legend:
  -       indicates not an executable statement
  #####   indicates a statement that did not get executed
  1       (or any number) indicates the number of times the statement got executed.

gcovr is a Python utility built on top of gcov. It can be installed using pip.

$ pip install gcovr
yogi@u32:~/gcov_basics$ gcovr -r . --html -o coverage.html --html-details

The above command will generate coverage report in html format.

Cool features

  • GCOV takes care of conditional compilation. If a file has 100 lines of code but only 50 lines got conditionally compiled using ifdef, then only 50 lines are taken into account for calculating the code coverage.

  • When GCOV is enabled on a shared library that is called from two different applications, it will consolidate the coverage based on the execution of both applications.

  • GCOV works across reboots. The execution information can be collected and consolidated across reboots.

  • Running multiple instances of the same executable appends execution information to the .gcda file.

  • Coverage can be collected from different physical machines by copying the executable and .gcda files.

Points to ponder

  • The number of lines in the file does not exactly match the number of lines considered for coverage. One reason is that not all lines are executable statements; say, a { on a line by itself is not a candidate for execution. Another reason could be compiler optimization.

  • The versions of the .gcno and .gcda files should exactly match to generate a report. If the code was compiled again, even without any change, the report will not get generated because there is a mismatch between the versions of the .gcno and .gcda files.

  • If the .gcno file was created again, even without changing anything in the code, it will not match the .gcda file.

  • The program should exit gracefully for the .gcda file to be created or appended to.

  • If the program is a daemon, it is better to call exit(0) in the SIGINT and SIGTERM signal handlers to do a graceful exit.

  • GCOV will try to create the .gcda file in the same folder structure where the code was compiled. This can be a problem on embedded platforms, where the filesystem is mostly read-only. In this case, the GCOV_PREFIX environment variable can be used.

yogi@u32:~/gcov_basics$ make
gcc -fprofile-arcs -ftest-coverage -c coverage.c
gcc -fprofile-arcs -ftest-coverage *.o -o cov

Note: Copying the executable to /tmp folder

yogi@u32:~/gcov_basics$ cp cov /tmp/
yogi@u32:~/gcov_basics$ ls -l
total 36
-rwxrwxr-x 1 yogi yogi   17295 Jun  8 07:30 cov
-rwxrwx--- 1 yogi vboxsf   131 Jun  7 15:13 coverage.c
-rw-rw-r-- 1 yogi yogi     396 Jun  8 07:30 coverage.gcno
-rw-rw-r-- 1 yogi yogi    1824 Jun  8 07:30 coverage.o
-rwxrwx--- 1 yogi vboxsf   286 Jun  7 16:32 Makefile
yogi@u32:~/gcov_basics$ cd /tmp/
yogi@u32:/tmp$ ./cov
True
yogi@u32:/tmp$ find . -name "*.gcda"

Note: gcda file will not get created in the current working directory, instead
will be created in the same folder structure as it got compiled.

yogi@u32:/tmp$ cd -
/home/yogi/gcov_basics

Note: gcda file getting created where the code was actually compiled.

yogi@u32:~/gcov_basics$ ls -l
total 40
-rwxrwxr-x 1 yogi yogi   17295 Jun  8 07:30 cov
-rwxrwx--- 1 yogi vboxsf   131 Jun  7 15:13 coverage.c
-rw-rw-r-- 1 yogi yogi     160 Jun  8 07:30 coverage.gcda
-rw-rw-r-- 1 yogi yogi     396 Jun  8 07:30 coverage.gcno
-rw-rw-r-- 1 yogi yogi    1824 Jun  8 07:30 coverage.o
-rwxrwx--- 1 yogi vboxsf   286 Jun  7 16:32 Makefile
yogi@u32:~/gcov_basics$
yogi@u32:~/gcov_basics$ export GCOV_PREFIX=/tmp
yogi@u32:~/gcov_basics$ rm coverage.gcda
yogi@u32:~/gcov_basics$ cd -
/tmp
yogi@u32:/tmp$ ./cov
True

Note: By setting GCOV_PREFIX environmental variable we'll be able to direct
the files to a particular base folder.

yogi@u32:/tmp$ ls -l /tmp/home/yogi/gcov_basics/coverage.gcda
-rw-rw-r-- 1 yogi yogi 160 Jun  8 07:31 /tmp/home/yogi/gcov_basics/coverage.gcda
  • The GCOV_PREFIX_STRIP environment variable can be handy when we are not interested in the complete folder structure but want to strip off part of it.
yogi@u32:/tmp$ export GCOV_PREFIX=/tmp
yogi@u32:/tmp$ export GCOV_PREFIX_STRIP=2
yogi@u32:/tmp$ ./cov
True

Note:  Earlier .gcda file was getting created in /tmp/home/yogi/gcov_basics/ folder.
Now by exporting GCOV_PREFIX_STRIP=2 environmental variable, will strip two levels - /home/yogi/ folder
is stripped off and the .gcda file will get created in /tmp/gcov_basics/

yogi@u32:/tmp$ ls -l /tmp/gcov_basics/coverage.gcda
-rw-rw-r-- 1 yogi yogi 160 Jun  8 07:44 /tmp/gcov_basics/coverage.gcda

FAQs

1. undefined reference to __gcov_init

yogi@u32:~/gcov_basics$ make
gcc -fprofile-arcs -ftest-coverage -c coverage.c
gcc  *.o -o cov
coverage.o: In function `_GLOBAL__sub_I_65535_0_main':
coverage.c:(.text+0xae): undefined reference to `__gcov_init'
coverage.o:(.data+0x24): undefined reference to `__gcov_merge_add'
collect2: ld returned 1 exit status
make: *** [all] Error 1

The reason for this is that -fprofile-arcs and -ftest-coverage were used during compilation (in CFLAGS) but missed during linking (in LDFLAGS).
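The fix is to pass the flags in both places. A minimal Makefile fragment (with the same variable names as the earlier example):

```make
# Compile time: instrument arcs and emit .gcno files
CFLAGS  += -fprofile-arcs -ftest-coverage
# Link time: pulls in the gcov runtime that provides __gcov_init and friends
LDFLAGS += -fprofile-arcs -ftest-coverage
```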

2. The .gcda file is not getting created as a result of execution.

Check whether gcov symbols are present in the binary using the strings or nm command.

The ldd command will not help because no extra libraries are linked specifically for gcov.

Binary without gcov symbols will look like the one shown below.

yogi@u32:~/gcov_basics$ strings cov
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
puts
__libc_start_main
GLIBC_2.0
PTRh
UWVS
[^_]
True
False
;*2$"

Binary with gcov symbols will look like the one shown below.

yogi@u32:~/gcov_basics$ strings cov
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
...
[^_]
True
False
/home/yogi/gcov_basics/coverage.gcda
profiling:%s:Version mismatch - expected %.4s got %.4s
profiling:%s:Overflow merging
profiling:%s:Overflow writing
profiling:%s:Cannot create directory
profiling:%s:Not a gcov data file
profiling:%s:Merge mismatch for %s
profiling:%s:Invocation mismatch - some data files may have been removed%s
function
summaries
profiling:%s:Error merging
profiling:%s:Error writing
GCOV_PREFIX_STRIP
GCOV_PREFIX
profiling:%s:Skip
profiling:%s:Cannot open
...

The reason for this could be that the -fprofile-arcs and -ftest-coverage CFLAGS were missed during compilation.

3. gcov symbols are seen in the binary, but the .gcda file is still not getting created.

The reason could be that the program did not exit gracefully.

We'd love to hear more about your experiences with C/C++ code coverage. Please share your thoughts in the comments below.

Deployed My Static Site Before My Friend Finished Peeing

Building and deploying static sites has become easier and simpler than ever before. I see a lot of people buying servers and setting up CDNs to deploy sites, whether company sites, personal sites, or blogs. I also see people deploying sites on S3, which is quite a bit easier than setting up a server.

But there is an even simpler way to deploy your static sites: Divshot hosting. I deployed my personal blog before my friend finished peeing. Let's see how I did it.

Signup for Divshot

Go to http://www.divshot.com and sign up for the service.

Install Divshot client tool

$ npm install -g divshot-cli

Login to Divshot

$ divshot login

Create your app directory and place your static files

$ mkdir app-name
$ cd app-name

Let's say you are going to have your static assets inside a public folder, as below:

app-name/
  public/
    css/
      main.css
    js/
      main.js
    index.html
    about.html
    contact.html

Create divshot configuration file

Once you’re in your new application’s directory, you can initialize a new Divshot application by using the divshot init command. This will walk you step by step through some basic configuration options for your app, then create a divshot.json file and provision your new app.

$ divshot init

It will ask you for the following information:

name: (app-name) app-name
root directory: (current) public
clean urls: (y/n) y
error page: (error.html)
Would you like to create a Divshot.io app from this app?: (y/n) y
Creating app ...
Success: app-name has been created
Success: App initiated

Deploy just like git push

To deploy to the development environment, all you need to do is run:

$ divshot push

Once your app is deployed successfully, you can view it at: http://development.app-name.divshot.io

You can also deploy to production with:

$ divshot push production

(or)

$ divshot promote development production

Once your app is deployed in the production environment, you can view it at: http://app-name.divshot.io

Setting your custom domain

Once your application is ready, you can set up a custom domain in Divshot.

$ divshot domains:add www.myapp.com

You also need to set a CNAME record with your DNS provider. Refer to the Divshot documentation for more details.

Happy Hosting and Have a nice day.

Backup of Openshift Application

You may want to schedule backups of your OpenShift application daily, weekly, or monthly. It can be done in two simple steps.

1. Create a backup application

First, we need to spin up a backup application:

rhc app create osbs https://raw.githubusercontent.com/wshearn/openshift-cartridge-osbs/master/metadata/manifest.yml http://tinyurl.com/OpenShiftRedisCart cron --no-git

Once it is created, it will give you a username and password. Please make a note of them.

2. Create a backup cartridge

Then you have to add the backup cartridge to the application you want to back up:

rhc cartridge add -a <application to backup> -c https://raw.githubusercontent.com/wshearn/openshift-cartridge-osbs-client/master/metadata/manifest.yml

Once that is done, you can log in with the username and password (that you got in the previous step) at the following URL and schedule the backups:

 http://osbs-<your_namespace>.rhcloud.com

Hope it helps. Have a nice day.

Running Python Script From Cron Job in Openshift

OpenShift is an amazing PaaS where you can deploy your application in a few very simple steps. It also provides a free gear for developers, so you can deploy your hack for zero dollars without providing credit card information.

Our application stack is Python Flask, MongoDB, Angular.js, and some cron scripts. We wanted to deploy our application on a PaaS with cron support.

First we chose Google App Engine, since it supports cron jobs. Unfortunately, Google App Engine does not allow connecting to an external MongoDB database like MongoHQ or MongoLab. The only solution was to create a MongoDB instance in Google Compute Engine, which was not feasible for us.

We also played with Heroku, which did not work well for us. Finally we moved to OpenShift to deploy our application. The following are the steps to configure cron scripts in OpenShift.

  • Add the CRON cartridge to your application
rhc cartridge add cron-1.4 -a application_name

rhc is a command line tool to control OpenShift applications.

  • Place your cron scripts in your application's .openshift/cron/{minutely,hourly,daily,weekly,monthly}/ folder. Here is a sample shell script that activates the virtualenv and runs a Python script.
#!/bin/bash

echo "************ Cronny Started ***************"
date >> ${OPENSHIFT_DATA_DIR}/ticktock-start.log

source ${OPENSHIFT_HOMEDIR}/python/virtenv/bin/activate
python ${OPENSHIFT_REPO_DIR}/wsgi/crawler.py

echo "************ Cronny Executed ***************"
date >> ${OPENSHIFT_DATA_DIR}/ticktock-end.log

And that’s all there is to it! Have a nice day.

Generators in Python

One of the more obscure features of Python (for beginners) is generators. In this post I would like to share a few naive questions I had about generators and the answers I got after understanding them.

Question 1: Are generators something like static variables in C? Say generateFibonacciNumber() is a generator. If the first time I call generateFibonacciNumber() I iterate up to the value 5, will the next call to generateFibonacciNumber() start returning from the value 8 when iterated?

This could sound like the dumbest question on earth, but honestly I had this question, coming from a C background.

Answer: No. Generators should not be confused with static variables in C. Every time a generator function is called it returns a new generator object, and each object has its own state variables. So iterating one generator will not affect another.
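A quick sketch to confirm this, using a hypothetical Fibonacci generator:

```python
def fibonacci():
    # Yields Fibonacci numbers indefinitely; the state lives in the generator object.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

g1 = fibonacci()
g2 = fibonacci()

print([next(g1) for _ in range(6)])  # [0, 1, 1, 2, 3, 5]
print(next(g2))                      # 0 -- g2 starts fresh, unaffected by g1
```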

Question 2: Creating multiple generators, holding references to them, and not iterating through them could potentially lead to memory exhaustion, correct?

Phew. Yet another dumb question.

Answer: Obviously, if you are going to hold the references, they are going to consume memory. But that is not a problem with generators. Take, for example, processing a million lines with a generator versus a normal function. At any point in time a single generator object holds only its state variables, not the memory needed to hold all million lines. But holding the value returned by a normal function is like holding a million lines in memory. Compare holding multiple instances of state variables against multiple instances of a million lines; the answer is obvious.
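A rough way to see this, using sys.getsizeof (which measures only the object itself, not the items it refers to, which is exactly the point here):

```python
import sys

def lines_as_list(n):
    # Materializes every line up front.
    return ['line %d' % i for i in range(n)]

def lines_as_generator(n):
    # Produces one line at a time; only loop state is kept.
    for i in range(n):
        yield 'line %d' % i

print(sys.getsizeof(lines_as_list(1000000)))       # grows with n
print(sys.getsizeof(lines_as_generator(1000000)))  # small and constant
```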

So, what actually is a generator?

To understand generators, we'll have to understand what an iterator is in Python.

In simple terms, an iterator is an object having two methods, __iter__() and next(). When iterators are used along with looping constructs like for, the __iter__ and next methods are called implicitly.

You can find more information on iterators here.

__iter__(): returns the iterator object itself.

next(): returns the next item, or raises a StopIteration exception when there are no further items.

class Iterator():
  def __init__(self):
    self.i = 0

  def __iter__(self):
    return self

  def next(self):
    if self.i < 10:
      self.i += 1
      return self.i
    else:
      raise StopIteration()

def main():
  iterator = Iterator()
  print type(iterator)
  for i in iterator:
    print i,

if __name__ == '__main__':
  main()

Executing this code will give us output as shown below,

<type 'instance'>
1 2 3 4 5 6 7 8 9 10

A generator is a special kind of iterator. We do not have to write a class with these methods; instead, the yield keyword does all the magic for us.

Now, let us rewrite the above code with generator.

def generator():
  i = 0
  while i < 10:
    i += 1
    yield i

def main():
  gen = generator()
  print type(gen)
  print dir(gen)
  for i in gen:
    print i,

if __name__ == '__main__':
  main()

Executing this code will give us output as shown below,

<type 'generator'>
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'next', 'send', 'throw']
1 2 3 4 5 6 7 8 9 10

Is that all? Is a generator just another method to create an iterator?

But according to the zen of python,

“There should be one— and preferably only one —obvious way to do it.”

Hey look, dir(generator) is giving something more than what we would expect from a dir(function).

Generator is more than just an iterator

Let us now delve deep into generators, for which we'll have to understand the yield keyword.

yield has just its literal meaning: it relinquishes control temporarily.

Whenever a generator function is called, the actual code inside the function does not get executed.

For example, here is the same code without iterating through the items of the generator.

def generator():
  print 'First line of generator'
  i = 0
  while i < 10:
    i += 1
    yield i

def main():
  print 'Before calling generator'
  gen = generator()
  print 'After calling generator'

if __name__ == '__main__':
  main()

The output of this code will look like:

Before calling generator
After calling generator

You can notice something: "First line of generator" is not printed when the generator is invoked.

This is how a generator is different from other functions. Calling a generator function does not execute any code in the function; instead, it returns a generator object.

So, when does the actual code get executed?

The actual code gets executed when the next() method is called.

def generator():
  print 'In generator() first line'
  i = 0
  while i < 10:
    i += 1
    print 'In generator() before yield'
    yield i
    print 'In generator() after yield'

def main():
  print 'In main() before calling generator()'
  gen = generator()
  print 'In main() after calling generator()'
  print 'In main() before calling next()'
  gen.next()
  print 'In main() after calling next()'

if __name__ == '__main__':
  main()

Output of this code will look like this,

In main() before calling generator()
In main() after calling generator()
In main() before calling next()
In generator() first line
In generator() before yield
In main() after calling next()

It is clear from the example above that the code inside the generator actually gets executed when the next() method is called.

One more thing to notice here: the last line of the generator, 'In generator() after yield', is not printed.

The execution resumes from this point when next() method of the generator object is called the next time.

The code snippet below explains this control flow.

def generator():
  print 'In generator() first line'
  i = 0
  while i < 10:
    i += 1
    print 'In generator() before yield'
    yield i
    print 'In generator() after yield'

def main():
  print 'In main() before calling generator()'
  gen = generator()
  print 'In main() after calling generator()'
  print 'In main() before calling next()'
  gen.next()
  print 'In main() after calling next()'
  print 'In main() before calling next() second time'
  gen.next()
  print 'In main() after calling next() second time'

if __name__ == '__main__':
  main()

Output of this code will be,

In main() before calling generator()
In main() after calling generator()
In main() before calling next()
In generator() first line
In generator() before yield
In main() after calling next()
In main() before calling next() second time
In generator() after yield
In generator() before yield
In main() after calling next() second time

So, when should we use a generator and when a hand-written iterable?

Any iterable can be replaced with a generator, but the converse is not true.

Generators are preferred for two reasons,

  1. When dealing with large sequences
  2. When the end point of the sequence is not known beforehand
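Both reasons show up in a generator that lazily filters a file of unknown length, holding only one line in memory at a time (the path and pattern below are just an illustration):

```python
def grep(path, needle):
    # Lazily yield matching lines; the whole file is never held in memory.
    with open(path) as f:
        for line in f:
            if needle in line:
                yield line.rstrip('\n')

# Consumption is lazy too: reading stops as soon as we stop iterating.
# for match in grep('/var/log/syslog', 'error'):
#     print(match)
```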

That's all about generators in Python. Happy hacking!

Must Watch Videos for Entrepreneurs From Guy Kawasaki

I recently watched some videos about entrepreneurship and startups from Guy Kawasaki. These videos are fun to watch and carry many lessons to learn.

Video 1: Guy Kawasaki: The Top 10 Mistakes of Entrepreneurs

Video 2: Guy Kawasaki “The Art of the Start” @ TiECon 2006

Video 3: 12 Lessons Steve Jobs Taught Guy Kawasaki

Hope you enjoy the videos. Have a nice day.

Want to Automate Things? Have a Glance on Sikuli!

Do you want to automate some repetitive tasks in your daily usage of applications, web pages, games, or IT systems and networks, but do not have adequate tools at hand?

Then you are in the right place. Give Sikuli a try: a simple tool for GUI automation. Sikuli can automate any computer operation based on screenshots.

Sikuli Installation steps

Step 1: Download the following from https://launchpad.net/sikuli/+download

Sikuli-1.0.1-Supplemental-LinuxVisionProxy.zip (md5)
sikuli-setup.jar (md5)

Step 2: Install dependencies

Ubuntu users: Install the below packages (Dependencies for Sikuli)

sudo apt-get install openjdk-7-jdk
sudo apt-get install libopencv-dev
sudo apt-get install libtesseract-dev

Unzip the .zip file and follow the readme steps to build a new libvisionproxy.so file.

Windows users: Install java jre 7

Step 3: Install Sikuli using the jar file.

Run sikuli-setup.jar (setting the executable bit on it if it is not already set) using the command:

java -jar sikuli-setup.jar

Ubuntu users should replace the newly generated libvisionproxy.so with the existing one in <Sikuli_installed_folder>/libs/ location.

Step 4: Launch the Sikuli IDE using the command.

cd <Sikuli_installed_folder>
runIDE.cmd (for windows users)
./runIDE (for Ubuntu users)

Step 5: Run the Sikuli tests from command line using the command:

runIDE.cmd –r <sikuli_test_name.sikuli>    (for windows users)
./runIDE –r <sikuli_test_name.sikuli>      (for Ubuntu users)

Sample Sikuli code snippet:

Say you want to automate the following: launch a notepad, type some text and save it.

Below is the sample code written in Sikuli IDE.

(Screenshot: the sample automation script in the Sikuli IDE.)

The same code snippet, changed to open a different app, say gedit, should work seamlessly in Ubuntu. For web-browser-based automation, wait for my next blog on Selenium.

Happy automation folks !!!