newspaint

Documenting Problems That Were Difficult To Find The Answer To

Using wget to Automate Logging Into Websites

The open-source wget tool is useful for automating website access/scraping. In particular because it can store/retrieve cookies from a file.

# create a name for the cookie jar/file
COOKIE_JAR=/tmp/cookies.$$.txt

# save cookies from homepage access
wget --spider --save-cookies $COOKIE_JAR --keep-session-cookies http://www.smrt.com.sg/

# now submit request using saved cookies
wget -O - \
  --load-cookies $COOKIE_JAR \
  --save-cookies $COOKIE_JAR \
  --keep-session-cookies \
  --header "Referer: http://journey.smrt.com.sg/" \
  --post-data='startlat=1.357348601&startlng=103.9884093&endlat=1.276243657&endlng=103.8545958&routeopt=fastest&start_type=mrt&end_type=mrt&mode=TRANSIT&use_lrt=yes' \
  https://connect.smrt.wwprojects.com/smrt/api/journey/

Note that –spider performs a HEAD request and does not download the response. Options useful for debugging and seeing what is sent/received are -d and -S. For cookies the –keep-session-cookies option is essential to save session cookies (with no expiry time set) to the cookie file.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: