Using Wget to Spider a website

January 5th, 2011 / No Comments » / by MediaBandit Ltd

Wget is a command-line tool that ships with most Linux distros and is also available as a download for Windows. Through my career I’ve mainly used wget to download XML feeds from affiliate networks. Until recently I’d not looked into the more powerful features that Wget has to offer.

Below are a number of useful features Wget has in relation to mirroring or spidering a website.

Downloading a webpage using wget

wget http://www.squadify.com/

This will download Squadify’s homepage to the current directory.

Downloading an entire domain using wget

wget --mirror -p --convert-links -P . --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008092416 Firefox/3.0.3" URLTODOWNLOAD

This will download all pages and corresponding assets to the current location.

  • wget --mirror: turn on options within wget suitable for mirroring.
  • wget -p: download all files that are necessary to properly display a given HTML page.
  • wget --convert-links: after the download, convert the links in the documents for local viewing.
  • wget -P /path/to/directory/: save all files and folders to this directory. Note: use wget -P . to download the files to the current location.
  • wget --user-agent="Something different to the default": sets the user agent for the wget request to something other than the default. This sometimes gets past domains that filter on user agent to discourage people from spidering the site.
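When mirroring someone else's site it is usually worth slowing wget down a little. The sketch below wraps the mirror options in a hypothetical helper called mirror_site and adds wget's --wait and --limit-rate options. The helper name, the DRY_RUN variable, the example URL and the chosen delay and rate are all assumptions for illustration. By default it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: builds the mirror command and adds two politeness options.
# Nothing is downloaded unless DRY_RUN=0 is set in the environment.
mirror_site() {
    url=$1
    dest=${2:-.}
    # --wait=1 pauses one second between requests,
    # --limit-rate=200k caps the download speed.
    set -- wget --mirror -p --convert-links -P "$dest" \
        --wait=1 --limit-rate=200k "$url"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"      # show the command that would be run
    else
        "$@"           # actually run wget
    fi
}

mirror_site "http://www.example.com/" ./mirror
```

Run it with DRY_RUN=0 once the printed command looks right.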

shredded paper uses

December 13th, 2010 / No Comments » / by MediaBandit Ltd

Best Suggestions

Below are some great suggestions for what to do with your shredded paper.

Reduce

Reduce the amount of shredding you do and only shred documents that absolutely need shredding. Paper is much easier to reuse or recycle when it’s in one piece, not dozens.

Reuse

Use shredded paper to protect fragile items in the post/storage. Some people use it instead of straw for small animal bedding. Turn it into papier mache creations or into paper kindling logs for a fire/stove (stuff it in toilet roll tubes to get the shape if you’ve not got a log maker). Or perhaps use it to fill pillows, pouffes etc.

Compost

Shredded paper can be added to compost heaps – it’s great at adding bulk and is a useful “brown” if you have lots of greens (fresh garden clippings or most kitchen scraps) in there already. Alternately, dig it directly into your garden in the autumn (at manure time).

Recycle

Contact your local council to see if they will collect it for recycling – many councils don’t collect shredded paper though some will collect it with other paper and others might collect it with cardboard.

Understanding Shredded Paper Issues

Some local authorities will not collect shredded paper. Although shredded paper can technically be recycled, some paper mills cannot accept it, for a couple of reasons:

* shredding decreases the average fibre length of the paper, so paper made from it will be weaker, and
* shredded paper can be difficult to handle at the mill and depending on the equipment there, it can cause maintenance problems and fire hazards.

add new disk

November 24th, 2010 / No Comments » / by MediaBandit Ltd

If you want to add a new disk to a Linux box, follow these instructions.

check what disks are available

ls /dev

partition the disk

fdisk /dev/hdc
-> options: n (new partition), p (primary), 1 (partition number 1), p (print the table and check it), w (write it)

build a Linux file system

mkfs -t ext2 /dev/hdc1

file system check

fsck -f -y /dev/hdc1

Create the mount point and mount the disk in one line. If your directory already exists, only use everything after the &&.

mkdir /dir_of_choice &&  mount -t ext2 /dev/hdc1 /dir_of_choice

find dir size find free space

November 15th, 2010 / No Comments » / by MediaBandit Ltd

This article explains 2 simple commands that most people want to know when administering a Linux box: how to find the size of a directory and how to find the amount of free disk space on your Linux machine.

The 2 commands you would use are:

to find the directory size: du

to find the free disk space: df

All the information present in this article is available in the man pages for du and df. In case you get bored reading the man pages and you want to get your work done quickly, then this article is for you.

-

‘du’ – Finding the size of a directory

$ du
Typing the above at the prompt gives you a list of directories that exist in the current directory along with their sizes. The last line of the output gives you the total size of the current directory including its subdirectories. The size given includes the sizes of the files and the directories that exist in the current directory as well as all of its subdirectories. Note that by default the sizes given are in kilobytes.

$ du /home/mediabandit
The above command would give you the size of the directory /home/mediabandit

$ du -h
This command gives you a better output than the default one. The option ‘-h’ stands for human readable format. So the sizes of the files / directories are this time suffixed with a ‘K’ if it’s kilobytes, ‘M’ if it’s megabytes and ‘G’ if it’s gigabytes.

$ du -ah
This command would display in its output, not only the directories but also all the files that are present in the current directory. Note that ‘du’ always counts all files and directories while giving the final size in the last line. But the ‘-a’ displays the filenames along with the directory names in the output. ‘-h’ is once again human readable format.

$ du -c
This gives you a grand total as the last line of the output. So if your directory occupies 10MB, the last 2 lines of the output (with ‘-h’ added) would be

10M .
10M total

The first line is the default last line of the ‘du’ output, indicating the total size of the directory, and the second line displays the same size followed by the string ‘total’. This is helpful if you use this command along with the grep command to only display the final total size of a directory, as shown below.

$ du -ch | grep total
This would have only one line in its output that displays the total size of the current directory including all the subdirectories.

$ du -s
This displays a summary of the directory size. It is the simplest way to know the total size of the current directory.

$ du -S
This would display the size of the current directory excluding the size of the subdirectories that exist within that directory. So it basically shows you the total size of all the files that exist in the current directory.

$ du --exclude='*.mp3'
The above command would display the size of the current directory along with all its subdirectories, but it would exclude all the files whose names match the given pattern. Thus in the above case, if there happen to be any mp3 files within the current directory or any of its subdirectories, their size would not be included while calculating the total directory size.
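A common follow-up question is which subdirectory is taking up the most space. The sketch below is one way to answer it by combining du with sort. It builds a throwaway directory tree first so it is safe to run anywhere; the names demo, big and small are made up for the example.

```shell
#!/bin/sh
# Build a small throwaway tree (hypothetical names) so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/small" "$demo/big"
dd if=/dev/urandom of="$demo/big/file.bin" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$demo/small/file.bin" bs=1024 count=16 2>/dev/null

# Size of each first-level subdirectory in kilobytes, smallest first.
du -k "$demo"/* | sort -n

# The last line of the sorted list is the largest; the path is the second,
# tab-separated field of du's output.
largest=$(du -k "$demo"/* | sort -n | tail -n 1 | cut -f 2)
echo "largest: $largest"

rm -rf "$demo"
```

On a real machine, replace "$demo"/* with the directories you want to compare, for example /home/*.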

-

‘df’ – finding the disk free space / disk usage

$ df
Typing the above outputs a table with 6 columns, all of which are easy to understand. Remember that the size, used and available columns use kilobytes (1K blocks) as the unit. The ‘Use%’ column shows the usage as a percentage, which is also very useful.

$ df -h
Displays the same output as the previous command but the ‘-h’ indicates human readable format. Hence instead of kilobytes as the unit the output would have ‘M’ for Megabytes and ‘G’ for Gigabytes.

Most users don’t use the other parameters that can be passed to ‘df’, so I shall not be discussing them.

I shall in turn show you an example that I use on my machine. I have actually stored this as a script named ‘usage’ since I use it often.

Example :

I have my Linux installed on /dev/hda1 and I have mounted my Windows partitions as well (by default every time Linux boots). So ‘df’ by default shows me the disk usage of my Linux as well as Windows partitions. And I am only interested in the disk usage of the Linux partitions. This is what I use :

$ df -h | grep /dev/hda1 | cut -c 41-43
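The cut in the command above depends on the percentage sitting in exactly those character columns, which can shift with longer device names. A possible alternative, sketched here, picks the field by position with awk instead. It assumes the filesystem of interest is the one mounted at /, so swap in your own mount point or device.

```shell
#!/bin/sh
# -P asks df for the POSIX output format: one line per filesystem with
# a fixed set of columns, so field 5 is always the percentage used.
# "/" is an assumption; use the mount point you care about.
usage=$(df -P / | awk 'NR == 2 { print $5 }')
echo "$usage"
```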

remove white space from start of every line

November 9th, 2010 / No Comments » / by MediaBandit Ltd

To remove white space from the start of every line in vim, use the following command:

:%s/^\s\+//g
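Outside of vim, much the same result can be had from the shell with sed. This is a minimal sketch; the function name strip_leading is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: strips leading spaces and tabs from every line on stdin.
strip_leading() {
    sed 's/^[[:space:]]*//'
}

printf '   indented with spaces\n\tindented with a tab\nnot indented\n' | strip_leading
```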

uniq ips of failed login attempts from auth.log

November 9th, 2010 / No Comments » / by MediaBandit Ltd

A handy command to help you find all unique IPs from your auth.log, or in fact any other log file that contains an IP address and the word failed. The IP regex is cheap and dirty.

tail -n 3000 /var/log/auth.log | grep -i failed | grep -Eo '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' | sort -u
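To see how often each address turns up, the same extraction can feed sort and uniq -c. The sketch below runs against a small made-up log file so it can be tried without touching /var/log/auth.log; the log lines and addresses are invented sample data.

```shell
#!/bin/sh
# Invented sample log, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Nov  9 10:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 4000 ssh2
Nov  9 10:00:02 host sshd[102]: Failed password for admin from 198.51.100.7 port 4001 ssh2
Nov  9 10:00:03 host sshd[103]: Accepted password for bob from 192.0.2.10 port 4002 ssh2
Nov  9 10:00:04 host sshd[104]: Failed password for root from 203.0.113.5 port 4003 ssh2
EOF

# Keep the failed lines, pull out anything shaped like an IP address,
# then count the occurrences of each one, busiest first.
counts=$(grep -i failed "$log" |
    grep -Eo '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' |
    sort | uniq -c | sort -rn)
echo "$counts"

rm -f "$log"
```

uniq only collapses adjacent duplicate lines, which is why the sort comes first.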

Import SQL datafile into my MySQL database

October 28th, 2010 / No Comments » / by MediaBandit Ltd

Q. How can I import a MySQL dumpfile into my database?

A. You can easily restore or import MySQL data with the mysql command-line client.

mysql command to import SQL data:
$ mysql -u username -p database_name < your_old_data.sql

If you do not know the database name and the database name is included in the SQL dump, you can use the following command:

$ mysql -u username -p < your_old_data.sql
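If you are not sure whether a dump names its own database, you can look for CREATE DATABASE or USE statements before choosing between the two commands. This is a rough sketch: the helper name dump_names_database and the sample dump are made up, and it only checks for statements that begin a line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the dump file contains a line starting
# with CREATE DATABASE or USE, i.e. the dump selects its own database.
dump_names_database() {
    grep -Eq '^(CREATE DATABASE|USE) ' "$1"
}

# Invented sample dump for the demonstration.
dump=$(mktemp)
cat > "$dump" <<'EOF'
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `shop`;
USE `shop`;
CREATE TABLE items (id INT);
EOF

if dump_names_database "$dump"; then
    echo "dump selects its own database: mysql -u username -p < file"
else
    echo "no database in dump: mysql -u username -p database_name < file"
fi
```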

Tar examples

October 26th, 2010 / No Comments » / by MediaBandit Ltd

Here are some examples of how to use tar to compress and uncompress using 2 different compression algorithms, bz2 and gz.

tar options

c – create
x – extract
t – list the contents (used here to test an archive)
v – verbose
f – file.

Compression algorithms

bz2, a high compression algorithm, but slow
gz, less compression than bz2 but faster for creating and extracting.

Creating compressed tar archives

To create a bz2 archive use option j to indicate bz2

tar [cjvf] [filename.tar.bz2] [directory/file]

Example of creating a bz2 tar archive

Let’s say you have a directory of log files within a web application that you wish to tar every night to save on space. The log files in this example will be stored in /var/www/example.com/logs/, and we want to take everything within the directory and create a tar.bz2 called logs.tar.bz2 within /var/www/example.com/backups/logs/.

The following example should show how to create a bz2 tar archive


tar cjvf /var/www/example.com/backups/logs/logs.tar.bz2 /var/www/example.com/logs/*

Testing a bz2 archive was created successfully

To test if we created a bz2 tar archive successfully we need to utilise the t option, which lists the contents of the archive. The following example will test a tar archive to see if it was successfully created. For consistency we will use the previously created bz2 tar archive.


tar tjvf /var/www/example.com/backups/logs/logs.tar.bz2

Extracting a bz2 tar archive

Extracting a tar archive is simple when you know how. Where we previously used the t option to test a tar file, we now use the x option, as in extract, and add -C to tell tar which directory to extract into (it must already exist). As before, we will use the same example as throughout this blog post, so we will now extract the bz2 tar archive to /var/www/example.com/extracts/logs/


tar xjvf /var/www/example.com/backups/logs/logs.tar.bz2 -C /var/www/example.com/extracts/logs/

More information on Tar files

One thing I didn’t realise about tar archives is that they’re not like zip files, in that tar will not automatically compress the files or directories within the archive. I.e. it’s not a zip utility. What it does is combine a collection of files or directories into one file for ease of transferring.

There are many other types of compression. I’ve used bz2 because it generally compresses better than gzip. The potential problem with this is that bzip2 may need to be installed on the server, and sometimes you might not be in a position to do that. For those of you in that situation, here is, from what I understand, the next best compression for tar files, namely gzip.

How to create a gzip tar archive

All of the above commands assume bz2 is installed on the server. Sometimes this isn’t the case. The next best thing is to create a gzip tar file, which has compression capabilities and SHOULD be available for everyone.

The above commands can be converted to use gzip and not bz2 by simply replacing the j option with a z.

Creating a gzip tar archive


tar czvf /var/www/example.com/backups/logs/logs.tar.gz /var/www/example.com/logs/*

Testing a gzip tar archive


tar tzvf /var/www/example.com/backups/logs/logs.tar.gz

Extracting a gzip tar archive


tar xzvf /var/www/example.com/backups/logs/logs.tar.gz -C /var/www/example.com/extracts/logs/
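To try the whole create, test and extract cycle without touching a real web directory, here is a small sketch that works in a temporary directory. The directory layout and file names are invented for the example. It uses gzip, and it also uses -C when creating the archive so the stored paths are relative.

```shell
#!/bin/sh
# Throwaway working area with made-up log files.
work=$(mktemp -d)
mkdir -p "$work/logs" "$work/backups" "$work/extracts"
echo "first entry" > "$work/logs/access.log"
echo "second entry" > "$work/logs/error.log"

# Create: -C changes into $work first, so the archive stores "logs/..."
# rather than the full path.
tar czf "$work/backups/logs.tar.gz" -C "$work" logs

# Test: list the contents without extracting anything.
tar tzf "$work/backups/logs.tar.gz"

# Extract: -C names the destination directory, which must already exist.
tar xzf "$work/backups/logs.tar.gz" -C "$work/extracts"

# Compare an extracted file with the original.
cmp "$work/logs/access.log" "$work/extracts/logs/access.log" && echo "round trip OK"
```

Swap the z for a j in all three tar commands (and name the file logs.tar.bz2) to do the same with bz2.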

Conclusion

This post covered creating a tar file, testing a tar file and extracting a tar file.

If you’re looking for the best compression for a tar archive, use bz2. If you’re unable to use bz2 because it hasn’t been installed on the server, then you should use the next best compression algorithm, namely gzip.

stains in tea cups explained

September 7th, 2010 / No Comments » / by MediaBandit Ltd

Ever wondered why tea leaves a stain in the cup only where the surface of the tea was, and why it doesn’t stain the cup evenly?

Well, in hard water areas like London, water contains quite a lot of calcium in the form of ‘temporary hardness’ (calcium hydrogen carbonate, Ca(HCO3)2). When it gets heated the temporary hardness breaks down into calcium carbonate (chalk), carbon dioxide and water. During this process the calcium salts bind to the tannins in the tea and form an insoluble precipitate – the scum on your tea. These particles float to the top of the tea, and then stick to the side of the tea cup. The tannins in the tea carry some of the colour of the tea and they stain the chalk, so the scum becomes tea coloured and stains your cup!

redmine vs jira / confluence

September 6th, 2010 / No Comments » / by MediaBandit Ltd

I have spent too many hours exploring the pros and cons of project management web applications that have gantt charts, calendars, wikis, forums, multiple roles, email notification and more. In my opinion there are 2 main products: Redmine, and a combination of Jira/Confluence etc.

Conclusion (in short for now)

From a startup and resource-tight perspective, Redmine is your best bet. It’s pretty damn powerful, easy to configure and free! Granted, it isn’t flashy like Confluence etc, but honestly does it need to be? I think not. One big advantage of Redmine is that it is all in one package; the others need to be bolted together somehow, which is not a task for the faint hearted, I can tell you!

http://www.redmine.org/