This is a website by Willem van Zyl


How to use a Bash Script to split a CSV file into multiple files with headers

27 Feb 2009

A Bash script that combines the Unix head, tail, split and cat commands can split a large Comma Separated Values (CSV) file, or any other text file, into smaller chunks that each contain the original file’s header.

In this example, I have a "data.csv" file with the following content:

name,value
item1,20
item2,23
item3,22
item4,12
item5,65
item6,31
item7,43
item8,12
item9,43
item10,12
item11,11
item12,33
item13,33
item14,22
item15,75

Using a plain text editor such as nano, vim or TextEdit, create a new script file in the same directory as the data file (in this example I’ve named it "splitcsv.sh") containing the code below:

#!/bin/bash

# check if exactly one input filename was passed as a
# command line argument:
if [ "$#" -ne 1 ]; then
  echo "Please specify the name of a file to split!" >&2
  exit 1
fi

# create a directory to store the output (no error if it
# already exists):
mkdir -p output

# create a temporary file containing the header without
# the content:
head -n 1 "$1" > header.csv

# create a temporary file containing the content without
# the header:
tail -n +2 "$1" > content.csv

# split the content file into multiple files of 5 lines each:
split -l 5 content.csv output/data_

# loop through the new split files, adding the header
# and a '.csv' extension:
for f in output/*; do cat header.csv "$f" > "$f.csv"; rm "$f"; done

# remove the temporary files:
rm header.csv
rm content.csv
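
The same steps can also be wrapped in a function that avoids the temporary files and takes the chunk size as a parameter. This is only a sketch of one possible variant, not part of the script above: the function name split_csv, its arguments and the chunk_ prefix are made up for illustration.

```shell
#!/bin/bash
# Sketch only: split_csv, its arguments and the chunk_ prefix are
# illustrative names, not part of the original script.
# Usage: split_csv FILE [ROWS_PER_CHUNK] [OUTPUT_DIR]
split_csv() {
  local file="$1"
  local rows="${2:-5}"
  local outdir="${3:-output}"
  local header chunk

  mkdir -p "$outdir" || return 1

  # Read the header line once instead of saving it to a temporary file.
  header=$(head -n 1 "$file") || return 1

  # Feed everything after the header to split via standard input ("-").
  tail -n +2 "$file" | split -l "$rows" - "$outdir/chunk_"

  # Prepend the header to each chunk and give it a .csv extension.
  for chunk in "$outdir"/chunk_*; do
    [ -e "$chunk" ] || continue               # nothing was split
    case $chunk in *.csv) continue ;; esac    # skip files from an earlier run
    { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.csv"
    rm "$chunk"
  done
}
```

Calling split_csv data.csv 5 output would mirror what the script above does, apart from the different file name prefix.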

In a Terminal window, change to the directory containing the script and enter the following (replacing "{scriptfile}" with the name you gave the script file and "{datafile}" with your CSV file's name):

bash ./{scriptfile} {datafile}
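
With the 15-row sample file, the result should be three files of five rows each, every one starting with the header line. The sketch below reproduces that in a scratch directory so it can be tried without touching any real data; it repeats the script's steps inline rather than calling the script, and the scratch directory is just a convenience.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the 15-row sample file from above.
{
  echo "name,value"
  i=1
  for v in 20 23 22 12 65 31 43 12 43 12 11 33 33 22 75; do
    echo "item$i,$v"
    i=$((i + 1))
  done
} > data.csv

# The same steps the script performs, with 5 rows per chunk.
mkdir -p output
head -n 1 data.csv > header.csv
tail -n +2 data.csv > content.csv
split -l 5 content.csv output/data_
for f in output/*; do cat header.csv "$f" > "$f.csv"; rm "$f"; done
rm header.csv content.csv

# List the chunks, then show the first one.
ls output
cat output/data_aa.csv
```

The listing should show data_aa.csv, data_ab.csv and data_ac.csv, and the first file should contain the header followed by item1 to item5.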

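The split command can also break files apart by a number of bytes (the -b option). Splitting on a specific pattern is available through the -p option of the BSD and macOS version of split, or through the separate csplit command. For more information, see its man page: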

man split
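
As a small illustration of splitting by size, the sketch below cuts a 110-byte text file into pieces of at most 50 bytes. The file name sample.txt and the part_ prefix are made up for the example. Note that byte-based splitting pays no attention to line endings, so it can cut a CSV row in half; the line-based approach above is the safer choice for CSV data.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample file: 10 lines of 11 bytes each (10 digits + newline).
i=0
while [ "$i" -lt 10 ]; do
  echo "0123456789"
  i=$((i + 1))
done > sample.txt

# Split into pieces of at most 50 bytes each.
split -b 50 sample.txt part_

# Show the size of each piece in bytes.
wc -c part_*

# Concatenating the pieces in order reproduces the original file.
cat part_* | cmp - sample.txt && echo "pieces match the original"
```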

Copyright © Geekology 2011. All Rights Reserved.
