How to use a Bash Script to split a CSV file into multiple files with headers
A Bash script that combines the Unix head, tail, split and cat commands can split a large Comma Separated Values (CSV) or other text file into smaller chunks, each beginning with the original file’s header.
In this example, I have a "data.csv" file with the following content:
name,value
item1,20
item2,23
item3,22
item4,12
item5,65
item6,31
item7,43
item8,12
item9,43
item10,12
item11,11
item12,33
item13,33
item14,22
item15,75
In the same directory as the CSV file, create a new script file (in this example I’ve named it "splitcsv.sh") containing the code below, using a plain text editor such as nano, vim or TextEdit:
#!/bin/bash
# check that exactly one input filename was passed as a
# command line argument:
if [ "$#" -ne 1 ]; then
    echo "Please specify the name of a file to split!" >&2
    exit 1
fi
# create a directory to store the output:
mkdir -p output
# create a temporary file containing the header without
# the content:
head -n 1 "$1" > header.csv
# create a temporary file containing the content without
# the header:
tail -n +2 "$1" > content.csv
# split the content file into multiple files of 5 lines each:
split -l 5 content.csv output/data_
# loop through the new split files, adding the header
# and a '.csv' extension:
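for f in output/data_*; do
    # skip files that already end in '.csv' (e.g. from an earlier run):
    case "$f" in *.csv) continue ;; esac
    cat header.csv "$f" > "$f.csv"
    rm "$f"
done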
# remove the temporary files:
rm header.csv
rm content.csv
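The same steps can also be done without the temporary header.csv and content.csv files. The following is only a sketch of that idea, not a replacement for the script above: the function name split_csv, its three arguments and the chunk_ prefix are names I made up for illustration, and the demo at the bottom generates its own 15-row sample in a temporary directory rather than using the data.csv shown earlier.

```bash
#!/bin/bash
# Sketch: split a CSV into pieces that each repeat the header row,
# without writing temporary header/content files.
# split_csv, its arguments and the chunk_ prefix are hypothetical names.
split_csv() {
    local input=$1 lines=$2 outdir=$3
    local header chunk
    mkdir -p "$outdir"
    # read the first line (the header) into a variable
    IFS= read -r header < "$input"
    # feed everything after the header to split via stdin ("-")
    tail -n +2 "$input" | split -l "$lines" - "$outdir/chunk_"
    # prepend the header to each piece and add the .csv extension
    for chunk in "$outdir"/chunk_??; do
        [ -e "$chunk" ] || continue   # glob matched nothing
        { printf '%s\n' "$header"; cat "$chunk"; } > "$chunk.csv"
        rm "$chunk"
    done
}

# demo on a generated sample (a header plus 15 rows)
workdir=$(mktemp -d)
{ echo "name,value"; for i in {1..15}; do echo "item$i,$i"; done; } > "$workdir/data.csv"
split_csv "$workdir/data.csv" 5 "$workdir/output"
ls "$workdir/output"
```

The chunk_?? glob only matches the two-letter suffixes that split uses by default, so files that already carry the .csv extension are left alone.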
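Open a Terminal window, change to the directory containing the script and enter the following (replacing "{scriptfile}" with the name you gave the script file and "{datafile}" with your CSV file's name). The script is run with bash rather than sh because it was written for Bash:
bash ./{scriptfile} {datafile}
With the sample file above, this creates output/data_aa.csv, output/data_ab.csv and output/data_ac.csv, each containing the header and five rows.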
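After a run it can be worth checking that nothing was lost or duplicated. The sketch below is one way to do that, under some assumptions: it works in a throwaway temporary directory, builds its own 15-row sample, splits it with the same head, tail, split and cat steps, and then rebuilds a single file (rebuilt.csv, a name chosen for this example) to compare against the original.

```bash
#!/bin/bash
# Sketch: confirm that chunk files with a repeated header still add up
# to the original CSV. All file names here are made up for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# build a 15-row sample file and split it into pieces of 5 rows
{ echo "name,value"; for i in {1..15}; do echo "item$i,$i"; done; } > data.csv
mkdir -p output
tail -n +2 data.csv | split -l 5 - output/data_
for f in output/data_??; do
    { head -n 1 data.csv; cat "$f"; } > "$f.csv"
    rm "$f"
done

# rebuild: one header, then every chunk's rows without its header line
{
    head -n 1 data.csv
    for f in output/data_*.csv; do tail -n +2 "$f"; done
} > rebuilt.csv

# cmp -s prints nothing and returns 0 only when the files are identical
if cmp -s data.csv rebuilt.csv; then
    echo "chunks match the original"
else
    echo "chunks differ from the original"
fi
```

The check relies on the shell expanding output/data_*.csv in sorted order, which matches the order in which split names its pieces (aa, ab, ac and so on).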
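The split command can also break apart files based on a number of bytes (-b) or, in the BSD and macOS version, on lines matching a specific pattern (-p); GNU split has no pattern option, and csplit covers that case instead. For more information, see its man page: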
man split
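To illustrate the difference between splitting by lines and by bytes, here is a small sketch. It assumes a generated sample file in a temporary directory, and the lines_ and bytes_ prefixes and the 50-byte size are arbitrary choices for the demo.

```bash
#!/bin/bash
# Sketch: compare line-based and byte-based splitting on a sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
{ echo "name,value"; for i in {1..15}; do echo "item$i,$i"; done; } > data.csv

# -l counts lines: every piece ends on a row boundary
split -l 5 data.csv lines_
# -b counts bytes: a piece may end in the middle of a row
split -b 50 data.csv bytes_

# show the size in bytes of every piece
wc -c lines_* bytes_*

# concatenating the byte pieces in name order restores the file
cat bytes_* | cmp -s - data.csv && echo "byte pieces reassemble cleanly"
```

Because a byte-based split can cut a row in two, it suits cases where the pieces will be joined again later (for example to fit a transfer size limit) more than cases where each piece must be a valid CSV file on its own.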