csvfix is cool
I have been manipulating some data files in comma-separated value (CSV) format, and I have been finding the tool csvfix to be extremely cool.
For example, I have recently been working with some COMPUSTAT data. The data files are a few hundred megabytes, with several hundred columns and hundreds of thousands of rows. I wanted all the rows but only 30 or 40 of the columns, so I had cut the large files down into much smaller ones. However, I forgot to get the SIC codes. So, rather than open all the files again, I used csvfix.
csvfix order -fn cusip,SIC *.csv | csvfix unique -o quarterly_cusip_SIC.csv
The first command grabbed the cusip and SIC columns from all the .csv files in the current directory. Its output is piped to the second command, which takes the incoming CSV data, keeps just the unique rows, then writes them to the file quarterly_cusip_SIC.csv.
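For anyone without csvfix installed, the same idea (pick two columns by name, then keep only the distinct rows) can be sketched with Python's standard csv module. This is only an illustration, not how csvfix works internally. The function names unique_columns and write_unique are made up for this example, and the sketch assumes every input file has a header row containing both cusip and SIC. Unlike the pipeline above, it writes a single header row to the output.

```python
import csv
import glob


def unique_columns(paths, columns):
    """Yield each distinct combination of `columns` found in the CSV files in `paths`."""
    seen = set()
    for path in paths:
        with open(path, newline="") as handle:
            # DictReader uses the first row of each file as the column names,
            # so the columns can sit in a different position in every file.
            for row in csv.DictReader(handle):
                key = tuple(row[name] for name in columns)
                if key not in seen:
                    seen.add(key)
                    yield key


def write_unique(pattern, columns, out_path):
    """Write the distinct `columns` rows from all files matching `pattern` to `out_path`."""
    # Skip the output file itself in case it matches the pattern on a rerun.
    paths = sorted(p for p in glob.glob(pattern) if p != out_path)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(columns)  # one header row (an assumption of this sketch)
        writer.writerows(unique_columns(paths, columns))
```

Calling write_unique("*.csv", ["cusip", "SIC"], "quarterly_cusip_SIC.csv") from the data directory would play the role of the one-liner above. Only the distinct cusip/SIC pairs are held in memory, so the few-hundred-megabyte inputs are read one row at a time.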