I wrote some code to detect the largest balanced panels in the Compustat database. That code is coming tomorrow. For this analysis, I picked a balanced panel that starts in the mid-1980s. I used the data.table package to hold the data, because keyed data tables allow extremely fast, in-place subsetting.
For the first plot, the code was as follows.
Subset the data to get just the first time period. Then use cut2 from the Hmisc package to divide the firms into deciles of book leverage, and label the deciles A through J.
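library(data.table);
library(Hmisc);
## keep only the rows from the first period
min.date = min(dt.t$date);
dt.tmin = subset(dt.t, date == min.date);
## g=10 asks cut2 for ten quantile groups (deciles)
dt.tmin$initialDEC = with(dt.tmin, cut2(book_lev, g=10));
dt.tmin$initialDEC = factor(dt.tmin$initialDEC, labels = LETTERS[1:10]);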
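To see what the decile sort does, here is a minimal sketch on made-up data. It uses base R's cut and quantile as a stand-in for cut2(..., g=10), so the bin edges may differ slightly from what Hmisc produces. The vector toy.lev is hypothetical, not Compustat data.

```r
## Hypothetical leverage values for 100 firms (not real data)
toy.lev = (1:100) / 100

## Decile break points: the 0%, 10%, ..., 100% quantiles
toy.breaks = quantile(toy.lev, probs = seq(0, 1, by = 0.1))

## Assign each firm to a decile and label the deciles A..J
toy.dec = cut(toy.lev, breaks = toy.breaks, include.lowest = TRUE,
              labels = LETTERS[1:10])

## Count how many firms land in each decile
print(table(toy.dec))
```

The lowest-leverage firms land in A and the highest in J, which is the labelling the plot legend relies on.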
initialDEC is a factor representing the initial decile. The merge does exactly what you think it should: it adds a column called initialDEC to the original data set, dt.t, matching rows on the column cusip.
## prepare to merge
setkey(dt.tmin, cusip);
setkey(dt.t, cusip);
## do the merge
## join only cusip and initialDEC, so no other columns get duplicated
dt.t = dt.t[dt.tmin[, list(cusip, initialDEC)]]
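A small sketch of the same keyed join on made-up data may help. The tables toy.panel and toy.first and their values are hypothetical; the point is only that every row of the panel picks up its firm's first-period label.

```r
library(data.table)

## Hypothetical panel: two firms observed over three periods
toy.panel = data.table(cusip    = rep(c("AAA", "BBB"), each = 3),
                       date     = rep(1:3, times = 2),
                       book_lev = c(0.1, 0.2, 0.3, 0.6, 0.5, 0.4))

## Hypothetical first-period labels, one row per firm
toy.first = data.table(cusip      = c("AAA", "BBB"),
                       initialDEC = c("A", "J"))

## Key both tables on cusip, then join: X[Y] looks up Y's rows in X
setkey(toy.panel, cusip)
setkey(toy.first, cusip)
toy.joined = toy.panel[toy.first]

print(toy.joined)
```

Because each firm appears once in toy.first and three times in toy.panel, the result keeps all six panel rows and repeats the label within each firm.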
Create a factor called timedec, which is the interaction of the initial decile (initialDEC) and each time period. There should be 10 * (max(dt.t$year) - min(dt.t$year) + 1) distinct values of this factor. Then, set this as the key, and take the mean of book_lev within each group.
dt.t$timedec = paste(dt.t$date, "--", dt.t$initialDEC, sep="")
setkey(dt.t, timedec)
td.means = dt.t[,list(tdmean=mean(book_lev)), by=timedec]
td.means is now a nice little data.table with 10 * (max(dt.t$year) - min(dt.t$year) + 1) rows. The next step unwraps the timedec key, splitting it back into its date and initial-decile parts.
timedate = strsplit(as.vector(td.means$timedec), "--")
tddf = as.data.frame(matrix(unlist(timedate), ncol=2, byrow=TRUE))
colnames(tddf) = c("DATE", "iDEC");
td.means$date = tddf$DATE;
td.means$iDEC = tddf$iDEC;
## extractyear() is not a base R function, and its definition is not shown in this post
td.means$year = extractyear(td.means$date)
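As an aside, the paste and strsplit round trip could probably be avoided by grouping on both columns at once. This is a sketch of that alternative, not the code used for the plot; toy.dt and its date strings are hypothetical.

```r
library(data.table)

## Hypothetical data: two periods, two initial deciles, two firms per cell
toy.dt = data.table(date       = rep(c("1985-12-31", "1986-12-31"), each = 4),
                    initialDEC = rep(c("A", "A", "J", "J"), times = 2),
                    book_lev   = c(0.1, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8))

## Group by date and initial decile directly; both stay as separate columns
toy.means = toy.dt[, list(tdmean = mean(book_lev)), by = list(date, initialDEC)]

print(toy.means)
```

The result has one row per (date, decile) pair, with date and initialDEC already in their own columns, so nothing needs to be split apart afterwards.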
Then, use ggplot2 to do the plotting:
library(ggplot2);
p = ggplot(td.means, aes(x=year, y=tdmean, group=iDEC))
p = p + geom_line(aes(colour=iDEC))
p = p + ggtitle("Firm leverage at t=0 decile sort")
p = p + scale_y_continuous('Book leverage')
print(p)