Title: | Mailmerge using R, LaTeX, and the Web |
---|---|
Description: | Provides mailmerge methods for reading spreadsheets of addresses and other relevant information to create standardized but customizable letters. Provides a method for mapping US ZIP codes, including those of letter recipients. Provides a method for parsing and processing html code from online job postings of the American Political Science Association. |
Authors: | Ryan T. Moore <[email protected]> and Andrew Reeves <[email protected]> |
Maintainer: | Ryan T. Moore <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 0.1-13 |
Built: | 2024-11-16 03:18:24 UTC |
Source: | https://github.com/cran/muRL |
Package: | muRL |
Type: | Package |
Version: | 0.1-13 |
Date: | 2023-08-21 |
License: | GPL version 2.0 or newer |
URL: | https://www.ryantmoore.org/software.murl.html |
LazyLoad: | yes |
Reads American Political Science Association (APSA) “eJobs” html files, parses the content of these files into a format for muRL to read, and writes that content to a .csv file.
apsahtml2csv(directory, file.name, file.ext = ".htm", verbose = TRUE)
directory |
a character string specifying the directory to which a set of APSA job announcement web pages have been downloaded. |
file.name |
a character string specifying the name of the file to which the data should be written. |
file.ext |
a character string specifying the extension of the files from which the data will be harvested. |
verbose |
a logical specifying whether the file name and current working directory should be printed. |
After logging in to eJobs, the job announcement site of the American Political Science Association (APSA), the user can search for and find the APSA web page announcing a single job listing. The user can download the html from several such pages (usually with a simple “Save As” command, depending on one's operating system). apsahtml2csv then parses the html code from these pages, and sorts and stores the relevant content. A .csv file is written containing this content.
If the user downloads the APSA webpages using a different (or no) file extension, that extension (or "") should be specified using the file.ext argument. Because apsahtml2csv uses the value of file.ext in a grep command, we strongly recommend that the directory specified by directory include only the downloaded webpages, and no other files or directories.
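The recommendation above can be illustrated with a minimal sketch. How apsahtml2csv calls grep internally is not documented here, so the patterns below are assumptions, and the file names are hypothetical. The point is only that an extension used as a regular expression can match more files than intended.

```r
# Hypothetical directory listing; only the first two are saved job pages.
files <- c("job101.htm", "job102.htm", "notes_on_htm_files.txt", "archive")

# Used as a regular expression, "." matches any character, so ".htm"
# matches any name that contains "htm" preceded by at least one character.
loose <- grep(".htm", files, value = TRUE)

# An anchored pattern (an alternative, not necessarily what the package does)
# matches only names that end in the literal extension.
strict <- grep("\\.htm$", files, value = TRUE)

print(loose)
print(strict)
```

Keeping the directory free of unrelated files avoids the problem regardless of how the pattern is interpreted.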
Institutions are inconsistent in how they enter the names of their jobs' contact representatives. Thus, some tweaking of the output of apsahtml2csv may be required in order to create a .csv file that can be seamlessly read by read.murl. Specifically, the user may have to take the single column of the output of apsahtml2csv called contact, and create columns called title, fname, and lname. Additionally, the user may have to adjust the position and subfield columns, as institutions may report these somewhat differently.
An R dataframe is created and a .csv file is written. These include columns containing the APSA job listing ID number, the date the job advertisement was posted, the type of institution, the title and subfield of the position, the start date, salary, and region, the name of the institution and department, the name, address, city, state, ZIP code, and phone number of the individual to contact, the department or institution's web address, and a full paragraph description of the position.
The full paragraph description is stored in a column named desc. Due to the current parsing strategy, this field may include some excess characters from the APSA html page.
Ryan T. Moore and Andrew Reeves
This sample dataframe of recipient information and addresses includes columns with information required by the muRL package, as well as auxiliary columns with information related to a hypothetical mailmerge, but not required by muRL.
data(murljobs)
A data frame with 8 observations on the following 15 variables.
institution
a factor containing sample institution names (with levels Christopher College, ..., University of State University).
type
an auxiliary factor for sorting sample entries (with level am).
deadline
an auxiliary factor containing sample deadlines (with levels 1/5/2010, 12/1/2009).
title
a factor containing sample recipient titles (with levels Dean, ..., Sargent).
fname
a factor containing sample recipient first names (with levels Frank, ..., Tim).
lname
a factor containing sample recipient last names (with levels Anderson, ..., Smithers).
dept
a factor containing sample recipient information (with levels Department of Political Science, Department of Politics).
position
a factor containing sample position titles (with levels assistant professor, ..., postdoctoral associate).
subfield
a factor containing sample recipient information (with levels American politics, ..., Governance Studies).
address1
a factor containing sample recipient address first lines (with level Graduate Admissions Committee).
address2
a factor containing sample recipient address second lines (with levels 11 Smith Rd., ..., Dept of Political Science).
address3
a factor containing sample recipient address third lines (with levels 123 Main St, ..., Dept of Rock Music).
city
a factor containing sample recipient cities (with levels Allentown, ..., Topeka).
state
a factor containing sample recipient states or provinces (with levels CA, ..., WY).
zip
a numeric vector containing sample recipient ZIP codes.
Created by package authors.
data(murljobs)
Reads a .csv file or R dataframe of letter recipients, processes the column names for write.murl, checks whether United States ZIP codes conform to standard formats, and reports potential problems to the user.
read.murl(file = "murljobs.csv", header = TRUE, stringsAsFactors = FALSE,
  field.title = "title", field.fname = "fname", field.lname = "lname",
  fields.address = "address", field.city = "city", field.state = "state",
  field.zip = "zipcode", field.position = "position",
  field.subfield = "subfield", field.dept = "dept",
  field.institution = "institution", field.instShort = "instShort",
  colClasses = c("character"), ...)
file |
the name of a |
header |
a logical for whether the first row of the input file or dataframe is a header row. |
stringsAsFactors |
a logical for whether character strings should be stored as factors with levels. |
field.title |
a character string giving the name of the column containing recipients' titles (such as “Doctor”, “Mrs.”, etc.). |
field.fname |
a character string giving the name of the column containing recipients' first names. |
field.lname |
a character string giving the name of the column containing recipients' last names. |
fields.address |
a character string common to the name(s) of the column(s) containing recipients' street mailing address information. Each column will be printed as its own row in the mailing address. See Details below for more. |
field.city |
a character string giving the name of the column containing recipients' cities. |
field.state |
a character string giving the name of the column containing recipients' states or provinces. |
field.zip |
a character string giving the name of the column containing recipients' United States ZIP or other postal codes. |
field.position |
a character string giving the name of the column containing recipient-specific information, such as the specific position for which one is applying. |
field.subfield |
a character string giving the name of the column containing recipient-specific information, such as the specific subfield for which one is applying. |
field.dept |
a character string giving the name of the column containing additional information, such as the specific department offering the position for which one is applying. |
field.institution |
a character string giving the name of the column containing additional information, such as the institution offering the position for which one is applying. |
field.instShort |
a character string giving the name of the column containing the shortened version of the name of the institution. Optionally used in the closing. |
colClasses |
a vector of character strings indicating the class of each column. Using |
... |
other arguments to pass to |
Recipients' addresses are formatted for mailing as follows. The first row contains the contents of the fields defined by field.title, field.fname, and field.lname. Each of the fields defined by fields.address is formatted as a unique row. The last row contains the contents of the fields defined by field.city, field.state, and field.zip.
fields.address specifies the string common to the names of the columns containing the recipients' street addresses. For example, if the user's file has the street address in columns named addr1, addr2, ..., then the user should set fields.address = "addr".
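The idea of a string common to several column names can be sketched in base R as follows. This is an illustration only: the data frame is hypothetical, and whether read.murl matches names literally or as a regular expression is an assumption not confirmed by this page.

```r
# Hypothetical recipient data with the street address spread over two columns.
recipients <- data.frame(
  fname = "Richard", lname = "Sanders",
  addr1 = "Graduate Admissions Committee",
  addr2 = "123 Hello Way",
  city = "Frederick",
  stringsAsFactors = FALSE
)

# Find every column whose name contains the common string "addr".
fields.address <- "addr"
address_cols <- grep(fields.address, names(recipients), fixed = TRUE, value = TRUE)

# Rename them to address1, address2, ..., the names write.murl expects.
names(recipients)[match(address_cols, names(recipients))] <-
  paste0("address", seq_along(address_cols))

names(recipients)
```

A common string that also appears in unrelated column names would select those columns too, so a distinctive prefix is the safer choice.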
If the input file is an R dataframe, then the argument ... is ignored.
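The default colClasses = c("character") matters for ZIP codes that begin with zero. The sketch below uses only base R and made-up rows to show the difference. The format check at the end (five digits, optionally followed by a hyphen and four digits) is an assumed rule for illustration, not necessarily the check read.murl performs.

```r
# Two sample rows; the first ZIP code has a leading zero.
csv_text <- "city,state,zip\nAmherst,MA,01002\nFrederick,MD,21701"

# With default column classes the ZIP column is read as a number,
# which drops the leading zero.
as_default <- read.csv(text = csv_text)

# Reading every column as character keeps the ZIP code intact.
as_char <- read.csv(text = csv_text, colClasses = "character")

print(as_default$zip)
print(as_char$zip)

# Assumed format check: five digits, optionally followed by "-dddd".
zip_ok <- grepl("^[0-9]{5}(-[0-9]{4})?$", as_char$zip)
print(zip_ok)
```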
An R dataframe containing the relevant information for creating a set of standardized but customizable letters to be mailed.
Ryan T. Moore and Andrew Reeves
## Specify path to .csv database of sample addresses
fpath <- system.file("extdata", "murljobs.csv", package = "muRL")
murljobs <- read.murl(fpath)
Reads an R dataframe of letter recipient- and position-specific data, such as the output of read.murl. Creates a .tex file of the relevant data and LaTeX code, which can then be processed directly by pdflatex, for example.
write.murl(object, file.name = "mailmerge.tex", salutation = "Dear",
  sal.punct = ":",
  address.string = "123 Venus Flytrap Way\\\\Cincinnati, OH 45201\\\\ \\texttt{jfever@wkrp.edu}\\\\ \\texttt{http://www.wkrp.edu/jfever}\\\\513-555-5664",
  pad_if_zip4 = TRUE, date = "\\today", letter.file = NULL,
  letter.text = NULL, valediction = "Sincerely,", signature = "Johnny Fever",
  opening = "", include.opening = FALSE, closing = "",
  include.closing = FALSE, contact_me = TRUE, margin_geometry = NULL,
  verbose = TRUE)
object |
a dataframe of mailmerge data, such as an output from |
file.name |
a character string specifying the file name (and optionally, path) for the output |
salutation |
a character string specifying the salutation to be used in the letters. |
sal.punct |
a character string specifying the punctuation to be used at the end of the salutation. |
address.string |
a character string specifying the return address to be used in the letters. Note that two slashes ( |
pad_if_zip4 |
a logical indicating whether to add a leading 0 (zero) to any ZIP code with 4 or 9 characters. Defaults to TRUE. |
date |
an optional character string specifying the date. Defaults to the current date. |
letter.file |
an optional character string specifying a file containing the body text of the letters. See Details below for more. |
letter.text |
an optional character string containing the body text of the letters. See Details below for more. |
valediction |
a character string specifying the valediction to be used in the letters. |
signature |
a character string specifying the signature to be used in the letters. |
opening |
a character string specifying the opening line to be used in the letters. See Details below for more. |
include.opening |
a logical indicating whether an opening, customized line is to be used in the letters. See Details below for more. |
closing |
a character string specifying the closing line to be used in the letters. See Details below for more. |
include.closing |
a logical indicating whether a closing, letter-customized line is to be used in the letters. See Details below for more. |
contact_me |
a logical indicating whether a 'contact me' sentence is added to the closing. See Details below for more. |
margin_geometry |
an optional numeric vector of length 4 containing the number of inches of margin for the top, bottom, left, and right of the letter. See Details below for more. |
verbose |
a logical indicating whether the |
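The pad_if_zip4 behavior described in the argument list can be sketched with a small helper. The function pad_zip below is hypothetical and is not part of muRL. It only illustrates the stated rule that ZIP codes with 4 or 9 characters receive a leading zero.

```r
# Hypothetical helper (not a muRL function) illustrating the padding rule.
pad_zip <- function(zip) {
  zip <- as.character(zip)
  # ZIP codes that lost a leading zero have 4 characters ("1002")
  # or, in ZIP+4 form with a hyphen, 9 characters ("1002-1234").
  needs_pad <- nchar(zip) %in% c(4, 9)
  zip[needs_pad] <- paste0("0", zip[needs_pad])
  zip
}

padded <- pad_zip(c("1002", "21701", "1002-1234"))
print(padded)
```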
The dataframe used by write.murl should include columns for recipients' titles, first names, last names, addresses, cities, states, and ZIP codes, as well as information specific to the position for which the letter is in application. write.murl is intended to operate on the output of read.murl, and thus requires that the column names for the fields above be “title”, “fname”, “lname”, “address1” (and “address2”, etc.), “city”, “state”, “zip”, “position”, “subfield”, “dept”, and “institution”. These field names are automatically created by read.murl.
The user may define the main body text of the letter in at least three ways. First, write.murl includes some sample text by default. The user could simply edit this text in the .tex file created by write.murl. Second, the user could write the body text in a separate file (such as a .txt file) and specify that file's name using the letter.file argument. Third, the user could define the entire body text as a string passed to the letter.text argument.
If both letter.file and letter.text are specified, write.murl appends the value of letter.text below the contents of the file specified by letter.file.
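The ordering described above, file contents first and the string second, can be sketched in base R without calling write.murl. The file name and sentences are hypothetical, and the sketch shows only the documented ordering, not how the package assembles the body internally.

```r
# Write a hypothetical body text file to a temporary location.
letter.file <- file.path(tempdir(), "mybodytext.txt")
writeLines(c("My research examines legislative institutions.",
             "I have taught courses in American politics."),
           letter.file)

# A string that would be passed as letter.text.
letter.text <- "My references will send their letters separately."

# Documented ordering: contents of letter.file, then letter.text below it.
body <- c(readLines(letter.file), letter.text)
cat(body, sep = "\n")
```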
The opening line specified by argument opening should be of a grammatical form consistent with “I write to apply for the position in”. This phrase will then be followed by customized input, using the fields “position”, “subfield”, “dept”, and “institution”, as in the example in Value below. To omit such a customized opening line, use the default include.opening = FALSE. In that case, each letter, like the example in Value below, will include only the content defined in the LaTeX-defined “body”.
The closing line specified by argument closing should be of a grammatical form consistent with “I will be an asset to the”. This phrase will then be followed by letter-customized input, using the fields “position”, “subfield”, and “instShort”. To omit such a customized closing line, use the default include.closing = FALSE. If contact_me is TRUE (the default), then the phrase “Please don't hesitate to contact me if more information would be helpful.” is included.
If the margin_geometry argument is specified, the LaTeX package geometry will be used, and the four components of margin_geometry provide the margins for the top, bottom, left, and right of the letters. If it is not specified, the default margins of \documentclass{letter} are used.
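As an illustration, a four-element margin vector could be turned into a geometry package call as in the sketch below. The exact LaTeX line that write.murl emits is not shown on this page, so the output format here is an assumption. The option names top, bottom, left, and right are standard geometry options.

```r
# Margins in inches, in the documented order: top, bottom, left, right.
margin_geometry <- c(1, 1, 1.25, 1.25)
stopifnot(is.numeric(margin_geometry), length(margin_geometry) == 4)

# Assumed form of the LaTeX preamble line (not copied from the package).
geometry_line <- sprintf(
  "\\usepackage[top=%gin, bottom=%gin, left=%gin, right=%gin]{geometry}",
  margin_geometry[1], margin_geometry[2],
  margin_geometry[3], margin_geometry[4]
)
cat(geometry_line, "\n")
```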
A .tex file of LaTeX code and recipient-specific content, to be processed directly by LaTeX. Using the included murljobs.csv sample data, the .tex file created by write.murl includes for each position one code snippet that looks like the following:
\begin{letter}
{Dr. Richard Sanders\\Graduate Admissions Committee\\123 Hello Way\\Frederick MD 21701}
\opening{Dear Dr. Sanders:}
\body
\closing{Sincerely,}
\end{letter}
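A snippet of this shape can be assembled from one recipient's fields with a few lines of base R. The sketch below is an approximation for illustration, not the package's own code: the list recipient and its element names are hypothetical, and the salutation, punctuation, and valediction are hard-coded to the defaults.

```r
# One hypothetical recipient, using values from the snippet above.
recipient <- list(title = "Dr.", fname = "Richard", lname = "Sanders",
                  address = c("Graduate Admissions Committee", "123 Hello Way"),
                  city = "Frederick", state = "MD", zip = "21701")

# Address rows: name line, one row per address field, then city/state/ZIP.
address_rows <- c(
  paste(recipient$title, recipient$fname, recipient$lname),
  recipient$address,
  paste(recipient$city, recipient$state, recipient$zip)
)

# Join the rows with a LaTeX line break (two backslashes) and wrap them
# in the letter environment.
letter <- paste0(
  "\\begin{letter}\n",
  "{", paste(address_rows, collapse = "\\\\"), "}\n",
  "\\opening{Dear ", recipient$title, " ", recipient$lname, ":}\n",
  "\\body\n",
  "\\closing{Sincerely,}\n",
  "\\end{letter}\n"
)
cat(letter)
```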
Ryan T. Moore and Andrew Reeves
data(murljobs)

## Create mailmerge.tex required for LaTeX import
# write.murl(murljobs)

## Specify a file containing the letters' body text
# write.murl(murljobs, letter.file = "mybodytext.txt")

## Specify a string containing the letters' body text
# write.murl(murljobs, letter.text = "This is the whole body of my letters.")

## Specify salutation, valediction options (overwrites previous mailmerge.tex)
# write.murl(murljobs, file.name = "mailmerge.tex", salutation = "Greetings",
#    sal.punct = ",", valediction = "Truly Yours,", include.opening = FALSE)

## Specify opening line also (overwrites previous mailmerge.tex)
# write.murl(murljobs, file.name = "mailmerge.tex", salutation = "Greetings",
#    sal.punct = ",", valediction = "Truly Yours,",
#    opening = "I am applying for the job in", include.opening = TRUE)
Using United States ZIP codes, plots on a map the location of letter recipients. State or county boundaries may be displayed.
zip.plot(data, zip.file = system.file("extdata", "zips.tab", package = "muRL"),
  map.type = "state", cex = 1, col = "black", pch = 20,
  jitter.factor = NULL, ...)
data |
a dataframe with ZIP codes in a column named ' |
zip.file |
a character string naming a |
map.type |
the type of map for |
cex |
a numerical value giving the amount by which plotting text and symbols should be magnified relative to the default. Accepts, for example, a vector of values which are recycled. |
col |
a specification for the plotting color. |
pch |
the plotting character for |
jitter.factor |
a numeric specifying by how much points should be jittered before plotting. See Details below for more. |
... |
other arguments to pass to |
map.type can be any valid map from the maps package. For plotting the location of United States ZIP codes, usa, state, or county should be used.
See help(par) for more details on cex, col, and pch.
See help(jitter) for more details on jitter.factor. zip.plot jitters latitude and longitude separately using the same factor.
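The jittering described above can be sketched with the base R function jitter. The coordinates are made up, and the sketch only mirrors the documented behavior (the same factor applied to latitude and longitude in separate calls). It is not taken from the zip.plot source.

```r
set.seed(1)  # make the random jitter reproducible

# Hypothetical coordinates; the first two recipients share a ZIP code,
# so their points would overlap exactly without jittering.
coords <- data.frame(lat = c(39.41, 39.41, 38.65),
                     lon = c(-77.41, -77.41, -90.31))

# Apply the same factor to latitude and longitude in separate calls.
jitter.factor <- 2
jittered <- data.frame(
  lat = jitter(coords$lat, factor = jitter.factor),
  lon = jitter(coords$lon, factor = jitter.factor)
)
jittered
```

Because each call draws its own random noise, points that shared a location are displaced in both directions and become distinguishable on the map.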
To plot only a region within the selected map.type, include the map argument region =. For example, zip.plot(..., region = "maryland") would plot only the recipients with ZIP codes in the US state of Maryland.
zip.plot calls the map function in the maps package. The map function places an object called stateMapEnv in the user's workspace.
Ryan T. Moore and Andrew Reeves
## Call murl object of sample addresses
data(murljobs)
zip.plot(murljobs)

## Read .csv to murl object
murljobs <- read.murl(system.file("extdata", "murljobs.csv", package = "muRL"))

## Specify US state to map
zip.plot(murljobs, map.type = "state", region = "maryland")
A .tab file of United States ZIP code data for mapping recipients. Called by zip.plot to match ZIP codes from letters to latitude and longitude coordinates, and then plot latitudes and longitudes on a user-selected map type.
data(zips)
A data frame with 33309 observations on 4 variables.
state
a factor containing state and territory abbreviations (with levels AK, AL, ..., WY).
zip
a factor containing three-digit, four-digit, five-digit, and three-digit-plus-wildcard formatted ZIP codes (with 33188 levels).
lat
a numeric vector of latitude coordinates.
lon
a numeric vector of longitude coordinates.
A few ZIP codes span more than one state, and thus appear more than once in zips. See the Examples below for hints on extracting latitude and longitude.
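Repeated ZIP codes can be located before merging with a short base R check. The sketch below uses a toy stand-in for zips so that it runs without the package. The ZIP code "000XX", the state labels, and all coordinates are invented for illustration.

```r
# Toy stand-in for the zips data frame (all values are made up).
zips_toy <- data.frame(
  state = c("NY", "MO", "S1", "S2", "S3"),
  zip   = c("10001", "63130", "000XX", "000XX", "000XX"),
  lat   = c(40.75, 38.65, 35.10, 35.00, 35.20),
  lon   = c(-74.00, -90.31, -90.20, -90.00, -89.90),
  stringsAsFactors = FALSE
)

# ZIP codes that appear in more than one row.
dup_zips <- unique(zips_toy$zip[duplicated(zips_toy$zip)])
print(dup_zips)

# All rows for those ZIP codes, one per state.
zips_toy[zips_toy$zip %in% dup_zips, ]

# Keeping only the first row per ZIP code gives a one-to-one lookup table,
# at the cost of choosing one state's coordinates arbitrarily.
first_only <- zips_toy[!duplicated(zips_toy$zip), ]
first_only
```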
Not all US ZIP codes are currently included in this file. If you have a ZIP code you would like included for plotting, please email the package maintainer with the following four pieces of information: the state in which the ZIP code is located, the ZIP code itself, the latitude of the ZIP code to six decimal places (such as 38.643248), and the longitude of the ZIP code to six decimal places (such as -75.611025). Please also provide the city and any other information required to verify the latitude and longitude for inclusion.
The original file upon which zips.tab is based is available at http://www.census.gov/tiger/tms/gazetteer/zcta5.txt, which is linked from http://www.census.gov/geo/www/gazetteer/places2k.html. The US Census Bureau's Geography Division produced these documents. A few additions to the originals have been made. See the muRL CHANGELOG for details.
Further information about ZIP Code Tabulation Areas (ZCTAs) is available at http://www.census.gov/geo/ZCTA/zcta.html.
data(zips)
summary(zips$lat)
summary(zips$lon)

## Extracting latitude and longitude.
## Create a sample survey data frame with an ID variable,
## respondent ZIP code, state, and survey response:
svy1 <- data.frame(id = c(1,2,3,4), zip = c("10001", "10001", "63130", "380HH"),
                   state = c("NY", "NY", "MO", "AR"), resp = c(1,2,1,5))
svy1

## Since ZIP 380HH spans three states, all are included:
svy2 <- merge(svy1, zips, by = "zip", all.x = TRUE)
svy2

## Merging by ZIP and state omits the duplicate 380HH entries:
svy3 <- merge(svy1, zips, by = c("zip", "state"), all.x = TRUE)
svy3