Discussion during Workshop

Please click on the title to see the session description

Tuesday, September 22, 2020

Is there any difference in R between ' and "?

You can use them interchangeably, but it can be useful if you have a conjunction or some other text that uses ' or ' (e.g., 'can't' or 'as I quote 'blah' '). If you have both ' and ' in your text, then you need to use a regex format, which is a whole other can of beans.

To learn more about regex: https://regexr.com/

Tuesday, September 22, 2020

So are your data all is csv or excel on a desktop? Can you also pull datasets from a server? Does that require using github?

Nope! You can load and work with almost all kinds of data!

Tuesday, September 22, 2020

What is the difference between "=" and "<-"?

the use of '=' as an assignment operator dates back to Fortran and other early generation programming languages. Given that '=' has a predefined meaning in mathematics, I agree it's a bad choice of symbols for an assignment operator.

Further explanation here on the '=' and '<-' discussion: https://stackoverflow.com/questions/1741820/what-are-the-differences-between-and-assignment-operators-in-r

Tuesday, September 22, 2020

Is there anything that can be done with vectorization that can't be done with loops?

One thing is that there are functions that can parallel process operations on elements of the list. There are different more 'for' looking functions that can do can also parallel process things. Another difference is that any assignment in a for loop appears in the global environment if the operations were done in something like lapply they would not.

This is a great overview of the pros and cons of vectorization and looping: https://bookdown.org/rdpeng/rprogdatascience/vectorized-operations.html

Tuesday, September 22, 2020

How does time saved running your vectorized code trade off against extra time and effort needed to figure out how to write it, versus just looping?

I find that scripting a for loop takes about the same amount of time as lapply. If you have operations where subsequent iterations depend on the results of prior iterations, I find it's better to use a for loop.

And FYI, vectorization is also an option in matlab and Gauss, in case anyone uses those languages... https://bookdown.org/rdpeng/rprogdatascience/

Tuesday, September 22, 2020

Is there a NOAA color palette we can load?

https://www.cio.noaa.gov/noaa-style-guide/design-elements-left-nav.html

https://github.com/emilyhmarkowitz/NOAAFisheriesComms

https://github.com/James-Thorson-NOAA/VAST/issues/163

https://github.com/jordanwatson/NOAA_Branding/blob/master/NOAA_branding_508.R

Another package I use sometimes (and Jordan mentions above) for color palettes is RColorBrewer.

You can see some of the options here: https://www.r-graph-gallery.com/38-rcolorbrewers-palettes_files/figure-html/thecode-1.png and https://colorbrewer2.org/

Wednesday, September 23, 2020

Does the online material show how you set up the SQL database on Amazon? Step by step?

[good question]

Wednesday, September 23, 2020

What does API stand for?

'Application programming interface' For more information about getting access to the Census Bureau's API: https://www.census.gov/data/developers.html. For more information youcan 'request a key' from there. https://github.com/sboysel/fredr, for pulling time series from st louis fed.

Wednesday, September 23, 2020

Anyone estimating the economic costs of the MSA data confidentiality requirements?

The costs come in at so many levels. Burden on the analyst is one, but information that can't be used by industry or in management also surely has a huge cost

Thursday, September 24, 2020

To what degree is the purpose of your coding structure to help you organize your own work versus to make it understandable to others?

Both! So much more to talk about on that!

Thursday, September 24, 2020

What level of QAQC/peer review do folks use in their code?

Good comments are essential for this. NWFSC (?) uses pretty rigorous code review, but it's very ad hoc - just checking things that occur to us at the time.

One idea: I've tried setting up companion excel files to record standard checks like min 10% check on some calcs or checks of data frame dimensions when filtering. Having the author set up what needs to be checked is helpful, but we didn't get very far with this. This was a practice I carried over from a private consulting firm, but we haven't gotten people in the habit in our branch (yet - fingers crossed).

One of the videos I put up on the Workshop website has a bit of a deep dive into the here() package.

R packaging (and adopting all the function documentation features) is a central part of my coding and quality. Perhaps a middle ground between adding another required layer of review for QA/QC would be to routinely include code in reproducible (e.g. RMarkdown) form with the deliverables for a project. Wouldn't mind reviewing code if it were as well organized as Emily's :-)

A related issue is when we have academic contractors do work, we need to require and follow through on requirements in contracts for code delivery.

Writing your code with the expectation that someone else will be reviewing it some day tends to make it better and forces you to do things like annotate, etc. A lot of how we handled the additional workload of QA/QC at my last company was by having newer folks who were trying to learn R do the QA. It made authors annotate well, and it allowed new learners to understand better what the code was doing.

Thursday, September 24, 2020

Do you have suggestions specific to organizing .Rmd files for large documents? E.g., by document section, or by tables, then by figures, etc.?

A talk back in March that might give some insight on that if this is helpful. https://www.youtube.com/watch?v=-mycRwaC60A

Thursday, September 24, 2020

Where does a FOIA fall with code?

[good question]

Thursday, September 24, 2020

How does what R can produce compare with LaTeX and Sweave?

The upfront learning cost of LaTeX is much higher, and while you can perhaps do more with it pdfs than markdown. In most cases you can do enough with markdown to get done what needs to be done.

Thursday, September 24, 2020

Is 508 compliance is easier in markdown?

There is a lot that has been built into rmarkdown for 508, but some that is still lacking. I've had to look into some of that for the word-output markdowns was build for FEUS, but it also varies for html, pdf, other output formats. We've made the pdf 508 compliant after the pdf was generated, which is a pain. I tried to look into it once a while back but at the time I didn't find a good way to automate 508 compliance in LaTeX.

Thursday, September 24, 2020

Git/GitHub working with RStudio?

I have a section on getting Git/GitHub working with RStudio in a workshop I taught over the summer. Geared to those who have never used Git or used it and got very frustrated. https://rverse-tutorials.github.io/RWorkflow-NWFSC-2020/

Also has info on working with internal GitLab repos, if you are working on confidential material that must stay on a NOAA server. (eli.holmes@noaa.gov)

Thursday, September 24, 2020

Resources to create R Packages

Link to Hadley Wickham's e-book on package development: https://r-pkgs.org/

Thursday, September 24, 2020

reprexes?

Nice info about reprexes: https://github.com/tidyverse/reprex